From dholmes at openjdk.org Wed Nov 1 02:04:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 Nov 2023 02:04:07 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 20:19:45 GMT, Matias Saavedra Silva wrote: >> Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. This patch ensures that nullptr is not passed into the load. >> >> In `print_as_native_pointer` in archiveBuilder, `source_obj_to_requested_obj` should not be able to return `nullptr` as the result is immediately cast to an oop which cascades down to the failure reported in `get_volatile()` in `unsafe.cpp`. Placing an assert close to the top of this call stack should prevent this from happening and will better indicate the source of an unexpected `nullptr` should it occur. >> >> Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Moved assert higher in call stack Okay. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16405#pullrequestreview-1707471870 From dholmes at openjdk.org Wed Nov 1 02:12:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 Nov 2023 02:12:04 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 07:58:36 GMT, David Holmes wrote: >> Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: >> >> allow JavaCalls in HotSpotConstantPool.callSystemExit > > Can't comment on all the details of the changes, but I don't see anything untoward in general. > > Thanks. > What do you think @dholmes-ora @vnkozlov ? Normally we use asserts for things that are absolutely expected to be uncovered during testing - things that are purely internal VM concerns and for which a failure is a VM coding error. So I don't think it is necessary to change these to guarantees provided we have sufficient test coverage. The guarantee may not hurt but the same could be said for many other assertions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1788288243 From dholmes at openjdk.org Wed Nov 1 02:24:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 Nov 2023 02:24:04 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v2] In-Reply-To: <2RV7IhWydVPMPR9hoZO3TedGNLbrIBS3Zx_dx5OEzig=.825296ee-44eb-4e3f-a2a3-7c9da42773d4@github.com> References: <2RV7IhWydVPMPR9hoZO3TedGNLbrIBS3Zx_dx5OEzig=.825296ee-44eb-4e3f-a2a3-7c9da42773d4@github.com> Message-ID: On Tue, 31 Oct 2023 19:31:56 GMT, Doug Simon wrote: >> This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. >> If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. 
>> >> Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: >> >> [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) >> thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] >> for thread 0x000000011e18c600 >> thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} >> >> >> The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > add missing ResourceMark So to review this I had to mentally try to untangle the new combined version, to see if each potential path resulted in the same behaviour as before - that was/is extremely painful. I'm really not seeing the benefit of combining these as it just makes the code much more complicated. Sure there is a little duplication but each method was used in different circumstances. I think someone is going to look at this in the future and consider it an obvious candidate for splitting into two methods! src/hotspot/share/utilities/exceptions.cpp line 85: > 83: // Implementation of Exceptions > 84: > 85: bool Exceptions::special_exception(JavaThread* thread, const char* file, int line, Handle h_exception, Symbol* h_name, const char* message) { So IIUC now we either have a non-null exception and a null symbol and msg; or vice-versa. Can we assert that so someone doesn't mistakenly pass in non-null values for all three. 
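A minimal sketch of the precondition asked for above, assuming the two calling conventions are exactly as described in the review: either an existing exception object with no name and no message, or no exception object with a name. `Handle` and `Symbol` below are hypothetical stand-ins rather than the HotSpot types, the helper name `special_exception_args_valid` is invented for illustration, and allowing a null message in the second case is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's Handle and Symbol; only "null or
// not null" matters for this sketch.
struct Handle {
  const void* obj = nullptr;
  bool is_null() const { return obj == nullptr; }
};
struct Symbol {};

// Returns true when exactly one calling convention is in use:
//  (a) an exception object is supplied, with no name and no message, or
//  (b) no exception object is supplied, and a name is given
//      (assumption: the message may still be null in this case).
bool special_exception_args_valid(const Handle& h_exception,
                                  const Symbol* h_name,
                                  const char* message) {
  if (!h_exception.is_null()) {
    return h_name == nullptr && message == nullptr;
  }
  return h_name != nullptr;
}
```

In the real method, a check of this shape would presumably sit in an `assert(...)` at the top of `Exceptions::special_exception`, so that a caller passing non-null values for all three arguments fails in debug builds.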
------------- PR Review: https://git.openjdk.org/jdk/pull/16401#pullrequestreview-1707482098 PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1378328110 From dholmes at openjdk.org Wed Nov 1 02:24:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 Nov 2023 02:24:05 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v2] In-Reply-To: <_I0sKx5qzcPboC1oHue2T-pv7pN4eka7PX2-Tw0ahg4=.0b27bd88-70af-4e58-99fa-d587676e2152@github.com> References: <8fYpTiffRL76fPFWSk34cd2n15q7083XyGMiFonn_Bc=.879fc7d1-b2de-4c1c-8400-3cc91140a44b@github.com> <6NyLOEHFudi76SkNCdDCdtvLPltKcgBNRSHsntmdeIs=.185f7635-6f65-467e-8425-211a01439342@github.com> <_I0sKx5qzcPboC1oHue2T-pv7pN4eka7PX2-Tw0ahg4=.0b27bd88-70af-4e58-99fa-d587676e2152@github.com> Message-ID: <6MabOb3z0aR0F7q6rF1Nmb-HV-QMHwjCVgokWhCaGrA=.4d57463f-5cd8-4a19-9251-403654e25bf6@github.com> On Tue, 31 Oct 2023 19:27:52 GMT, Doug Simon wrote: >> Yes, seems fine. >> >> This code might need a local ResourceMark. > > Good point: https://github.com/openjdk/jdk/pull/16401/commits/f74fa5ee688558db5917dd2951ced3786410b7fe Duplicate messages are just a bug waiting to be filed IMO. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1378328752 From qamai at openjdk.org Wed Nov 1 03:54:01 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 1 Nov 2023 03:54:01 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v7] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2lKJmd3IjknwUw1KHpU1Wk24TXaGELOIlQe1LrRJK_k=.1621c791-6495-457a-b4cb-60c716ef5484@github.com> Message-ID: On Tue, 31 Oct 2023 14:27:30 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/growableArray.hpp line 411: >> >>> 409: if (i >= this->_capacity) grow(i); >>> 410: for (int j = this->_len; j <= i; j++) >>> 411: new (&this->_data[j]) E(args...); >> >> Use global placement new, e.g. `::new`. Also below, in `at_put_grow` > > Style: Missing braces around the for-loop body. Also below in `at_put_grow`. I believe you also need to destroy the old object before initialising a new one in its place. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378366004 From jwaters at openjdk.org Wed Nov 1 05:27:38 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 1 Nov 2023 05:27:38 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v6] In-Reply-To: References: Message-ID: > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well.
The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method > > All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. > > This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'openjdk:master' into noreturn - Revert to exit_code in os_windows.cpp - Revert os_windows.cpp - Revert os_posix.cpp - Merge branch 'openjdk:master' into noreturn - Minor Style Change in os_windows.cpp - 8304939 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16303/files - new: https://git.openjdk.org/jdk/pull/16303/files/6b81a926..38f707cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=04-05 Stats: 30150 lines in 1257 files changed: 16739 ins; 4830 del; 8581 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From jwaters at openjdk.org Wed Nov 1 05:32:26 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 1 Nov 2023 05:32:26 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v7] In-Reply-To: References: Message-ID: > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. 
For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method > > All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. > > This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset Julian Waters has updated the pull request incrementally with two additional commits since the last revision: - Remove thread_native_entry declaration os_windows.cpp - Formatting in vmError_windows.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16303/files - new: https://git.openjdk.org/jdk/pull/16303/files/38f707cc..7a0897c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16303&range=05-06 Stats: 7 lines in 2 files changed: 2 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16303.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16303/head:pull/16303 PR: https://git.openjdk.org/jdk/pull/16303 From jwaters at openjdk.org Wed Nov 1 05:40:06 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 1 Nov 2023 05:40:06 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 07:15:28 GMT, David Holmes wrote: >> Addressed some of the review comments >> >> Side note: Should the Style Guide only permit noreturn for void methods? 
It's Undefined Behaviour when applied to something that returns int for instance, such as exit_process_or_thread here (which I had to refactor to void) > >> Side note: Should the Style Guide only permit noreturn for void methods? It's Undefined Behaviour when applied to something that returns int for instance, such as exit_process_or_thread here (which I had to refactor to void) > > I think it is implied that attributes should only be used in a way that is valid. @dholmes-ora @kimbarrett Please see if you are fine with the final set of changes ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1788439437 From dholmes at openjdk.org Wed Nov 1 06:53:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 Nov 2023 06:53:03 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v7] In-Reply-To: References: Message-ID: <31C4o4xmo7MzpR5xZGuLIfoqKUHFku3b1NeDqmF906w=.0a6a0d0a-d290-4257-81bf-53452d395d09@github.com> On Wed, 1 Nov 2023 05:32:26 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. 
For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Remove thread_native_entry declaration os_windows.cpp > - Formatting in vmError_windows.cpp Still seems fine to me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16303#pullrequestreview-1707662447 From stefank at openjdk.org Wed Nov 1 07:03:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 1 Nov 2023 07:03:58 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() In-Reply-To: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 1 Nov 2023 00:06:35 GMT, Leonid Mesnik wrote: > Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. > > Adding the test thread factory requires the injection of arguments in the middle of the list. 
I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). > > Testing tier1-5. It seems like the code would be cleaner if you moved the threadFactory injection to `createTestJavaProcessBuilder`. ------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16442#pullrequestreview-1707671922 From dnsimon at openjdk.org Wed Nov 1 08:03:06 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 08:03:06 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 20:39:47 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > allow JavaCalls in HotSpotConstantPool.callSystemExit With GraalVM, we're doing a lot more testing with product builds than fastdebug builds as the majority of checks are done at the Java level and fastdebug just slows everything down. In that testing context, guarantees are much more useful. Given the importance of the "can_call_java" invariant, would you agree that converting these 3 specific assertions to guarantees is justified? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1788559705 From dnsimon at openjdk.org Wed Nov 1 08:52:01 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 08:52:01 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v2] In-Reply-To: <6MabOb3z0aR0F7q6rF1Nmb-HV-QMHwjCVgokWhCaGrA=.4d57463f-5cd8-4a19-9251-403654e25bf6@github.com> References: <8fYpTiffRL76fPFWSk34cd2n15q7083XyGMiFonn_Bc=.879fc7d1-b2de-4c1c-8400-3cc91140a44b@github.com> <6NyLOEHFudi76SkNCdDCdtvLPltKcgBNRSHsntmdeIs=.185f7635-6f65-467e-8425-211a01439342@github.com> <_I0sKx5qzcPboC1oHue2T-pv7pN4eka7PX2-Tw0ahg4=.0b27bd88-70af-4e58-99fa-d587676e2152@github.com> <6MabOb3z0aR0F7q6rF1Nmb-HV-QMHwjCVgokWhCaGrA=.4d57463f-5cd8-4a19-9251-403654e25bf6@github.com> Message-ID: On Wed, 1 Nov 2023 02:20:04 GMT, David Holmes wrote: >> Good point: https://github.com/openjdk/jdk/pull/16401/commits/f74fa5ee688558db5917dd2951ced3786410b7fe > > Duplicate messages are just a bug waiting to be filed IMO. They are not as bad as no message at all which is what motivates this change in the first place. I can change the special exception log message so it stands out as not just a duplicate. 
For example: [0.185s][info][exceptions] Thread cannot call Java so instead of throwing exception (0x0000000000000000) at [src/hotspot/share/interpreter/linkResolver.cpp, line 735] for thread 0x000000013e025400, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} Also note that the duplication only happens on the path where an Exception object exists. Based on inspection and experience, this is rarely (never?) taken as the thread is normally in a state where it cannot actually instantiate an exception (e.g., `can_call_java` is false). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1378533291 From dnsimon at openjdk.org Wed Nov 1 09:17:30 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 09:17:30 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v3] In-Reply-To: References: Message-ID: > This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. > If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. > > Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: > > [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) > thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] > for thread 0x000000011e18c600 > thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} > > > The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. 
Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - differentiate special exception log message from normal log message - check special_exception arguments for nullness pre-condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16401/files - new: https://git.openjdk.org/jdk/pull/16401/files/f74fa5ee..73be44cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16401&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16401&range=01-02 Stats: 7 lines in 2 files changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16401/head:pull/16401 PR: https://git.openjdk.org/jdk/pull/16401 From dnsimon at openjdk.org Wed Nov 1 09:17:31 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 09:17:31 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v2] In-Reply-To: References: <2RV7IhWydVPMPR9hoZO3TedGNLbrIBS3Zx_dx5OEzig=.825296ee-44eb-4e3f-a2a3-7c9da42773d4@github.com> Message-ID: On Wed, 1 Nov 2023 02:18:38 GMT, David Holmes wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing ResourceMark > > src/hotspot/share/utilities/exceptions.cpp line 85: > >> 83: // Implementation of Exceptions >> 84: >> 85: bool Exceptions::special_exception(JavaThread* thread, const char* file, int line, Handle h_exception, Symbol* h_name, const char* message) { > > So IIUC now we either have a non-null exception and a null symbol and msg; or vice-versa. Can we assert that so someone doesn't mistakenly pass in non-null values for all three. 
Sure: https://github.com/openjdk/jdk/pull/16401/commits/7e6c1a63f4c5104cb82b4bad38c9b8910b156127 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16401#discussion_r1378554373 From adinn at openjdk.org Wed Nov 1 09:21:17 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 1 Nov 2023 09:21:17 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: <_HeaLuC2VvNySQYp4nbSkXjHurHTeJ3MdgeuvbuGRT0=.1f44d2d3-fec2-4816-9a2f-716d94c8baaf@github.com> References: <_HeaLuC2VvNySQYp4nbSkXjHurHTeJ3MdgeuvbuGRT0=.1f44d2d3-fec2-4816-9a2f-716d94c8baaf@github.com> Message-ID: On Tue, 31 Oct 2023 22:48:27 GMT, Matias Saavedra Silva wrote: >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 2374: >> >>> 2372: >>> 2373: // setup registers >>> 2374: const Register index = r4; >> >> Hardcoding is not very nice. Maybe reuse one of the other registers? > > This exists in x86 as well in each of the `load_resolved_method_entry_...()` methods. Some of these only have three arguments which cannot be reused so there is the option to include `index` as an argument, but this introduces an inconsistency among these similar methods. > > Should all of these methods take `index` which can be a reused register? If you add an extra argument then please name it `temp` or something equally clear as to the fact that this is for storing a local scratch value rather than an input/output. n.b. this is the main reason for resisting the urge to move register declarations+initializations up the call chain. It makes it less obvious that a register is only being used local to a callee. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1378560173 From stuefe at openjdk.org Wed Nov 1 09:24:03 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 1 Nov 2023 09:24:03 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v5] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 11:20:48 GMT, Thomas Obermeier wrote: > tests passed in dbg build; opt build still faces a SIGILL error in GTestWrapper when executing AsyncLogTest, which to my understanding is unrelated; therefore, I created https://bugs.openjdk.org/browse/JDK-8319104 Yes, that looks unrelated. @navyxliu could you please take a look at JDK-8319104? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1788649426 From sjohanss at openjdk.org Wed Nov 1 09:37:08 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 1 Nov 2023 09:37:08 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: References: Message-ID: <6tngC-Jwyx8e25LGT8dAwKbaPb9qb_w5ONctnFieH3o=.61b013b2-beb9-4053-8c06-86a700208d77@github.com> On Tue, 31 Oct 2023 04:23:13 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Replace NULL with nullptr Sorry for being a bit late to this PR. I think the addition of CPU time tracking is good, but I wonder if we could do it in a way that is a bit more general. With a more general way of tracking CPU time for a set of threads, we could then have different consumers of this data. In addition to hsperf counters, I think having logging and JFR events for this could be interesting as well. Have you had any thoughts along those lines, or do you see any obvious problems with such an approach?
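To make the "general tracker with several consumers" idea concrete, here is a rough sketch in plain C++. Everything in it is hypothetical: `CpuTimeTracker`, its methods, and the group names are invented for illustration and are not part of HotSpot or of this PR. It only shows the shape of the design, where CPU time is accumulated per thread group in one place and each consumer (hsperf counter, log line, JFR event) is registered as a callback.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: accumulates CPU time per named thread group and
// forwards every update to all registered consumers.
class CpuTimeTracker {
 public:
  // A consumer receives the group name and the new running total (ns).
  using Consumer = std::function<void(const std::string&, std::int64_t)>;

  void add_consumer(Consumer consumer) {
    _consumers.push_back(std::move(consumer));
  }

  // Add newly sampled CPU time for one group, then notify consumers.
  void record(const std::string& group, std::int64_t delta_ns) {
    std::int64_t total = (_totals[group] += delta_ns);
    for (const Consumer& consumer : _consumers) {
      consumer(group, total);
    }
  }

  // Running total for a group; 0 if nothing has been recorded yet.
  std::int64_t total_ns(const std::string& group) const {
    auto it = _totals.find(group);
    return it == _totals.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, std::int64_t> _totals;
  std::vector<Consumer> _consumers;
};
```

The sketch ignores synchronization and the question of how CPU time is sampled from the threads; both would need real answers in the VM.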
------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1788665727 From iwalulya at openjdk.org Wed Nov 1 09:42:02 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 1 Nov 2023 09:42:02 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v5] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 19:14:13 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. 
However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation src/hotspot/share/gc/g1/g1CollectedHeap.inline.hpp line 266: > 264: > 265: inline void G1CollectedHeap::pin_object(JavaThread* thread, oop obj) { > 266: assert(obj != NULL, "obj must not be null"); please update the `NULL`s to `nullptr` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1378580763 From jsjolen at openjdk.org Wed Nov 1 10:37:21 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 1 Nov 2023 10:37:21 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. 
In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: - Call dtr and global placement new - Do not add this ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/fa50a221..f6e910a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=06-07 Stats: 12 lines in 2 files changed: 4 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From kbarrett at openjdk.org Wed Nov 1 10:40:04 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 10:40:04 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v7] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 05:32:26 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well.
The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. >> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Remove thread_native_entry declaration os_windows.cpp > - Formatting in vmError_windows.cpp Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16303#pullrequestreview-1707951732 From jwaters at openjdk.org Wed Nov 1 10:45:11 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 1 Nov 2023 10:45:11 GMT Subject: RFR: 8304939: os::win32::exit_process_or_thread should be marked noreturn [v7] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 05:32:26 GMT, Julian Waters wrote: >> On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. >> >> The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. >> >> Although not mentioned in the title, raise_fail_fast has also been fixed. RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method >> >> All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. 
>> >> This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset > > Julian Waters has updated the pull request incrementally with two additional commits since the last revision: > > - Remove thread_native_entry declaration os_windows.cpp > - Formatting in vmError_windows.cpp Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16303#issuecomment-1788744228 From jwaters at openjdk.org Wed Nov 1 10:45:12 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 1 Nov 2023 10:45:12 GMT Subject: Integrated: 8304939: os::win32::exit_process_or_thread should be marked noreturn In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 06:55:02 GMT, Julian Waters wrote: > On Windows we have a non-trivial function (exit_process_or_thread) that provides the implementation of various functions like os::die, os::abort, &etc. Those os functions are marked as noreturn, so this implementation helper should also be noreturn. > > The current change does several things around exit_process_or_thread, which is moved out of the os::win32 class since it does not need access to anything in that class other than the Ept enum, which is also moved to os_windows.cpp as well. The signature is changed to void, since exit_process_or_thread's current return value is simply the exit code parameter that it is passed, and to qualify as noreturn it should not return its current value of int. The only usage of this return value is in thread_native_entry, and this usage can easily be replaced by returning the res local that exit_process_or_thread is passed. For this os::infinite_sleep takes the place of the return statement in exit_process_or_thread, to make sure it qualifies as noreturn. > > Although not mentioned in the title, raise_fail_fast has also been fixed. 
RaiseFailFastException is not itself noreturn, so os::infinite_sleep has been added there as well, to make raise_fail_fast qualify as a noreturn method > > All the changes mentioned above fixes any Undefined Behaviour on Windows with regards to noreturn, by making all code marked noreturn (os::die() os::exit() etc) never return. This is not a problem on other platforms since their implementations of os::abort or os::die etc already call noreturn methods. > > This fix also fixes compilation on gcc windows, which warns about noreturn methods returning prior to this changeset This pull request has now been integrated. Changeset: b4f5379d Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/b4f5379d50db9412208552fd69bc316e7730aedd Stats: 25 lines in 4 files changed: 7 ins; 7 del; 11 mod 8304939: os::win32::exit_process_or_thread should be marked noreturn Reviewed-by: dholmes, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/16303 From ogillespie at openjdk.org Wed Nov 1 11:19:08 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 1 Nov 2023 11:19:08 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 18:01:03 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Adress comments >> >> Fix indentation >> Improve tests >> Improve comment >> Remove redundant null check >> Improve naming >> Pop when >, not >= max len > > src/hotspot/share/oops/symbolHandle.hpp line 125: > >> 123: if (result != nullptr) { >> 124: delete result; >> 125: Atomic::dec(&_cleanup_delay_len); > > Because of a limitation on NonblockingQueue (from the class description: "A > queue may appear empty even though elements have been added and not > removed."), it is theoretically possible for the max-entries value to be > exceeded. 
(List is empty, thread1 starts a push but is paused, other threads > push lots of entries.) But that will eventually be cleaned up by completion > of the initial push and then later draining the list. So I don't think this > is a problem in practice, but wanted to note that I'd looked at the question. Thanks! Yes, I agree that it's not perfectly bounded but that this usage works out okay. > src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 51: > >> 49: assert(_tail == nullptr, "precondition"); >> 50: } >> 51: #endif > > Why is this being removed? Without some good explanation, I'm disapproving this change. These assertions require the queue to be empty when the destructor is called. My static queue is destroyed at shutdown, and it may not be empty. I see no explicit problem with a queue not being empty when destroyed, I can only assume it was added to serve as an extra reminder for the specific use-case of g1dirtycardsetqueue which is the only existing user of NonblockingQueue. In my opinion it's better for individual queue owners to decide if they care about this. Alternatives: I can modify NonblockingQueue to accept an argument (e.g. check_empty_at_shutdown) which is true for g1 card set and false for my usage. Or I can avoid NonblockingQueue entirely as there are a few other review concerns arising from its limitations. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378668188 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378666988 From mgronlun at openjdk.org Wed Nov 1 11:19:34 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 1 Nov 2023 11:19:34 GMT Subject: RFR: 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native Message-ID: Greetings, The original problem was introduced with: [JDK-8313251](https://bugs.openjdk.org/browse/JDK-8313251) - Add NativeLibraryLoad event The first attempt at resolution: [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) - Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native was reverted: [JDK-8315930](https://bugs.openjdk.org/browse/JDK-8315930) - Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" because it ran into: [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) - NativeLibraryLoadEvent dtr fails with "assert(false) failed: Possible safepoint reached by thread that does not allow it" The reason for [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) was that the thread loading the zip library, in thread state native, was the owner of the Zip_lock mutex. This prevented it from transitioning to thread_in_vm. The Zip_lock mutex was removed as part of: [JDK-8317951](https://bugs.openjdk.org/browse/JDK-8317951) - Refactor loading of zip library to help resolve [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) Therefore, it is time to redo the original [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220), now under this issue, [JDK-8319206](https://bugs.openjdk.org/browse/JDK-8319206). 
Testing: jdk_jfr, tiers1-6, stress testing Thanks Markus ------------- Commit messages: - 8319206 Changes: https://git.openjdk.org/jdk/pull/16447/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16447&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319206 Stats: 395 lines in 10 files changed: 243 ins; 114 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/16447.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16447/head:pull/16447 PR: https://git.openjdk.org/jdk/pull/16447 From dholmes at openjdk.org Wed Nov 1 12:05:02 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 1 Nov 2023 12:05:02 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 20:39:47 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: > > allow JavaCalls in HotSpotConstantPool.callSystemExit The majority of hotspot testing is done on fastdebug and performance is not considered an issue there. But I don't have a hard objection to these three asserts becoming guarantees. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1788837986 From ogillespie at openjdk.org Wed Nov 1 12:12:04 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 1 Nov 2023 12:12:04 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 18:53:18 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/symbolHandle.hpp line 122: >> >>> 120: // If the queue is now full, implement a one-in, one-out policy. >>> 121: if (Atomic::add(&_cleanup_delay_len, 1, memory_order_relaxed) > _cleanup_delay_max_entries) { >>> 122: TempSymbolDelayQueueNode* result = _cleanup_delay.pop(); >> >> NonblockingQueue's push and pop operations are subject to ABA problems, and >> require the client to address that in some fashion. There's nothing here to do >> that. I think one possibility would be to wrap the push/pop calls in a >> GlobalCounter::CriticalSection and do a GlobalCounter::write_synchronize >> before deleting a node. > > If you have to add more code to wrap NonblockingQueue, please implement it in a .cpp file. I thought NBQ was sufficient for this. Maybe we want some other data structure for this. I have attempted to avoid locks if at all possible. Re ABA problems, my reading of the push/pop concern is that the problem occurs if a caller pops a node and then re-uses that same node later by re-pushing it. Something like this: Queue contains Node1(val=a, next=null). 1. Thread 1 (start push) pushes Node2(val=b, next=null) 2. Thread 2 pops Node1(val=a, next=null) 3. 
Thread 2 sees that Node1.next is null and reuses the container, making Node1(val=c, next=null) 4. Thread 2 (start push) starts to push Node1(val=c, next=null) 5. Thread 1 (finish push) sets Node1.next=Node2 6. Thread 2 (finish push) sets Node2.next=Node1 <-- circular list Since my change doesn't reuse nodes, I believe it's safe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378716964 From ogillespie at openjdk.org Wed Nov 1 12:21:06 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 1 Nov 2023 12:21:06 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 18:12:27 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Adress comments >> >> Fix indentation >> Improve tests >> Improve comment >> Remove redundant null check >> Improve naming >> Pop when >, not >= max len > > src/hotspot/share/oops/symbolHandle.hpp line 121: > >> 119: >> 120: // If the queue is now full, implement a one-in, one-out policy. >> 121: if (Atomic::add(&_cleanup_delay_len, 1, memory_order_relaxed) > _cleanup_delay_max_entries) { > > Why is incrementing relaxed? Now I have to think hard about whether there > might be any ordering problems resulting from that. A colleague suggested that was the appropriate constraint ("there is no transitive memory effects that ride on the counter"), but I don't understand it enough to defend it myself. Atomic::inc uses memory_order_conservative. Happy to go with whatever is preferred. 
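The one-in, one-out policy and the relaxed increment discussed above can be sketched outside HotSpot as follows. This is a minimal illustration under stated assumptions, not the code under review: `DelayQueue` and its members are hypothetical names, a mutex-protected `std::deque` stands in for `NonblockingQueue` (so the ABA question does not arise here), and `std::atomic` stands in for HotSpot's `Atomic`. Note that `fetch_add` returns the previous value, whereas the quoted `Atomic::add` call is compared directly against the limit. In this sketch the relaxed ordering is harmless because the counter only bounds the length and the mutex orders the data itself; whether the same reasoning carries over to the lock-free queue is exactly the open question in the review.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical bounded delay queue; not the HotSpot implementation.
class DelayQueue {
  const std::size_t _max_entries;
  std::atomic<std::size_t> _len{0};   // approximate length, used only as a bound
  std::mutex _lock;                   // orders access to _entries
  std::deque<std::string> _entries;

 public:
  explicit DelayQueue(std::size_t max_entries) : _max_entries(max_entries) {
    assert(max_entries > 0);
  }

  // Adds `value`. If the queue is now over its bound, moves the oldest entry
  // into *evicted and returns true; otherwise returns false.
  bool push(std::string value, std::string* evicted) {
    {
      std::lock_guard<std::mutex> guard(_lock);
      _entries.push_back(std::move(value));
    }
    // fetch_add returns the old value, so add 1 to get the new length.
    const std::size_t new_len = _len.fetch_add(1, std::memory_order_relaxed) + 1;
    if (new_len <= _max_entries) {
      return false;
    }
    std::lock_guard<std::mutex> guard(_lock);
    if (_entries.empty()) {
      return false;  // defensive: nothing left to evict
    }
    *evicted = std::move(_entries.front());
    _entries.pop_front();
    _len.fetch_sub(1, std::memory_order_relaxed);
    return true;
  }

  std::size_t length() const { return _len.load(std::memory_order_relaxed); }
};
```

With a bound of 2, the third push evicts the first entry and the length stays at 2, which mirrors the "pop when >, not >= max len" behaviour mentioned in the commit notes.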
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378722803 From ogillespie at openjdk.org Wed Nov 1 12:21:09 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 1 Nov 2023 12:21:09 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 12:27:53 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
> > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Adress comments > > Fix indentation > Improve tests > Improve comment > Remove redundant null check > Improve naming > Pop when >, not >= max len test/hotspot/gtest/classfile/test_placeholders.cpp line 46: > 44: Symbol* super = SymbolTable::new_symbol("super2_8_2023_supername"); > 45: Symbol* interf = SymbolTable::new_symbol("interface2_8_2023_supername"); > 46: I swapped these from TempNewSymbol to Symbol before I added the set_cleanup_delay_max_entries functionality to avoid interfering with refcounts in tests. Is the swap safe? Or should I use set_cleanup_delay_max_entries(0) and switch back? I can't tell why some of these were made temp and some not. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378725277 From coleenp at openjdk.org Wed Nov 1 13:32:09 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 1 Nov 2023 13:32:09 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 12:17:55 GMT, Oli Gillespie wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Adress comments >> >> Fix indentation >> Improve tests >> Improve comment >> Remove redundant null check >> Improve naming >> Pop when >, not >= max len > > test/hotspot/gtest/classfile/test_placeholders.cpp line 46: > >> 44: Symbol* super = SymbolTable::new_symbol("super2_8_2023_supername"); >> 45: Symbol* interf = SymbolTable::new_symbol("interface2_8_2023_supername"); >> 46: > > I swapped these from TempNewSymbol to Symbol before I added the set_cleanup_delay_max_entries functionality to avoid interfering with refcounts in tests. > Is the swap safe? Or should I use set_cleanup_delay_max_entries(0) and switch back? I can't tell why some of these were made temp and some not. 
I don't remember why they were TempNewSymbol either but I don't think it matters for this test, they can be Symbol*. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378804633 From dnsimon at openjdk.org Wed Nov 1 13:33:32 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 13:33:32 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v7] In-Reply-To: References: Message-ID: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: convert assertions about can_call_java to guarantees ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16383/files - new: https://git.openjdk.org/jdk/pull/16383/files/b7181d70..832a5912 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16383&range=05-06 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16383.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16383/head:pull/16383 PR: https://git.openjdk.org/jdk/pull/16383 From jvernee at openjdk.org Wed Nov 1 13:48:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 1 Nov 2023 13:48:27 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected Message-ID: The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. 
The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each exception handler of a method in the `MethodData` for that method (which holds all the profiling data). Then when looking up the exception handler after an exception is thrown, we mark the exception handler as entered. When C2 parses the exception handler block, and it sees that it has never been entered, we emit an uncommon trap instead. I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); This assert relies on `has_monitors` being set for a method, which in itself relies on `monitorenter` and `monitorexit` being parsed. However, if we prune untaken exception handlers, we might not see any `monitorexit`, which is a problem for OSR compilations since then we might also not see any `monitorenter` in that case. After some investigation, it turns out that `ciMethod` already tracks whether monitor bytecodes are being used, so we can just piggyback on that instead of relying on `monitorenter` or `monitorexit` being parsed. 
We can follow the existing pattern for how `has_reserved_stack_access` is being tracked (which I've done). See https://github.com/openjdk/jdk/pull/16416/commits/a48420681549ac9343f625e1a3a26a737fc4921e https://github.com/openjdk/jdk/pull/16416/commits/a33a905689a056ac053ac34df64541b652747076 and https://github.com/openjdk/jdk/pull/16416/commits/d727df704ea092eedb20517bdd696d82d75b00b2

Benchmark with `-XX:-PruneDeadExceptionHandlers`:

Benchmark                                                      Mode  Cnt      Score     Error   Units
ResourceScopeCloseMin.confined_close                           avgt   30     10.458 ±   0.070   ns/op
ResourceScopeCloseMin.confined_close:gc.alloc.rate             avgt   30   9480.988 ±  63.335  MB/sec
ResourceScopeCloseMin.confined_close:gc.alloc.rate.norm        avgt   30    104.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close:gc.count                  avgt   30    119.000            counts
ResourceScopeCloseMin.confined_close:gc.time                   avgt   30     94.000                ms
ResourceScopeCloseMin.confined_close_notry                     avgt   30      4.691 ±   0.063   ns/op
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate       avgt   30  11383.693 ± 151.145  MB/sec
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate.norm  avgt   30     56.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close_notry:gc.count            avgt   30    120.000            counts
ResourceScopeCloseMin.confined_close_notry:gc.time             avgt   30    104.000                ms

with `-XX:+PruneDeadExceptionHandlers`:

Benchmark                                                      Mode  Cnt      Score     Error   Units
ResourceScopeCloseMin.confined_close                           avgt   30      4.563 ±   0.043   ns/op
ResourceScopeCloseMin.confined_close:gc.alloc.rate             avgt   30  11702.868 ± 108.816  MB/sec
ResourceScopeCloseMin.confined_close:gc.alloc.rate.norm        avgt   30     56.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close:gc.count                  avgt   30    121.000            counts
ResourceScopeCloseMin.confined_close:gc.time                   avgt   30     93.000                ms
ResourceScopeCloseMin.confined_close_notry                     avgt   30      4.601 ±   0.054   ns/op
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate       avgt   30  11605.391 ± 134.000  MB/sec
ResourceScopeCloseMin.confined_close_notry:gc.alloc.rate.norm  avgt   30     56.000 ±   0.001    B/op
ResourceScopeCloseMin.confined_close_notry:gc.count            avgt   30    121.000            counts
ResourceScopeCloseMin.confined_close_notry:gc.time             avgt   30    101.000                ms

Note that with the optimization turned on, timing and `gc.alloc.rate.norm` is ~equal. I also noticed through other experiments that C2's ability to inline improves, due to `inline_instructions_size` being reduced for methods with untaken exception handlers, which might bring the size under `InlineSmallCode`, and allow the method to be inlined again.

Testing: tier 1-6 (ongoing). Local run of `hotspot_compiler` suite with `-XX:+DeoptimizeALot` and with `-XX:+StressPrunedExceptionHandlers`.

------------- Commit messages: - use TWR in foreign benchmarks - reduce duplication - beef up test a bit - improve StressPrunedExceptionHandlers comment - Revert "Revert "add exception handler pruning stress flags"" - also set has_monitors when callee is synchronized - Revert "add exception handler pruning stress flags" - use has_monitor_bytecodes() in c1 as well - use ciMethod::has_monitor_bytecodes - add exception handler pruning stress flags - ...
and 18 more: https://git.openjdk.org/jdk/compare/6f352740...059e306c Changes: https://git.openjdk.org/jdk/pull/16416/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8267532 Stats: 598 lines in 23 files changed: 498 ins; 34 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From aph at openjdk.org Wed Nov 1 14:01:19 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 1 Nov 2023 14:01:19 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v18] In-Reply-To: <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> Message-ID: On Fri, 27 Oct 2023 11:59:59 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix header Anyone? I think this one is cooked now. 
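The "save and restore the floating-point control word around a library load" idea quoted in the 8295159 description can be illustrated with the standard `<cfenv>` facilities. This is only a sketch under assumptions: `load_preserving_fenv` and `misbehaving_library_init` are hypothetical names, the rounding mode is used as a portable stand-in for the denormal-handling flags that `-ffast-math` startup code changes, and whether `fegetenv`/`fesetenv` actually cover those flags is platform-specific. The actual patch may well use a different, per-CPU mechanism.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical stand-in for a shared library whose initializer changes the
// floating-point environment as a side effect of being loaded.
void misbehaving_library_init() {
  std::fesetround(FE_UPWARD);
}

// Runs `load` and then puts the floating-point environment back the way it
// was before the call, whatever `load` did to it.
template <typename LoadFn>
void load_preserving_fenv(LoadFn load) {
  std::fenv_t saved;
  int rc = std::fegetenv(&saved);   // capture the current environment
  assert(rc == 0);
  load();
  rc = std::fesetenv(&saved);       // restore it after the load
  assert(rc == 0);
  (void)rc;                         // silence unused warning when NDEBUG is set
}
```

As the description itself notes, a wrapper of this kind only protects the call sites that go through it; a library that loads another library later would bypass it.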
------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1789004878 From jvernee at openjdk.org Wed Nov 1 14:02:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 1 Nov 2023 14:02:09 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 14:10:33 GMT, Jorn Vernee wrote: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. 
After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... src/hotspot/share/runtime/sharedRuntime.cpp line 681: > 679: // for given exception > 680: // Note that the implementation of this method assumes it's only called when an exception has actually occured > 681: address SharedRuntime::compute_compiled_exc_handler(CompiledMethod* cm, address ret_pc, Handle& exception, One thing of note for this function: We don't look up the exception handler bci for JVMCI compiled methods, so I've not added any profiling in the case of JVMCI (see the `#if INCLUDE_JVMCI` block at the start of the function). This means that when using a JVMCI compiler, exception handlers might appear as untaken, when they are actually taken. This should be fine since the profiling information is currently only used by C2. But, if a JVMCI compiler wants to start using the profiling information e.g. to prune dead exception handlers as well, then profiling needs to be implement first. Please let me know if this is okay. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1378839166 From jsjolen at openjdk.org Wed Nov 1 14:05:38 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 1 Nov 2023 14:05:38 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v9] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Revert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/f6e910a4..535c5c9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From kbarrett at openjdk.org Wed Nov 1 14:09:10 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 14:09:10 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Wed, 1 Nov 2023 10:37:21 GMT, Johan Sjölen wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4.
> > Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: > > - Call dtr and global placement new > - Do not add this src/hotspot/share/c1/c1_GraphBuilder.cpp line 181: > 179: block->init_stores_to_locals(method()->max_locals()); > 180: _bci2block->at_put(cur_bci, block); > 181: _bci2block_successors.at_put_grow(cur_bci, BlockList(), BlockList()); I don't understand the purpose of this change. BlockList is copyable and assignable, so the default filler should be adequate. What am I missing? src/hotspot/share/utilities/growableArray.hpp line 409: > 407: assert(0 <= i, "negative index %d", i); > 408: if (i >= this->_len) { > 409: if (i >= this->_capacity) grow(i); Unrelated to this change, but I think there is some inconsistency around the use of `grow()`. Some callers pass the minimum needed index (like here), some pass the minimum needed length (so 1 + min needed index). src/hotspot/share/utilities/growableArray.hpp line 414: > 412: ::new (&this->_data[j]) E(args...); > 413: } > 414: this->_len = i + 1; `grow()` will fill the range `[0,_len)` with copies of the current array contents, and `[_len,_capacity)` with default constructed values. Then we come here and replace the contents from `[_len,_capacity)`. That seems wasteful. Maybe the args need to be passed through to `grow` and so to `expand_to`? src/hotspot/share/utilities/growableArray.hpp line 420: > 418: > 419: template > 420: void at_put_grow(int i, const E& elem, const Args&... args) { I don't understand the point of this change, as the old code's defaulted fill object seems like it should be fine for at_put_grow, which requires the element type to be copyable/assignable anyway to store the new element value.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378829720 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378840762 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378845780 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378831528 From jvernee at openjdk.org Wed Nov 1 14:50:05 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 1 Nov 2023 14:50:05 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> Message-ID: <7f-mjPj9y8rP2Iuz2uIyeqw0RjP4CTXiImc9UzQZk9A=.e7897a3b-954c-4600-a2a4-1dfdd5fbf098@github.com> On Tue, 31 Oct 2023 12:56:33 GMT, Johan Sjölen wrote: >> I see, in that case it must be `const Args&... args`. https://godbolt.org/z/aov8v38s9 > > Fixed, thanks for the help with this. For perfect forwarding, (non-const) `Args&&...` should be used together with `std::forward` when doing the downstream call to the constructor. That will make it so rvalue arguments are forwarded as rvalues as well. The current code turns all args into lvalues. (see the example code here e.g. https://en.cppreference.com/w/cpp/utility/forward) Do we need to support NONCOPYABLE argument types as well?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378898208 From jsjolen at openjdk.org Wed Nov 1 15:26:04 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 1 Nov 2023 15:26:04 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: <7f-mjPj9y8rP2Iuz2uIyeqw0RjP4CTXiImc9UzQZk9A=.e7897a3b-954c-4600-a2a4-1dfdd5fbf098@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> <7f-mjPj9y8rP2Iuz2uIyeqw0RjP4CTXiImc9UzQZk9A=.e7897a3b-954c-4600-a2a4-1dfdd5fbf098@github.com> Message-ID: On Wed, 1 Nov 2023 14:45:34 GMT, Jorn Vernee wrote: >> Fixed, thanks for the help with this. > > For perfect forwarding, (non-const) `Args&&...` should be used together with `std::forward` when doing the downstream call to the constructor. That will make it so rvalue arguments are forwarded as rvalues as well. The current code turns all args into lvalues. (see the example code here e.g. https://en.cppreference.com/w/cpp/utility/forward) > > Do we need to support NONCOPYABLE argument types as well? Hi Jorn, the style guide currently doesn't allow for using perfect forwarding or move semantics in the code, so we cannot do this at the moment. 
There's a PR for that: https://github.com/openjdk/jdk/pull/15386 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378941108 From jsjolen at openjdk.org Wed Nov 1 15:26:08 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 1 Nov 2023 15:26:08 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Wed, 1 Nov 2023 14:04:23 GMT, Kim Barrett wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Call dtr and global placement new >> - Do not add this > > src/hotspot/share/utilities/growableArray.hpp line 414: > >> 412: ::new (&this->_data[j]) E(args...); >> 413: } >> 414: this->_len = i + 1; > > `grow()` will fill the range `[0,_len)` with copies of the current array contents, and `[_len,_capacity)` with > default constructed values. Then we come here and replace the contents from `[_len,_capacity)`. That > seems wasteful. Maybe the args need to be passed through to `grow` and so to `expand_to`? Hi, I have this PR which is meant to simply do away with `[len, cap)` being initialized: https://github.com/openjdk/jdk/pull/16418 > src/hotspot/share/utilities/growableArray.hpp line 420: > >> 418: >> 419: template >> 420: void at_put_grow(int i, const E& elem, const Args&... args) { > > I don't understand the point of this change, as the old code defaulted fill object seems like it should be fine > for at_put_grow, which requires the element type to be copyable/assignable anyway to store the new element > value. Sorry, I'm not sure what you're missing here. This change enables in-place construction of the objects and the `Args...` doesn't have to be of type `E`. Is the issue with the default being removed?
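To make the in-place construction point concrete, here is a minimal sketch under stated assumptions. `ToyArray` and `Counter` are hypothetical and are not HotSpot's `GrowableArray`: the toy has a fixed capacity so that it never has to relocate elements, and it only mirrors the `::new (...) E(args...)` loop quoted above. It shows that the variadic `const Args&...` form can fill new slots for an element type that cannot be copied, and that the arguments need not be of type `E`.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical element type that can be neither copied nor assigned.
class Counter {
  int _start;
public:
  explicit Counter(int start) : _start(start) {}
  Counter(const Counter&) = delete;
  Counter& operator=(const Counter&) = delete;
  int start() const { return _start; }
};

// Toy fixed-capacity array; a sketch only, not HotSpot's GrowableArray.
template <typename E, std::size_t Capacity>
class ToyArray {
  // A union member is not constructed or destroyed implicitly, which gives
  // us raw, correctly aligned storage for one E per slot.
  union Slot {
    E value;
    Slot() {}   // leaves `value` unconstructed
    ~Slot() {}  // `value` is destroyed explicitly by ToyArray
  };
  Slot _slots[Capacity];
  std::size_t _len = 0;

public:
  ToyArray() = default;
  ToyArray(const ToyArray&) = delete;
  ToyArray& operator=(const ToyArray&) = delete;
  ~ToyArray() {
    for (std::size_t j = 0; j < _len; j++) _slots[j].value.~E();
  }

  std::size_t length() const { return _len; }

  // Make index i valid, constructing each new element in place from args.
  // The same args are reused for every new slot, hence const references.
  template <typename... Args>
  E& at_grow(std::size_t i, const Args&... args) {
    assert(i < Capacity);
    for (std::size_t j = _len; j <= i; j++) {
      ::new (static_cast<void*>(&_slots[j].value)) E(args...);
    }
    if (i >= _len) _len = i + 1;
    return _slots[i].value;
  }
};
```

For example, `ToyArray<Counter, 8> a; a.at_grow(2, 7);` constructs three `Counter` objects directly in the array's storage from the `int` argument, without any `Counter` temporary being copied.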
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378943694 PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378946744 From kbarrett at openjdk.org Wed Nov 1 15:27:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 15:27:06 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 12:15:35 GMT, Oli Gillespie wrote: >> src/hotspot/share/oops/symbolHandle.hpp line 121: >> >>> 119: >>> 120: // If the queue is now full, implement a one-in, one-out policy. >>> 121: if (Atomic::add(&_cleanup_delay_len, 1, memory_order_relaxed) > _cleanup_delay_max_entries) { >> >> Why is incrementing relaxed? Now I have to think hard about whether there >> might be any ordering problems resulting from that. > > A colleague suggested that was the appropriate constraint ("there is no transitive memory effects that ride on the counter"), but I don't understand it enough to defend it myself. Atomic::inc uses memory_order_conservative. Happy to go with whatever is preferred. One of the things I'm worried about is that I think the decrement operation might be able to become temporarily negative. I'm not sure whether that's a problem. Maybe not. I was a little surprised that the queue length stuff is signed rather than unsigned, as I expected the latter. The idiom I've usually used for such things is increment-before-push and decrement-after-pop. That ensures the counter never goes negative or underflows. Also usually use unsigned types for "sizes". >> src/hotspot/share/utilities/nonblockingQueue.inline.hpp line 51: >> >>> 49: assert(_tail == nullptr, "precondition"); >>> 50: } >>> 51: #endif >> >> Why is this being removed? Without some good explanation, I'm disapproving this change. > > These assertions require the queue to be empty when the destructor is called. My static queue is destroyed at shutdown, and it may not be empty. 
I see no explicit problem with a queue not being empty when destroyed, I can only assume it was added to serve as an extra reminder for the specific use-case of g1dirtycardsetqueue which is the only existing user of NonblockingQueue. In my opinion it's better for individual queue owners to decide if they care about this. > > Alternatives: I can modify NonblockingQueue to accept an argument (e.g. check_empty_at_shutdown) which is true for g1 card set and false for my usage. Or I can avoid NonblockingQueue entirely as there are a few other review concerns arising from its limitations. The asserts are there to catch memory leaks. It is the responsibility of the owner of the NBQ to ensure it is empty before deleting it. (Probably that should have been mentioned in a comment in the header.) This is a generic requirement, and is not specific to G1DCQS (the sole previous client). HotSpot code avoids variables with static storage duration for types with non-trivial construction or destruction. (I thought this was discussed in the style guide, but apparently not. It's been discussed multiple times in various reviews.) That's what's happening here - there's no owning object for the NBQ, and there's no explicit setup/teardown of it during VM initialization and destruction. That's the reason for hitting the asserts. So no, don't add a configuration argument for NBQ. I wouldn't approve that either. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378946983 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378948340 From kbarrett at openjdk.org Wed Nov 1 15:27:07 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 15:27:07 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 12:09:05 GMT, Oli Gillespie wrote: >> If you have to add more code to wrap NonblockingQueue, please implement it in a .cpp file. 
I thought NBQ was sufficient for this. Maybe we want some other data structure for this. > > I have attempted to avoid locks if at all possible. Re ABA problems, my reading of the push/pop concern is that the problem occurs if a caller pops a node and then re-uses that same node later by re-pushing it. Something like this: > > Queue contains Node1(val=a, next=null). > 1. Thread 1 (start push) pushes Node2(val=b, next=null) > 2. Thread 2 pops Node1(val=a, next=null) > 3. Thread 2 sees that Node1.next is null and reuses the container, making Node1(val=c, next=null) > 4. Thread 2 (start push) starts to push Node1(val=c, next=null) > 5. Thread 1 (finish push) sets Node1.next=Node2 > 6. Thread 2 (finish push) sets Node2.next=Node1 <-- circular list > > Since my change doesn't reuse nodes, I believe it's safe. Coleen - I think NBQ is a reasonable choice for use here. But it's not a complete solution on its own. It imposes documented requirements on clients. I don't think we have a different data structure for this purpose (thread-safe FIFO without locks), so any alternative would need to be invented, and would be solving the same problem as NBQ and the surrounding client-provided code. Oliver - The current usage is not safe. The reuse can occur through the allocator. For example, one thread starts a pop. Another thread steals that pop, then deletes the object. Later, an allocation gets a new node at the same address as the deleted node. That newly allocated node makes its way through the queue to eventually become visible to that first thread's still in-progress pop. (So this is an SMR bug. You generally can't delete an object while some other thread might be looking at it.) GlobalCounter is not a locking mechanism. It is an RCU-style synchronization mechanism, so related to but different from RWLocks. In particular, readers (threads in a critical section) never block due to this mechanism - only write_synchronize blocks. 
A problem with using GlobalCounter in that simplistic way is that once the queue is "full", the one-in-one-out policy is going to have every allocation hit GlobalCounter::write_synchronize (a potentially somewhat expensive operation, since it needs to iterate over all threads), at least until the queue is bulk drained. Switching over to a one-in-N-out policy could ameliorate that by batching the synchronizes over several nodes, and also remove the need for complete bulk draining. Having min/max queue sizes and switching between insert-only and one-in-N-out policies depending on the current size seems like a possible solution. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378947580 From kbarrett at openjdk.org Wed Nov 1 15:27:05 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 15:27:05 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 12:27:53 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly.
With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Adress comments > > Fix indentation > Improve tests > Improve comment > Remove redundant null check > Improve naming > Pop when >, not >= max len Changes requested by kbarrett (Reviewer). src/hotspot/share/oops/symbolHandle.hpp line 70: > 68: static TempSymbolDelayQueue _cleanup_delay; > 69: static volatile int32_t _cleanup_delay_len; > 70: static volatile int32_t _cleanup_delay_max_entries; We don't usually use sized types unless the size actually matters. I don't think it does here. 
------------- PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1708454218 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1378945647 From qamai at openjdk.org Wed Nov 1 15:36:06 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 1 Nov 2023 15:36:06 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Wed, 1 Nov 2023 13:50:59 GMT, Kim Barrett wrote: >> Johan Sjölen has updated the pull request incrementally with two additional commits since the last revision: >> >> - Call dtr and global placement new >> - Do not add this > > src/hotspot/share/c1/c1_GraphBuilder.cpp line 181: > >> 179: block->init_stores_to_locals(method()->max_locals()); >> 180: _bci2block->at_put(cur_bci, block); >> 181: _bci2block_successors.at_put_grow(cur_bci, BlockList(), BlockList()); > > I don't understand the purpose of this change. BlockList is copyable and assignable, so the default filler > should be adequate. What am I missing? You can remove this change, the objects will be default constructed in-place in the absence of any additional arguments.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378958381 From jvernee at openjdk.org Wed Nov 1 15:36:08 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 1 Nov 2023 15:36:08 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> <7f-mjPj9y8rP2Iuz2uIyeqw0RjP4CTXiImc9UzQZk9A=.e7897a3b-954c-4600-a2a4-1dfdd5fbf098@github.com> Message-ID: On Wed, 1 Nov 2023 15:18:54 GMT, Johan Sjölen wrote: >> For perfect forwarding, (non-const) `Args&&...` should be used together with `std::forward` when doing the downstream call to the constructor. That will make it so rvalue arguments are forwarded as rvalues as well. The current code turns all args into lvalues. (see the example code here e.g. https://en.cppreference.com/w/cpp/utility/forward) >> >> Do we need to support NONCOPYABLE argument types as well? > > Hi Jorn, the style guide currently doesn't allow for using perfect forwarding or move semantics in the code, so we cannot do this at the moment. > > There's a PR for that: https://github.com/openjdk/jdk/pull/15386 Ah, ok I see. Thanks for the link. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1378959637 From dnsimon at openjdk.org Wed Nov 1 16:27:03 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 16:27:03 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v7] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 13:33:32 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call.
By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert assertions about can_call_java to guarantees Thanks for all the reviews and useful input. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1789260596 From dnsimon at openjdk.org Wed Nov 1 16:30:15 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 1 Nov 2023 16:30:15 GMT Subject: Integrated: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads In-Reply-To: References: Message-ID: <9s6umJ-aLcpc2JLD0PQV_sUs3Wgk7YPGV2lyPo-tjbE=.82a70281-39e2-4ac5-a753-3c0846a38849@github.com> On Thu, 26 Oct 2023 17:39:46 GMT, Doug Simon wrote: > This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompileThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. 
The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. This pull request has now been integrated. Changeset: d354141a Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/d354141aa191c80b473dfeee27b51f1562ffeafd Stats: 197 lines in 13 files changed: 109 ins; 65 del; 23 mod 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads Reviewed-by: dholmes, never ------------- PR: https://git.openjdk.org/jdk/pull/16383 From ogillespie at openjdk.org Wed Nov 1 16:37:08 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 1 Nov 2023 16:37:08 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: <6lajvS2wTUMLb-JbqH-30AQB509F6jRG0FuMmrGY3gs=.9b83d6eb-babe-42ca-b628-aaea06323f4d@github.com> On Wed, 1 Nov 2023 15:24:01 GMT, Kim Barrett wrote: >> I have attempted to avoid locks if at all possible. Re ABA problems, my reading of the push/pop concern is that the problem occurs if a caller pops a node and then re-uses that same node later by re-pushing it. Something like this: >> >> Queue contains Node1(val=a, next=null). >> 1. Thread 1 (start push) pushes Node2(val=b, next=null) >> 2. Thread 2 pops Node1(val=a, next=null) >> 3. Thread 2 sees that Node1.next is null and reuses the container, making Node1(val=c, next=null) >> 4. Thread 2 (start push) starts to push Node1(val=c, next=null) >> 5. Thread 1 (finish push) sets Node1.next=Node2 >> 6. Thread 2 (finish push) sets Node2.next=Node1 <-- circular list >> >> Since my change doesn't reuse nodes, I believe it's safe. 
> > Coleen - > > I think NBQ is a reasonable choice for use here. But it's not a complete > solution on its own. It imposes documented requirements on clients. I don't > think we have a different data structure for this purpose (thread-safe FIFO > without locks), so any alternative would need to be invented, and would be > solving the same problem as NBQ and the surrounding client-provided code. > > Oliver - > > The current usage is not safe. The reuse can occur through the allocator. For > example, one thread starts a pop. Another thread steals that pop, then deletes > the object. Later, an allocation gets a new node at the same address as the > deleted node. That newly allocated node makes its way through the queue to > eventually become visible to that first thread's still in-progress pop. (So > this is an SMR bug. You generally can't delete an object while some other > thread might be looking at it.) > > GlobalCounter is not a locking mechanism. It is an RCU-style synchronization > mechanism, so related to but different from RWLocks. In particular, readers > (threads in a critical section) never block due to this mechanism - only > write_synchronize blocks. > > A problem with using GlobalCounter in that simplistic way is that once the > queue is "full", the one-in-one-out policy is going to have every allocation > hit GlobalCounter::write_synchronize (a potentially somewhat expensive > operation, since it needs to iterate over all threads), at least until the > queue is bulk drained. Switching over to a one-in-N-out policy could ameliorate > that by batching the synchronizes over several nodes, and also remove the need > for complete bulk draining. Having min/max queue sizes and switching between > insert-only and one-in-N-out policies depending on the current size seems like > a possible solution. Thanks for all the details, I hadn't considered the SMR angle. I'll think about alternatives.
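One possible reading of the min/max, one-in-N-out suggestion quoted above can be sketched as follows. This is a single-threaded model of the policy only, under assumptions of my own: `ToyDelayQueue` and all of its members are hypothetical, the hysteresis rule (start draining above `max_len`, stop at `min_len`) is just one interpretation of the suggestion, and the expensive RCU-style synchronize is reduced to a counter so the batching effect is visible. It is not thread-safe and does not address the SMR problem itself.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model of the queue policy; not HotSpot code.
class ToyDelayQueue {
  std::deque<int> _entries;     // oldest entry at the front
  std::size_t _min_len;
  std::size_t _max_len;
  std::size_t _batch;           // N in "one-in-N-out"
  std::size_t _sync_count = 0;  // number of simulated synchronize operations
  bool _draining = false;       // false: insert-only, true: one-in-N-out

public:
  ToyDelayQueue(std::size_t min_len, std::size_t max_len, std::size_t batch)
    : _min_len(min_len), _max_len(max_len), _batch(batch) {
    assert(min_len <= max_len && batch > 0);
  }

  void add(int entry) {
    _entries.push_back(entry);
    if (_entries.size() > _max_len) {
      _draining = true;  // switch from insert-only to one-in-N-out
    }
    if (_draining) {
      // Unlink up to N of the oldest entries for this one insertion.
      std::size_t removed = 0;
      while (removed < _batch && _entries.size() > _min_len) {
        _entries.pop_front();
        removed++;
      }
      // A single synchronize would cover the whole batch before the
      // unlinked nodes are actually freed.
      if (removed > 0) _sync_count++;
      if (_entries.size() <= _min_len) {
        _draining = false;  // back to insert-only
      }
    }
  }

  std::size_t size() const { return _entries.size(); }
  std::size_t sync_count() const { return _sync_count; }
  int oldest() const { return _entries.front(); }
};
```

With `min_len = 2`, `max_len = 4` and `batch = 2`, seven insertions trigger two simulated synchronizes, whereas a strict one-in-one-out policy at the same maximum would trigger one for each insertion past the limit.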
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1379035963 From ogillespie at openjdk.org Wed Nov 1 16:37:11 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 1 Nov 2023 16:37:11 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 15:24:28 GMT, Kim Barrett wrote: >> These assertions require the queue to be empty when the destructor is called. My static queue is destroyed at shutdown, and it may not be empty. I see no explicit problem with a queue not being empty when destroyed, I can only assume it was added to serve as an extra reminder for the specific use-case of g1dirtycardsetqueue which is the only existing user of NonblockingQueue. In my opinion it's better for individual queue owners to decide if they care about this. >> >> Alternatives: I can modify NonblockingQueue to accept an argument (e.g. check_empty_at_shutdown) which is true for g1 card set and false for my usage. Or I can avoid NonblockingQueue entirely as there are a few other review concerns arising from its limitations. > > The asserts are there to catch memory leaks. It is the responsibility of the > owner of the NBQ to ensure it is empty before deleting it. (Probably that > should have been mentioned in a comment in the header.) This is a generic > requirement, and is not specific to G1DCQS (the sole previous client). > > HotSpot code avoids variables with static storage duration for types with > non-trivial construction or destruction. (I thought this was discussed in the > style guide, but apparently not. It's been discussed multiple times in > various reviews.) That's what's happening here - there's no owning object for > the NBQ, and there's no explicit setup/teardown of it during VM initialization > and destruction. That's the reason for hitting the asserts. > > So no, don't add a configuration argument for NBQ. I wouldn't approve that > either. Thanks. 
Is the 'leak' relevant in this case since we're shutting down anyway? I tried draining the queue in before_exit but that doesn't seem to run with ctrl-c/SIGINT shutdowns. > HotSpot code avoids variables with static storage duration for types with non-trivial construction or destruction Do you have a suggested alternative? Everything in symbol table seems to be static. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1379036037 From macarte at openjdk.org Wed Nov 1 16:40:28 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 1 Nov 2023 16:40:28 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v4] In-Reply-To: References: Message-ID: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Mat Carter has updated the pull request incrementally with one additional commit since the last revision: Naming changes based on review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16211/files - new: https://git.openjdk.org/jdk/pull/16211/files/310ce342..5474ff61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=02-03 Stats: 19 lines in 2 files changed: 0 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/16211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16211/head:pull/16211 PR: https://git.openjdk.org/jdk/pull/16211 From macarte at openjdk.org Wed Nov 1 16:40:33 2023 From: macarte at openjdk.org (Mat Carter) Date: Wed, 1 Nov 2023 16:40:33 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v3] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 20:19:21 GMT, Erik Gahlin wrote: >> Mat Carter has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments > >
src/hotspot/share/jfr/metadata/metadata.xml line 853: > >> 851: >> 852: >> 853: > > Could we use other terms than ingress and egress? Something that is more in general use, Thought about enqueue/dequeue, but went with added/removed so it's consistent with the other fields > src/hotspot/share/jfr/metadata/metadata.xml line 861: > >> 859: >> 860: >> 861: > > The label should be short and use headline-style capitalization, how about "Compiler Thread Count"? > https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html Fixed, thank you for the reference > src/hotspot/share/jfr/periodic/jfrCompilerQueueUtilization.cpp line 53: > >> 51: return 0; >> 52: } >> 53: return ((current - old) * NANOSECS_PER_SEC) / interval.nanoseconds(); > > Shouldn't it be ticks per second here? This resolves to ticks per second; could replace with (current - old) / interval.seconds() but this introduces floats. This was taken from NetworkUtilization, I assume they went this way to keep the math in integers ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1379036697 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1379037454 PR Review Comment: https://git.openjdk.org/jdk/pull/16211#discussion_r1379039332 From kbarrett at openjdk.org Wed Nov 1 16:41:05 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 16:41:05 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> <7f-mjPj9y8rP2Iuz2uIyeqw0RjP4CTXiImc9UzQZk9A=.e7897a3b-954c-4600-a2a4-1dfdd5fbf098@github.com> Message-ID: On Wed, 1 Nov 2023 15:33:20 GMT, Jorn Vernee wrote: >> Hi Jorn, the style guide
currently doesn't allow for using perfect forwarding or move semantics in the code, so we cannot do this at the moment. >> >> There's a PR for that: https://github.com/openjdk/jdk/pull/15386 > > Ah, ok I see. Thanks for the link. This is a case where using rvalues and perfect forwarding is incorrect. The arguments are potentially used in multiple E constructor calls. Using the same rvalue in multiple places is very bad. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1379039865 From jvernee at openjdk.org Wed Nov 1 16:46:03 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 1 Nov 2023 16:46:03 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v2] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> <2K2wO2SGK7G-lR5Ml1pZyX9eQLDve5pJcjkuqbSQPZc=.d0c7fb43-d39e-4200-b281-362788777741@github.com> <-kCpEXE_nYfhhWc0DmVw816euNdRO7yHHektWGIlYyM=.13b04735-a6ec-4681-8682-f432f21cc9f5@github.com> <7f-mjPj9y8rP2Iuz2uIyeqw0RjP4CTXiImc9UzQZk9A=.e7897a3b-954c-4600-a2a4-1dfdd5fbf098@github.com> Message-ID: On Wed, 1 Nov 2023 16:37:59 GMT, Kim Barrett wrote: >> Ah, ok I see. Thanks for the link. > > This is a case where using rvalues and perfect forwarding is incorrect. The > arguments are potentially used in multiple E constructor calls. Using the same > rvalue in multiple places is very bad. Ah, that's right. We call it in a loop, so `args...` might be used multiple times. 
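The point about reusing the same arguments across several constructor calls can be sketched in a small standalone example. This is not GrowableArray code: `fill_forwarding` and `fill_by_const_ref` are hypothetical helpers, and `std::vector` with `std::string` merely stands in for a container of element type E. The first helper forwards its argument inside a loop, so an rvalue may be moved from on the first iteration; the second takes `const Args&...` and can safely reuse the arguments.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only): forwards the same argument into
// every constructor call. If the caller passes an rvalue, the first
// construction may move from it, so later elements are built from a
// moved-from (valid but unspecified) value.
template <typename E, typename Arg>
std::vector<E> fill_forwarding(int n, Arg&& arg) {
  std::vector<E> out;
  out.reserve(n);
  for (int i = 0; i < n; i++) {
    out.emplace_back(std::forward<Arg>(arg));  // unsafe when repeated
  }
  return out;
}

// Hypothetical helper (illustration only): takes the arguments by const
// reference, so each constructor call sees the same unmodified values.
template <typename E, typename... Args>
std::vector<E> fill_by_const_ref(int n, const Args&... args) {
  std::vector<E> out;
  out.reserve(n);
  for (int i = 0; i < n; i++) {
    out.emplace_back(args...);  // copies; arguments stay intact
  }
  return out;
}
```

With `fill_by_const_ref<std::string>(3, some_string)` all three elements compare equal to `some_string`. With `fill_forwarding<std::string>(3, std::move(some_string))` only the first element is guaranteed to hold the original contents.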
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1379045975 From kbarrett at openjdk.org Wed Nov 1 17:25:05 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 17:25:05 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Wed, 1 Nov 2023 15:23:28 GMT, Johan Sjölen wrote: >> src/hotspot/share/utilities/growableArray.hpp line 420: >> >>> 418: >>> 419: template >>> 420: void at_put_grow(int i, const E& elem, const Args&... args) { >> >> I don't understand the point of this change, as the old code defaulted fill object seems like it should be fine >> for at_put_grow, which requires the element type to be copyable/assignable anyway to store the new element >> value. > > Sorry, I'm not sure what you're missing here. This change enables in-place construction of the objects and the `Args...` doesn't have to be of type `E`. Is the issue with the default being removed? My point is that the old code was fine. Copy-assigning the fill argument (either defaulted or passed in) had to work (E can't be non-copyable), because at_put_grow requires E to be copy-assignable in order to install the new value at index i. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1379094175 From kbarrett at openjdk.org Wed Nov 1 18:15:10 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 18:15:10 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 16:34:27 GMT, Oli Gillespie wrote: >> The asserts are there to catch memory leaks. It is the responsibility of the >> owner of the NBQ to ensure it is empty before deleting it. (Probably that >> should have been mentioned in a comment in the header.) 
This is a generic >> requirement, and is not specific to G1DCQS (the sole previous client). >> >> HotSpot code avoids variables with static storage duration for types with >> non-trivial construction or destruction. (I thought this was discussed in the >> style guide, but apparently not. It's been discussed multiple times in >> various reviews.) That's what's happening here - there's no owning object for >> the NBQ, and there's no explicit setup/teardown of it during VM initialization >> and destruction. That's the reason for hitting the asserts. >> >> So no, don't add a configuration argument for NBQ. I wouldn't approve that >> either. > > Thanks. Is the 'leak' is relevant in this case since we're shutting down anyway? I tried draining the queue in before_exit but that doesn't seem to run with ctrl-c/SIGINT shutdowns. > >> HotSpot code avoids variables with static storage duration for types with > non-trivial construction or destruction > > Do you have a suggested alternative? Everything in symbol table seems to be static. I've filed a bug report against the style guide to provide guidance in this area. https://bugs.openjdk.org/browse/JDK-8319242 We typically define a variable with a pointer type, and initialize it somewhere in the VM startup process. So in this case, something like // declared in class static TempSymbolDelayQueue* _cleanup_delay; // in the .cpp TempSymbolDelayQueue* SymbolHandleBase::_cleanup_delay = nullptr; // and update uses accordingly. Also declare and define an initialization function to initialize that variable, and call it somewhere in init.cpp, probably in init_globals(). Use the style that exists there. I expect if you don't put it early enough that you'll reliably get an obvious crash from attempting to access a nullptr. 
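The idiom described above can be sketched outside HotSpot roughly as follows. Everything here is an assumption made for illustration: `DelayQueue` and `SymbolHolder` are hypothetical stand-ins (not `TempSymbolDelayQueue` or `SymbolHandleBase`), and `SymbolHolder::initialize()` plays the role of the function that would be called from the VM startup sequence. The static member is a plain pointer, so it has no non-trivial construction or destruction of its own; the object is created explicitly and never torn down.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a type with non-trivial construction and
// destruction (illustration only, not the HotSpot queue).
class DelayQueue {
 public:
  explicit DelayQueue(std::size_t max_len) : _max_len(max_len) {}

  // Keep at most _max_len entries; the oldest entry is displaced first.
  void push(const std::string& s) {
    _items.push_back(s);
    if (_items.size() > _max_len) {
      _items.pop_front();
    }
  }

  std::size_t size() const { return _items.size(); }

 private:
  std::size_t _max_len;
  std::deque<std::string> _items;
};

// Hypothetical owner class (illustration only).
class SymbolHolder {
 public:
  // Called once, explicitly, during startup. Ordering is under our control.
  static void initialize();

  static DelayQueue* cleanup_delay() {
    assert(_cleanup_delay != nullptr && "initialize() must run first");
    return _cleanup_delay;
  }

 private:
  // A pointer is trivially initialized and trivially destroyed, so no code
  // runs for it before main() or during static destruction.
  static DelayQueue* _cleanup_delay;
};

DelayQueue* SymbolHolder::_cleanup_delay = nullptr;

void SymbolHolder::initialize() {
  assert(_cleanup_delay == nullptr && "initialize() called twice");
  _cleanup_delay = new DelayQueue(100);  // intentionally never deleted
}
```

Using the queue before `SymbolHolder::initialize()` has run trips the assert in a debug build, which mirrors the "obvious crash" mentioned above when the initialization call is placed too late.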
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1379145767 From kbarrett at openjdk.org Wed Nov 1 18:41:06 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 1 Nov 2023 18:41:06 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 18:12:40 GMT, Kim Barrett wrote: >> Thanks. Is the 'leak' is relevant in this case since we're shutting down anyway? I tried draining the queue in before_exit but that doesn't seem to run with ctrl-c/SIGINT shutdowns. >> >>> HotSpot code avoids variables with static storage duration for types with >> non-trivial construction or destruction >> >> Do you have a suggested alternative? Everything in symbol table seems to be static. > > I've filed a bug report against the style guide to provide guidance in this area. > https://bugs.openjdk.org/browse/JDK-8319242 > > We typically define a variable with a pointer type, and initialize it somewhere in the VM startup process. So in > this case, something like > > // declared in class > static TempSymbolDelayQueue* _cleanup_delay; > > // in the .cpp > TempSymbolDelayQueue* SymbolHandleBase::_cleanup_delay = nullptr; > // and update uses accordingly. > > Also declare and define an initialization function to initialize that variable, and call it somewhere in init.cpp, > probably in init_globals(). Use the style that exists there. I expect if you don't put it early enough that you'll > reliably get an obvious crash from attempting to access a nullptr. As you suggest, in this case the "leak" doesn't matter because we're on the way to shutdown. One might think the asserts could be conditionalized on that, but so far as I know there isn't a reliable "on the way to shutdown" detector. 
The idiom described above is what's done throughout HotSpot for initialization (to ensure we have control of initialization order, among other things), and we often don't worry about the teardown side of things. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1379171154 From sspitsyn at openjdk.org Wed Nov 1 18:49:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 1 Nov 2023 18:49:11 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads Message-ID: The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. At the low level, the JVMTI code supporting platform and virtual threads still can be different. This implementation is based on the `JvmtiVTMSTransitionDisabler` class. The interface includes, at least, two new classes: - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` Testing: - the mach5 tiers 1-6 are all passed ------------- Commit messages: - 8319244: implement JVMTI handshakes support for virtual threads Changes: https://git.openjdk.org/jdk/pull/16460/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319244 Stats: 532 lines in 5 files changed: 176 ins; 329 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From mdoerr at openjdk.org Wed Nov 1 18:56:13 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 1 Nov 2023 18:56:13 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: <_HeaLuC2VvNySQYp4nbSkXjHurHTeJ3MdgeuvbuGRT0=.1f44d2d3-fec2-4816-9a2f-716d94c8baaf@github.com> Message-ID: <8N2qJM9OdG_3884fYYuXX0BetaDNqc-bt_eR8-jGFH4=.00aaefb5-804e-4091-af33-80f8590c0f6f@github.com> On Wed, 1 Nov 2023 09:18:18 GMT, Andrew Dinn wrote: >> This exists in x86 as well in each of the `load_resolved_method_entry_...()` methods. Some of these only have three arguments which cannot be reused so there is the option to include `index` as an argument, but this introduces an inconsistency among these similar methods. >> >> Should all of these methods take `index` which can be a reused register? > > If you add an extra argument then please name it `temp` or something equally clear as to the fact that this is for storing a local scratch value rather than an input/output. n.b. this is the main reason for resisting the urge to move register declarations+initializations up the call chain. It makes it less obvious that a register is only being used local to a callee. I think `index` is only used before `flags` and `method` get assigned. 
Can't we use one of them? Otherwise, passing `temp` would be fine, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1379186125 From mdoerr at openjdk.org Wed Nov 1 19:08:16 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 1 Nov 2023 19:08:16 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 23:01:44 GMT, Matias Saavedra Silva wrote: >> src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3293: >> >>> 3291: void TemplateTable::prepare_invoke(Register recv) { >>> 3292: >>> 3293: const Register cache = r2; >> >> Passing `cache` and `flags` is better. See x86 version. > > As @offamitkumar pointed out, `flags` is unused in aarch64. Ok, then only `cache`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1379197098 From jjoo at openjdk.org Wed Nov 1 19:50:45 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 1 Nov 2023 19:50:45 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v36] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 36 additional commits since the last revision: - Merge branch 'openjdk:master' into master - Replace NULL with nullptr - Implement hsperf counter for G1ServiceThread - Remove StringDedup from GC thread list - Use 64-bit atomic add for incrementing counters - Merge branch 'openjdk:master' into master - Add call to publish in parallel gc and update counter names - Add Copyright header to test and formatting changes - Fix test - add comment and change if defined to ifdef - ... 
and 26 more: https://git.openjdk.org/jdk/compare/d748dccb...2446149c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/be104e16..2446149c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=34-35 Stats: 27809 lines in 1182 files changed: 15264 ins; 4162 del; 8383 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From shade at openjdk.org Wed Nov 1 23:11:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 1 Nov 2023 23:11:08 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance Message-ID: See the symptoms, reproducer and analysis in the bug. Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. 
Note not only the two orders of magnitude better safepoint times, but also the >2x more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is at least 2x more, since we don't waste time at this wait barrier. ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/28cf22d3-b5ca-44fb-bde7-47189d14b47b) Additional testing: - [x] MacOS AArch64 server fastdebug, `tier1` - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) - [x] MacOS AArch64 server fastdebug, `tier2 tier3` - [ ] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) - [ ] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/16404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318986 Stats: 186 lines in 3 files changed: 126 ins; 17 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From jjoo at openjdk.org Wed Nov 1 23:56:25 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 1 Nov 2023 23:56:25 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v37] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: - revert gitignore change - Attempt to fix broken test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/2446149c..9fb36a9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=36 - incr: 
https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=35-36 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Nov 2 01:24:08 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 2 Nov 2023 01:24:08 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: <6tngC-Jwyx8e25LGT8dAwKbaPb9qb_w5ONctnFieH3o=.61b013b2-beb9-4053-8c06-86a700208d77@github.com> References: <6tngC-Jwyx8e25LGT8dAwKbaPb9qb_w5ONctnFieH3o=.61b013b2-beb9-4053-8c06-86a700208d77@github.com> Message-ID: On Wed, 1 Nov 2023 09:34:01 GMT, Stefan Johansson wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace NULL with nullptr > > Sorry for being a bit late to this PR. I think the addition of CPU time tracking is good, but I wonder if we could do it in a way that is a bit more general. A more general way of tracking CPU time for a set of threads and we could then have different consumers of this data. In addition to hsperf counters I think having logging and JFR events for this could be interesting as well. > > Have you had any thought along those lines, any obvious problems with such approach? @kstefanj thank you for taking a look! caoman@ and I discussed your comment and our thoughts are below: > A more general way of tracking CPU time for a set of threads and we could then have different consumers of this data. Could you elaborate a bit on what you were thinking of here? If we are assuming something like a thread that updates all other threads, I think this implementation could get a bit complicated. There are two main issues that we can see with a generic thread approach: 1. 
We would have to figure out how often to pull metrics from the various gc threads from the central thread, and possibly determine this frequency separately for every thread. Instead with our current implementation, we can manually trigger publishes based on when the GC thread is done doing work. 2. We would still need to tag each thread we want to track somewhere, and keep track of a mapping from thread to its counter name, etc. which doesn't seem to simplify things too much. (I imagine we will still need to touch numerous files to "tag" each thread with whether we want to track it or not?) The existing `sun.management:type=HotspotThreading` MBean approach (discussed [here]( https://mail.openjdk.org/pipermail/core-libs-dev/2023-September/111397.html)) could be another general way to track CPU. However, the discussion concludes that it is an internal API, and discourages users from using it. > In addition to hsperf counters I think having logging and JFR events for this could be interesting as well. There's actually already an existing implementation that covers process CPU from GC pauses, but it doesn't handle concurrent work: https://bugs.openjdk.org/browse/JDK-8291753. However, for our implementation of AHS, we decided against a JFR-based approach, as it has a bit more overhead and the hsperf-based implementation seemed simpler. So while a JFR-based approach to track these counters might be feasible, we believe it would be considerably more work, and could be done in a separate PR later if there is sufficient user interest. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1789914128 From macarte at openjdk.org Thu Nov 2 02:04:28 2023 From: macarte at openjdk.org (Mat Carter) Date: Thu, 2 Nov 2023 02:04:28 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v5] In-Reply-To: References: Message-ID: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. 
You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Mat Carter has updated the pull request incrementally with two additional commits since the last revision: - Fixed copyright notice - Fixed copyright notice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16211/files - new: https://git.openjdk.org/jdk/pull/16211/files/5474ff61..08b2dbc2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16211/head:pull/16211 PR: https://git.openjdk.org/jdk/pull/16211 From dholmes at openjdk.org Thu Nov 2 05:05:01 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Nov 2023 05:05:01 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 14:10:33 GMT, Jorn Vernee wrote: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. 
But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... @JornVernee could you add tiers 7 and 8 to your testing please. I would be worried in case there is any subtle interaction with Xcomp mode, or with ZGC. 
Given the possibility of asynchronous exceptions (including logically asynchronous exceptions like OutOfMemoryError and StackOverflowError) what happens when a "dead" catch block turns out to not be dead after all? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1790072194 From dholmes at openjdk.org Thu Nov 2 05:08:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Nov 2023 05:08:15 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v18] In-Reply-To: <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> Message-ID: On Fri, 27 Oct 2023 11:59:59 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix header Nothing further from me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1709461453 From dholmes at openjdk.org Thu Nov 2 05:33:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Nov 2023 05:33:07 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 12:27:53 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
> > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Adress comments > > Fix indentation > Improve tests > Improve comment > Remove redundant null check > Improve naming > Pop when >, not >= max len > I tried draining the queue in before_exit but that doesn't seem to run with ctrl-c/SIGINT shutdowns. `before_exit` is executed on all non-VM-abort VM shutdown paths. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16398#issuecomment-1790096249 From dholmes at openjdk.org Thu Nov 2 05:53:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Nov 2023 05:53:00 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v3] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 09:17:30 GMT, Doug Simon wrote: >> This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. >> If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. >> >> Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: >> >> [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) >> thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] >> for thread 0x000000011e18c600 >> thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} >> >> >> The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. 
> > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - differentiate special exception log message from normal log message > - check special_exception arguments for nullness pre-condition Okay. I still find this merging of questionable benefit, but approved. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16401#pullrequestreview-1709497727 From dholmes at openjdk.org Thu Nov 2 06:34:01 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 2 Nov 2023 06:34:01 GMT Subject: RFR: 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 11:12:41 GMT, Markus Grönlund wrote: > Greetings, > > The original problem was introduced with: > > [JDK-8313251](https://bugs.openjdk.org/browse/JDK-8313251) - Add NativeLibraryLoad event > > The first attempt at resolution: > > [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) - Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native > > was reverted: > > [JDK-8315930](https://bugs.openjdk.org/browse/JDK-8315930) - Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" > > because it ran into: > > [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) - NativeLibraryLoadEvent dtr fails with "assert(false) failed: Possible safepoint reached by thread that does not allow it" > > The reason for [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) was that the thread loading the zip library, in thread state native, was the owner of the Zip_lock mutex. This prevented it from transitioning to thread_in_vm. 
> > The Zip_lock mutex was removed as part of: > > [JDK-8317951](https://bugs.openjdk.org/browse/JDK-8317951) - Refactor loading of zip library to help resolve [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) > > Therefore, it is time to redo the original [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220), now under this issue, [JDK-8319206](https://bugs.openjdk.org/browse/JDK-8319206). > > Testing: jdk_jfr, tiers1-6, stress testing > > Thanks > Markus General runtime changes seem fine. I can't comment on all the JFR-specific aspects but in general that appears okay too. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16447#pullrequestreview-1709536385 From jpai at openjdk.org Thu Nov 2 06:51:00 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Thu, 2 Nov 2023 06:51:00 GMT Subject: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany wrote: > This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures. Adding hotspot label because from the looks of this change, it appears to require someone from that area to review this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16023#issuecomment-1790163370 From stuefe at openjdk.org Thu Nov 2 07:48:15 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 2 Nov 2023 07:48:15 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v18] In-Reply-To: <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> <6BxIuSlNcRQh3Lb2XhvRh0UejTh0AHa8wRLGNi7nQWI=.2ce874d6-475a-4044-9b1c-e9806484b760@github.com> Message-ID: On Fri, 27 Oct 2023 11:59:59 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. >> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix header Looks still good to me. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/10661#pullrequestreview-1709621199 From shade at openjdk.org Thu Nov 2 08:22:01 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 Nov 2023 08:22:01 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 15:40:11 GMT, Aleksey Shipilev wrote: > See the symptoms, reproducer and analysis in the bug. 
> > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the two orders of magnitude better safepoint times, but also the >2x more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is at least 2x more, since we don't waste time at this wait barrier. 
> > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/28cf22d3-b5ca-44fb-bde7-47189d14b47b) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [ ] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [ ] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) @robehn, you might be interested in this :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1790265118 From jkern at openjdk.org Thu Nov 2 08:26:02 2023 From: jkern at openjdk.org (Joachim Kern) Date: Thu, 2 Nov 2023 08:26:02 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 17:27:57 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
> > Thomas Obermeier has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8306561' of https://github.com/TOatGithub/jdk into JDK-8306561 > - 8306561: test range instead of endpoints before casting @navyxliu : I have already taken over JDK-8319104, started the analysis, found the reason for the SIGILL, and am now working on a fix for the root cause. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1790269310 From tschatzl at openjdk.org Thu Nov 2 08:33:46 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 2 Nov 2023 08:33:46 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v6] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray.
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: NULL -> nullptr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/e5dfbb73..fb1deac4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From duke at openjdk.org Thu Nov 2 09:04:05 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Thu, 2 Nov 2023 09:04:05 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 17:27:57 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. 
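A minimal sketch of the idea in the description quoted above, assuming a simulated readable region rather than real reserved memory: before an aligned-down address is treated as a header and its canary is read, the whole header range is checked against the readable range. `FakeHeader`, `ReadableRange`, `range_is_readable`, `looks_like_header` and the canary value are hypothetical names chosen for illustration; this is not the HotSpot code (the thread mentions `os::is_readable_range` for the real check).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a malloc header: the canary sits at offset 14,
// so reading it touches bytes past the start of the header.
struct FakeHeader {
  uint8_t  bytes_before_canary[14];
  uint16_t canary;
};
static_assert(sizeof(FakeHeader) == 16, "illustrative layout assumes 16 bytes");

static const uint16_t kCanary = 0xE99E;  // arbitrary eye-catcher value (assumption)

// Simulated readable memory [start, start + size).
struct ReadableRange {
  uintptr_t start;
  size_t    size;
};

// True only if every byte of [from, from + len) lies inside the range.
static bool range_is_readable(const ReadableRange& r, uintptr_t from, size_t len) {
  if (from < r.start) return false;
  const size_t offset = from - r.start;
  return offset <= r.size && len <= r.size - offset;
}

// Align the pointer down to 8 bytes and validate the canary, but only after
// confirming that the complete header range is readable.
static bool looks_like_header(const ReadableRange& r, const void* p) {
  const uintptr_t aligned =
      reinterpret_cast<uintptr_t>(p) & ~static_cast<uintptr_t>(7);
  if (!range_is_readable(r, aligned, sizeof(FakeHeader))) {
    return false;  // the canary would lie outside the range: do not read it
  }
  FakeHeader h;
  std::memcpy(&h, reinterpret_cast<const void*>(aligned), sizeof(h));
  return h.canary == kCanary;
}
```

The point of checking the range rather than a single address is that the highest address in the region can itself be readable while the canary, 14 bytes further on, is not.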
> > Thomas Obermeier has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8306561' of https://github.com/TOatGithub/jdk into JDK-8306561 > - 8306561: test range instead of endpoints before casting Checks still run fine with the latest switch to os::is_readable_range. Therefore: ------------- PR Comment: https://git.openjdk.org/jdk/pull/16381#issuecomment-1790322365 From jsjolen at openjdk.org Thu Nov 2 09:52:04 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 2 Nov 2023 09:52:04 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Wed, 1 Nov 2023 17:22:00 GMT, Kim Barrett wrote: >at_put_grow requires E to be copy-assignable in order to install the new value at index i. With the new code it doesn't seem to me that `at_put_grow` requires this, correct? I'd like to get rid of this requirement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1379841019 From duke at openjdk.org Thu Nov 2 09:54:14 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Thu, 2 Nov 2023 09:54:14 GMT Subject: Integrated: 8306561: Possible out of bounds access in print_pointer_information In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:00 GMT, Thomas Obermeier wrote: > MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class.
Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. > > We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. > > As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. This pull request has now been integrated. Changeset: d6ce62eb Author: Thomas Obermeier Committer: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/d6ce62ebc01eb483b486af886d9b79f60ff87de1 Stats: 5 lines in 2 files changed: 1 ins; 3 del; 1 mod 8306561: Possible out of bounds access in print_pointer_information Reviewed-by: stuefe, clanger ------------- PR: https://git.openjdk.org/jdk/pull/16381 From mgronlun at openjdk.org Thu Nov 2 10:24:00 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 2 Nov 2023 10:24:00 GMT Subject: RFR: 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 06:31:46 GMT, David Holmes wrote: > General runtime changes seem fine. I can't comment on all the JFR-specific aspects but in general that appears okay too. > > Thanks Thanks David.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16447#issuecomment-1790453535 From simonis at openjdk.org Thu Nov 2 10:46:11 2023 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 2 Nov 2023 10:46:11 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v37] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 23:56:25 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - revert gitignore change > - Attempt to fix broken test @kstefanj , it is a pity that the `sun.management:type=HotspotThreading` MBean is not exported any more. If we move that or a similar functionality to a new MBean under `com.sun.management` (as proposed in the [cited discussion](https://mail.openjdk.org/pipermail/core-libs-dev/2023-September/111397.html)) then we might reuse these new hsperf counters in the same way this is already done by some other MBeans which already use hsperf counters as their information source. I think logging or JFR functionality could also easily be implemented on top of the new hsperf counters. As for @albertnetymk's proposal to simple use the `/proc` file system to retrieve thread CPU time information, that is not only inconvenient for the reasons detailed by @caoman. It will also not work for compiler threads (which I think should have similar counters in the future) because compiler threads can be created and and destroyed dynamically (due to [`-XX:+UseDynamicNumberOfCompilerThreads`](https://bugs.openjdk.org/browse/JDK-8198756)). The current PR still looks good to me. ------------- Marked as reviewed by simonis (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1709943605 From egahlin at openjdk.org Thu Nov 2 10:55:02 2023 From: egahlin at openjdk.org (Erik Gahlin) Date: Thu, 2 Nov 2023 10:55:02 GMT Subject: RFR: 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: <_4eHbPXMOAmMi04Eycywjv5VZPjg8Eauor8U4bXrru4=.bfd2d276-26ba-4892-a18a-3ea855ed7ed2@github.com> On Wed, 1 Nov 2023 11:12:41 GMT, Markus Grönlund wrote: > Greetings, > > The original problem was introduced with: > > [JDK-8313251](https://bugs.openjdk.org/browse/JDK-8313251) - Add NativeLibraryLoad event > > The first attempt at resolution: > > [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) - Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native > > was reverted: > > [JDK-8315930](https://bugs.openjdk.org/browse/JDK-8315930) - Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" > > because it ran into: > > [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) - NativeLibraryLoadEvent dtr fails with "assert(false) failed: Possible safepoint reached by thread that does not allow it" > > The reason for [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) was that the thread loading the zip library, in thread state native, was the owner of the Zip_lock mutex. This prevented it from transitioning to thread_in_vm. > > The Zip_lock mutex was removed as part of: > > [JDK-8317951](https://bugs.openjdk.org/browse/JDK-8317951) - Refactor loading of zip library to help resolve [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) > > Therefore, it is time to redo the original [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220), now under this issue, [JDK-8319206](https://bugs.openjdk.org/browse/JDK-8319206).
> > Testing: jdk_jfr, tiers1-6, stress testing > > Thanks > Markus Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16447#pullrequestreview-1709958748 From shade at openjdk.org Thu Nov 2 11:03:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 Nov 2023 11:03:16 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: References: Message-ID: > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the two orders of magnitude better safepoint times, but also the >2x more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is at least 2x more, since we don't waste time at this wait barrier. 
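To make the `arm()` / `wait()` / `disarm()` contract discussed above concrete, here is a deliberately simplified sketch. It uses a mutex and a condition variable instead of the `Semaphore`s and counters the PR describes, so it illustrates only the contract and is not the proposed `GenericWaitBarrier`. The class name `SketchWaitBarrier`, the accessor `armed_tag()`, and the convention that tag 0 means "disarmed" are assumptions made for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Simplified tag-based wait barrier (illustrative only).
class SketchWaitBarrier {
 public:
  // Arm the barrier with a non-zero tag; callers of wait(tag) now block.
  void arm(int tag) {
    assert(tag != 0);
    std::lock_guard<std::mutex> guard(_lock);
    _tag = tag;
  }

  // Disarm and wake all waiters. This returns without waiting for the
  // woken threads to actually leave wait().
  void disarm() {
    {
      std::lock_guard<std::mutex> guard(_lock);
      _tag = 0;
    }
    _cv.notify_all();
  }

  // Block while the barrier is armed with exactly this tag; return
  // immediately if it is disarmed or armed with a different tag.
  void wait(int tag) {
    std::unique_lock<std::mutex> guard(_lock);
    _cv.wait(guard, [this, tag] { return _tag != tag; });
  }

  // Current tag, 0 when disarmed (helper for the example only).
  int armed_tag() {
    std::lock_guard<std::mutex> guard(_lock);
    return _tag;
  }

 private:
  std::mutex              _lock;
  std::condition_variable _cv;
  int                     _tag = 0;
};
```

In this sketch a late waiter that wakes after the barrier has been re-armed still leaves, provided the new tag differs from the one it waited on; that is the assumption that lets `disarm()` return without waiting for stragglers.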
> > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/28cf22d3-b5ca-44fb-bde7-47189d14b47b) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Tigthen up memory ordering even more conservatively ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/a3906108..ca88eb74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=00-01 Stats: 17 lines in 1 file changed: 10 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From tschatzl at openjdk.org Thu Nov 2 11:42:54 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 2 Nov 2023 11:42:54 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v7] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. 
This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. 
Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Renamings to (almost) consistently use the following nomenclature for evacuation failure and types of it: * evacuation failure is the general concept. It includes * pinned regions * allocation failure One region can both be pinned and experience an allocation failure. G1 GC messages use tags "(Pinned)" and "(Allocation Failure)" now instead of "(Evacuation Failure)" Did not rename the G1EvacFailureInjector since this adds a lot of noise. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/fb1deac4..73f61da9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=05-06 Stats: 180 lines in 18 files changed: 54 ins; 28 del; 98 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Thu Nov 2 11:46:02 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 2 Nov 2023 11:46:02 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v4] In-Reply-To: References: <8BH2UtnHn-DYz3c80Su4v9BF_v0w-N4fHkASCXP_E2c=.70c7ff8f-32e2-4970-87e3-fe22f7b08e6b@github.com> Message-ID: On Tue, 31 Oct 2023 18:54:26 GMT, Thomas Schatzl wrote: > Had a discussion with @albertnetymk and we came to the following agreement about naming: > >"allocation failure" - allocation failed in the to-space due to memory exhaustion >"pinned" - the region/object has been pinned >"evacuation failure" - either pinned or allocation failure > >I will apply this new naming asap. Done. 
I left out the `G1EvacFailureInjector` (it only injects allocation failures, not evacuation failures) related renamings as this adds lots of noise (including the debug options). I'll file a follow-up and assign it to me. Tier1 seems to pass, will redo upper tiers again. The only noteworthy externally visible change is that the `(Evacuation Failure)` tag in log messages is now `(Allocation Failure)`. I did not want combinations of `(Evacuation Failure)` and additionally `(Pinned) (Allocation Failure)`, but maybe it is fine, or just fine to keep only `(Evacuation Failure)` as before and assume that users enable higher level logging to find out details. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16342#issuecomment-1790570048 From mgronlun at openjdk.org Thu Nov 2 12:21:14 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 2 Nov 2023 12:21:14 GMT Subject: RFR: 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: <3NUFSEjLJFERqbU9waXTkd5y4LSdlh3h_1rgq7qNAFU=.dc4f6663-06c1-4864-bb18-9dda94ee3397@github.com> On Thu, 2 Nov 2023 06:31:46 GMT, David Holmes wrote: >> Greetings, >> >> The original problem was introduced with: >> >> [JDK-8313251](https://bugs.openjdk.org/browse/JDK-8313251) - Add NativeLibraryLoad event >> >> The first attempt at resolution: >> >> [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) - Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native >> >> was reverted: >> >> [JDK-8315930](https://bugs.openjdk.org/browse/JDK-8315930) - Revert "8315220: Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native" >> >> because it ran into: >> >> [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) - NativeLibraryLoadEvent dtr fails with "assert(false) failed: Possible safepoint reached by thread that
does not allow it" >> >> The reason for [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) was that the thread loading the zip library, in thread state native, was the owner of the Zip_lock mutex. This prevented it from transitioning to thread_in_vm. >> >> The Zip_lock mutex was removed as part of: >> >> [JDK-8317951](https://bugs.openjdk.org/browse/JDK-8317951) - Refactor loading of zip library to help resolve [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) >> >> Therefore, it is time to redo the original [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220), now under this issue, [JDK-8319206](https://bugs.openjdk.org/browse/JDK-8319206). >> >> Testing: jdk_jfr, tiers1-6, stress testing >> >> Thanks >> Markus > > General runtime changes seem fine. I can't comment on all the JFR-specific aspects but in general that appears okay too. > > Thanks Thanks, @dholmes-ora and @egahlin, for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16447#issuecomment-1790619632 From mgronlun at openjdk.org Thu Nov 2 12:21:17 2023 From: mgronlun at openjdk.org (Markus Grönlund) Date: Thu, 2 Nov 2023 12:21:17 GMT Subject: Integrated: 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native In-Reply-To: References: Message-ID: <_370DknHJ3dr8nUYO0JZneHL506Q1MbI8upeIzBXA_4=.5eefa8bf-0cc0-4144-aaea-7660e41916ac@github.com> On Wed, 1 Nov 2023 11:12:41 GMT, Markus Grönlund wrote: > Greetings, > > The original problem was introduced with: > > [JDK-8313251](https://bugs.openjdk.org/browse/JDK-8313251) - Add NativeLibraryLoad event > > The first attempt at resolution: > > [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) - Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native > > was reverted: > > [JDK-8315930](https://bugs.openjdk.org/browse/JDK-8315930) - Revert "8315220: Event NativeLibraryLoad breaks
invariant by taking a stacktrace when thread is in state _thread_in_native" > > because it ran into: > > [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) - NativeLibraryLoadEvent dtr fails with "assert(false) failed: Possible safepoint reached by thread that does not allow it" > > The reason for [JDK-8315892](https://bugs.openjdk.org/browse/JDK-8315892) was that the thread loading the zip library, in thread state native, was the owner of the Zip_lock mutex. This prevented it from transitioning to thread_in_vm. > > The Zip_lock mutex was removed as part of: > > [JDK-8317951](https://bugs.openjdk.org/browse/JDK-8317951) - Refactor loading of zip library to help resolve [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220) > > Therefore, it is time to redo the original [JDK-8315220](https://bugs.openjdk.org/browse/JDK-8315220), now under this issue, [JDK-8319206](https://bugs.openjdk.org/browse/JDK-8319206). > > Testing: jdk_jfr, tiers1-6, stress testing > > Thanks > Markus This pull request has now been integrated. Changeset: faa8bde2 Author: Markus Grönlund URL: https://git.openjdk.org/jdk/commit/faa8bde27569b4db3a1a9dd62adee0b10e81a718 Stats: 395 lines in 10 files changed: 243 ins; 114 del; 38 mod 8319206: [REDO] Event NativeLibraryLoad breaks invariant by taking a stacktrace when thread is in state _thread_in_native Reviewed-by: dholmes, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/16447 From jvernee at openjdk.org Thu Nov 2 12:29:02 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 2 Nov 2023 12:29:02 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 05:02:21 GMT, David Holmes wrote: > @JornVernee could you add tiers 7 and 8 to your testing please. I would be worried in case there is any subtle interaction with Xcomp mode, or with ZGC. Ok > what happens when a "dead" catch block turns out to not be dead after all?
We insert an uncommon trap in place of the exception handler. So, if the handler is needed after all, we deoptimize and go back to the interpreter. If the method is recompiled later, we see that the exception handler has been entered, and don't emit an uncommon trap that time. Note also that C2 already emits uncommon traps in place of exception handlers if it sees an unloaded exception type as well: https://github.com/openjdk/jdk/blob/faa8bde27569b4db3a1a9dd62adee0b10e81a718/src/hotspot/share/opto/doCall.cpp#L886-L888 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1790633998 From shade at openjdk.org Thu Nov 2 12:36:03 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 Nov 2023 12:36:03 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: References: Message-ID: <7KnMuDN5d8h7L0Me5x9OOXciMW9Y3vJLsaWdB0HI9LU=.b1543be4-789a-400a-9941-cac18f640afa@github.com> On Thu, 2 Nov 2023 11:03:16 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. 
I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Tigthen up memory ordering even more conservatively Stress tests show there is still a race condition, which can be triggered by putting a small sleep on this path: diff --git a/src/hotspot/share/utilities/waitBarrier_generic.cpp b/src/hotspot/share/utilities/waitBarrier_generic.cpp index f886e7b4cf5..13ba12c39a6 100644 --- a/src/hotspot/share/utilities/waitBarrier_generic.cpp +++ b/src/hotspot/share/utilities/waitBarrier_generic.cpp @@ -174,6 +174,7 @@ void GenericWaitBarrier::wait(int barrier_tag) { OrderAccess::fence(); if (Atomic::load_acquire(&_barrier_tag) == barrier_tag) { + if (UseNewCode) os::naked_short_nanosleep(100); Atomic::add(&cell._unsignaled_waits, 1); // Wait for notification. 
It would then fail/hang this `tier4` test: $ CONF=linux-x86_64-server-fastdebug make images test TEST=vmTestbase/nsk/monitoring/ThreadInfo/isSuspended/issuspended002.java TEST_VM_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+UseNewCode" Putting this issue to draft until I figure out the better way to do this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1790644526 From mbaesken at openjdk.org Thu Nov 2 12:55:05 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 2 Nov 2023 12:55:05 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v5] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 02:04:28 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with two additional commits since the last revision: > > - Fixed copyright notice > - Fixed copyright notice Hi , we are seeing now the following error in our tests (with this PR added) when running jdk/jfr/event/compiler/TestCompilerQueueUtilization.java java.lang.RuntimeException: Field ingress not in struct at jdk.test.lib.Asserts.fail(Asserts.java:634) at jdk.test.lib.jfr.Events.getValueDescriptor(Events.java:154) at jdk.test.lib.jfr.Events.assertField(Events.java:69) at jdk.jfr.event.compiler.TestCompilerQueueUtilization.main(TestCompilerQueueUtilization.java:55) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1570) java.lang.RuntimeException: Field ingress not in event at jdk.test.lib.Asserts.fail(Asserts.java:634) at jdk.test.lib.jfr.Events.assertField(Events.java:81) at 
jdk.jfr.event.compiler.TestCompilerQueueUtilization.main(TestCompilerQueueUtilization.java:55) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1570) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16211#issuecomment-1790667977 From matsaave at openjdk.org Thu Nov 2 14:33:14 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 2 Nov 2023 14:33:14 GMT Subject: RFR: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp [v2] In-Reply-To: <-1Lov-AqYoY3LCyoctox4fRMBQ_2BFoKyHb898eRaRE=.e5acf632-079f-4e90-877c-dfc36c2e0549@github.com> References: <-1Lov-AqYoY3LCyoctox4fRMBQ_2BFoKyHb898eRaRE=.e5acf632-079f-4e90-877c-dfc36c2e0549@github.com> Message-ID: <2Bt9Opvhnr2W0uV8dVBoe0DqJpE58cHrQXzV8Eto4bA=.a2adb0dc-c465-4792-b48a-6c154b08f3d7@github.com> On Mon, 30 Oct 2023 20:26:12 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Moved assert higher in call stack > > Yes, this looks better. This was the source of the nullptr, except in these two cases, the pointer is never null. Thanks for the reviews @coleenp , @calvinccheung , and @dholmes-ora! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16405#issuecomment-1790843625 From matsaave at openjdk.org Thu Nov 2 14:33:16 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 2 Nov 2023 14:33:16 GMT Subject: Integrated: 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 15:40:58 GMT, Matias Saavedra Silva wrote: > Calls in instanceKlass.cpp and unsafe.cpp try to call an atomic load on method calls that could return nullptr. 
This patch ensures that nullptr is not passed into the load. > > In `print_as_native_pointer` in archiveBuilder, `source_obj_to_requested_obj` should not be able to return `nullptr` as the result is immediately cast to an oop which cascades down to the failure reported in `get_volatile()` in `unsafe.cpp`. Placing an assert close to the top of this call stack should prevent this from happening and will better indicate the source of an unexpected `nullptr` should it occur. > > Verified with tier1-5 tests. This pull request has now been integrated. Changeset: 7a7b1e5a Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/7a7b1e5a920d71ab717d8993c9258a01f1074a48 Stats: 5 lines in 2 files changed: 3 ins; 1 del; 1 mod 8315890: Attempts to load from nullptr in instanceKlass.cpp and unsafe.cpp Reviewed-by: coleenp, ccheung, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/16405 From matsaave at openjdk.org Thu Nov 2 15:03:06 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 2 Nov 2023 15:03:06 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v8] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. 
> > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Prepare_invoke args and hard coded registers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/5660950d..7addccd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=06-07 Stats: 20 lines in 2 files changed: 0 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From mdoerr at openjdk.org Thu Nov 2 15:13:17 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 Nov 2023 15:13:17 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v8] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 15:03:06 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. 
This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Prepare_invoke args and hard coded registers Thanks! That looks better to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1790923864 From jvernee at openjdk.org Thu Nov 2 15:15:31 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 2 Nov 2023 15:15:31 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v2] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resources construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block.
The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and cannot be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets.
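The two ideas in the description above (a per-handler "entered" flag, and section sizes derived from tracked offsets) can be sketched roughly as follows. This is only an illustration under stated assumptions: `ProfileLayout` and `HandlerProfile` are hypothetical names, not HotSpot's `MethodData` API, and the ordering of the sections (parameter data followed by exception handler data) is assumed from the description rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the layout of a method's profiling data.
// Assumption: the parameter section is followed by the handler section.
struct ProfileLayout {
  std::size_t parameters_offset;  // start of the parameter section
  std::size_t handlers_offset;    // start of the exception handler section
  std::size_t end_offset;         // end of all profiling data

  // Sizes are derived from offsets, so adding a section cannot silently
  // shift an offset that was computed as "total size minus section size".
  std::size_t parameters_size() const { return handlers_offset - parameters_offset; }
  std::size_t handlers_size() const { return end_offset - handlers_offset; }
};

// Hypothetical per-method record: one "entered" flag per exception handler.
struct HandlerProfile {
  std::vector<bool> entered;

  explicit HandlerProfile(std::size_t handler_count) : entered(handler_count, false) {}

  // Would be called when the runtime dispatches an exception to a handler.
  void mark_entered(std::size_t index) { entered.at(index) = true; }

  // A compiler could replace a never-entered handler with a trap.
  bool can_prune(std::size_t index) const { return !entered.at(index); }
};
```

Once a handler has been marked as entered, `can_prune` returns false for it, which corresponds to the recompilation case where no uncommon trap is emitted.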
> > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: add one more test case to see that we have no trap after we hit the uncommon trap and reconpile to tier 4 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/059e306c..764bf3bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=00-01 Stats: 25 lines in 1 file changed: 25 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From tschatzl at openjdk.org Thu Nov 2 15:51:35 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 2 Nov 2023 15:51:35 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v8] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. 
Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Add documentation about why and how we handle pinned regions in the young/old generation. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/73f61da9..5ae05e4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=06-07 Stats: 18 lines in 2 files changed: 16 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Thu Nov 2 15:55:05 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 2 Nov 2023 15:55:05 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v8] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 17:12:19 GMT, Thomas Schatzl wrote: >> I do not think so. I will do some more testing about this. > > I (still) do not think it is possible after some more re-testing. There are the following situations I can think of: > > * string deduplication is a need-to-be-supported case where only the C code may have a reference to a pinned object: thread A critical sections a string, gets the char array address, locking the region containing the char array. Then string dedup goes ahead and replaces the original char array with something else. Now the C code has the only reference to that char array. > There is no API to convert a raw array pointer back to a Java object so destroying the header is fine; unpinning does not need the header. > > * there is some other case I can think of that could be problematic, but is actually a spec violation: the array is critical-locked by thread A, then shared with other C code (not critical-unlocked), resumes with Java code that forgets that reference. 
At some point other C code accesses that locked memory and (hopefully) critically-unlocks it. > Again, there is no API to convert a raw array pointer back to a Java object so destroying the header is fine. > > In all other cases I can think of there is always a reference to the encapsulating java object either from the stack frame (when passing in the object into the JNI function they are part of the oop maps) or if you create a new array object (via `NewArray` and lock it, the VM will add a handle to it. > > There is also no API to inspect the array header using the raw pointer (e.g. passing the raw pointer to `GetArrayLength` - doesn't compile as it expects a `jarray`, and in debug VMs there is actually a check that the passed argument is something that resembles a handle), so modifications are already invalid, and the change is fine imo. > > hth, > Thomas Here is some example (pseudo-) code for the first case mentioned above that should be valid JNI code: Java code: String x = ...; native_f1(x); [ some java code, x.chars gets deduplicated, its char array pointing to somewhere else now. Now native code is the only one having a reference to the old char array ] native_f2(); ----------- sample native code: void native_f1(jobject jstring) { global_string = NewGlobalRef(jstring); global_raw_chars = GetStringChars(global_string); } void native_f2() { ReleaseStringChars(global_string, global_raw_chars); DeleteGlobalRef(global_string); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1380370821 From jvernee at openjdk.org Thu Nov 2 16:27:35 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 2 Nov 2023 16:27:35 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. 
The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. 
> > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: add some assertion to IR test that check for compilation and deoptimization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/764bf3bd..3ad93fdd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=01-02 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From macarte at openjdk.org Thu Nov 2 16:30:32 2023 From: macarte at openjdk.org (Mat Carter) Date: Thu, 2 Nov 2023 16:30:32 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v6] In-Reply-To: References: Message-ID: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. 
You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) Mat Carter has updated the pull request incrementally with one additional commit since the last revision: Updated test to reflect field name changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16211/files - new: https://git.openjdk.org/jdk/pull/16211/files/08b2dbc2..4a1dfbf7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16211&range=04-05 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16211.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16211/head:pull/16211 PR: https://git.openjdk.org/jdk/pull/16211 From macarte at openjdk.org Thu Nov 2 16:30:33 2023 From: macarte at openjdk.org (Mat Carter) Date: Thu, 2 Nov 2023 16:30:33 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v5] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 12:50:33 GMT, Matthias Baesken wrote: > Hi , we are seeing now the following error in our tests (with this PR added) when running jdk/jfr/event/compiler/TestCompilerQueueUtilization.java > > java.lang.RuntimeException: Field ingress not in struct at jdk.test.lib.Asserts.fail(Asserts.java:634) at jdk.test.lib.jfr.Events.getValueDescriptor(Events.java:154) at jdk.test.lib.jfr.Events.assertField(Events.java:69) at jdk.jfr.event.compiler.TestCompilerQueueUtilization.main(TestCompilerQueueUtilization.java:55) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1570) java.lang.RuntimeException: Field ingress not in event at jdk.test.lib.Asserts.fail(Asserts.java:634) at jdk.test.lib.jfr.Events.assertField(Events.java:81) at 
jdk.jfr.event.compiler.TestCompilerQueueUtilization.main(TestCompilerQueueUtilization.java:55) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1570) Thank you for bringing this to my attention. I had failed to commit the updated test (to reflect the field name changes); it's not part of the tier1 tests, so the GitHub checks passed. I have now committed the updated test file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16211#issuecomment-1791065806 From mli at openjdk.org Thu Nov 2 16:57:25 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 2 Nov 2023 16:57:25 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits Message-ID: Hi, Can you review the change to add intrinsic for CompressBits for Long & Integer? Thanks! ## Test pass jtreg test: test/jdk/java/lang/CompressExpand*.java ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/16481/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318218 Stats: 102 lines in 6 files changed: 100 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Thu Nov 2 17:34:03 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 2 Nov 2023 17:34:03 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:50:50 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks!
> > ##?Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java I made a mistake, UseRVVForCompressBitsIntrinsics is only defined in riscv global.hpp. I think I can resolve the issue by defining it in global global.hpp, but seems it's not a good idea either. Any suggestions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1791218549 From dnsimon at openjdk.org Thu Nov 2 17:46:16 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 Nov 2023 17:46:16 GMT Subject: RFR: 8318982: Improve Exceptions::special_exception [v3] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 09:17:30 GMT, Doug Simon wrote: >> This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. >> If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. >> >> Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: >> >> [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) >> thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] >> for thread 0x000000011e18c600 >> thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} >> >> >> The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. > > Doug Simon has updated the pull request incrementally with two additional commits since the last revision: > > - differentiate special exception log message from normal log message > - check special_exception arguments for nullness pre-condition Thanks for the reviews. 
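For readers following the thread, the consolidation described in the quoted PR text can be pictured with a small standalone sketch. Everything here is hypothetical: `FakeThread` and the free function `special_exception` are illustrative stand-ins, not the HotSpot `Exceptions` API, and the rule that exactly one of the two arguments is non-null is an assumption based on the "check special_exception arguments for nullness pre-condition" commit note.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a VM thread; not a HotSpot type.
struct FakeThread {
  bool can_call_java;
  std::string pending_exception;  // what ends up being thrown
  std::string log_line;           // what -Xlog:exceptions might print
};

// Single entry point replacing two near-identical overloads: the caller
// passes either the name of an exception to create or an already created
// exception (assumed precondition: exactly one of them is non-null).
// Returns true if the exception was handled specially.
bool special_exception(FakeThread& thread, const char* name, const char* existing) {
  assert((name == nullptr) != (existing == nullptr));
  if (thread.can_call_java) {
    return false;  // normal path: the caller goes on to throw as usual
  }
  // The thread cannot run Java code, so substitute a pre-allocated error
  // and record that special handling took place.
  thread.pending_exception = "java/lang/VirtualMachineError";
  thread.log_line = std::string("thread cannot call Java, throwing pre-allocated exception")
                    + " (requested: " + (name != nullptr ? name : existing) + ")";
  return true;
}
```

The point of the sketch is only that one function with a checked precondition can serve both call shapes while producing a distinct log message for the special case.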
------------- PR Comment: https://git.openjdk.org/jdk/pull/16401#issuecomment-1791245292 From dnsimon at openjdk.org Thu Nov 2 17:46:17 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 2 Nov 2023 17:46:17 GMT Subject: Integrated: 8318982: Improve Exceptions::special_exception In-Reply-To: References: Message-ID: On Fri, 27 Oct 2023 14:03:45 GMT, Doug Simon wrote: > This PR consolidates the 2 almost identical versions of `Exceptions::special_exception` into a single method. > If a special exception is thrown and `-Xlog:exceptions` is enabled, a log message is emitted and it indicates the special handling. > > Here's an example in the output from running `compiler/linkage/LinkageErrors.java` with `-Xlog:exceptions -Xcomp`: > > [0.194s][info][exceptions] Exception (java.util.Set, java.lang.String, java.util.Set, boolean)' (java.lang.module.ModuleDescriptor$1 and java.lang.module.ModuleDescriptor$Exports are in module java.base of loader 'bootstrap')> (0x0000000000000000) > thrown [src/hotspot/share/interpreter/linkResolver.cpp, line 591] > for thread 0x000000011e18c600 > thread cannot call Java, throwing pre-allocated exception: a 'java/lang/VirtualMachineError'{0x0000000772e06f00} > > > The motivation for this change was work on [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) where it's useful to know when exceptions are thrown on a CompilerThread. This pull request has now been integrated. 
Changeset: f875163c Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/f875163c5d1961dd306033d866c95fe91728ba37 Stats: 49 lines in 2 files changed: 20 ins; 21 del; 8 mod 8318982: Improve Exceptions::special_exception Reviewed-by: coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/16401 From shade at openjdk.org Thu Nov 2 18:13:38 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 Nov 2023 18:13:38 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v13] In-Reply-To: References: Message-ID: > See more details in the bug and related issues. > > This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases. > > This implements mitigation on most current architectures: > - x86_64: implemented > - x86_32: considered, abandoned; cannot be easily done without blowing up code size > - AArch64: implemented > - ARM32: considered, abandoned; needs cleanups and testing; see [JDK-8318414](https://bugs.openjdk.org/browse/JDK-8318414) > - PPC64: implemented, thanks @TheRealMDoerr > - S390: implemented, thanks @offamitkumar > - RISC-V: implemented, thanks @RealFYang > - Zero: does not need implementation > > Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot do that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity.
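As a rough picture of what a thread-local backoff for cache updates could look like, here is a minimal sketch. It reflects one reading of the PR title and description, not the actual generated code: `ThreadState`, `SharedCache`, `record_miss`, and the exact countdown scheme are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical per-thread state: misses left before this thread may
// write to the shared cache again.
struct ThreadState {
  int misses_until_update;
};

// Hypothetical shared one-element cache (stand-in for secondary_super_cache).
struct SharedCache {
  const void* cached;
  int writes;  // counts how often the contended cache line was written
};

// Called on a cache miss after the slow lookup found `found`.
// With backoff == 0 every miss updates the cache (mitigation disabled);
// with backoff == N only every N-th miss per thread writes to it.
// Returns true if the shared cache was written.
bool record_miss(ThreadState& thread, SharedCache& cache, const void* found, int backoff) {
  if (backoff > 0) {
    if (--thread.misses_until_update > 0) {
      return false;  // skip the contended write this time
    }
    thread.misses_until_update = backoff;  // re-arm the countdown
  }
  cache.cached = found;
  cache.writes++;
  return true;
}
```

The design intent in such a scheme would be to keep the common case (cache hits) untouched and only thin out writes from threads that miss repeatedly.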
> > Setting backoff at `0` effectively disables the mitigation, and gives us safety hatch if something goes wrong. > > I believe we can go in with `1000` as the default, given the experimental results mentioned in this PR. > > Additional testing: > - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` > - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 29 additional commits since the last revision: - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Improve benchmarks - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Editorial cleanups - RISC-V implementation - Mention ARM32 bug - Make sure benchmark runs with C1 - Merge branch 'master' into JDK-8316180-backoff-secondary-super - Touchup benchmark metadata - S390 implementation - ... and 19 more: https://git.openjdk.org/jdk/compare/0d7c783b...74921ea9 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15718/files - new: https://git.openjdk.org/jdk/pull/15718/files/0e1fccd2..74921ea9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15718&range=11-12 Stats: 43531 lines in 1679 files changed: 25397 ins; 7425 del; 10709 mod Patch: https://git.openjdk.org/jdk/pull/15718.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15718/head:pull/15718 PR: https://git.openjdk.org/jdk/pull/15718 From dlong at openjdk.org Thu Nov 2 19:47:15 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 2 Nov 2023 19:47:15 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 17:27:57 GMT, Thomas Obermeier wrote: >> MallocTracker::print_pointer_information in src/hotspot/share/services/mallocTracker.cpp is called to check the 
highest pointer address of the reserved region. To do so it aligns the test pointer down to the next 8 Byte boundary and casts this address to class MallocHeader in order to use this classes eye-catcher member _canary for validation. Method looks_valid() dereferences _canary's content. _canary has an offset of 14 bytes relative to the class. Therefore it resides outside the reserved region for the highest pointer address, which causes a segmentation violation. >> >> We would expect the same error also for other platforms than AIX as memory is read, which is not allocated. Interestingly, Linux seems to allow this access for 5 times 4K above the reserved region. >> >> As a solution, looks_valid() should check _canary's address as being invalid, and return false immediately. > > Thomas Obermeier has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8306561' of https://github.com/TOatGithub/jdk into JDK-8306561 > - 8306561: test range instead of endpoints before casting src/hotspot/share/nmt/mallocTracker.cpp line 215: > 213: for (; here >= end; here -= smallest_possible_alignment) { > 214: // JDK-8306561: cast to a MallocHeader needs to guarantee it can reside in readable memory > 215: if (!os::is_readable_range(here, here + sizeof(MallocHeader) - 1)) { Sorry I noticed this late, but the " - 1" looks wrong here, because is_readable_range() checks for < `to`, not <= `to`. 
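
To illustrate the off-by-one described above, here is a minimal sketch that assumes a half-open `[from, to)` convention, as the comment says `is_readable_range()` uses. The names `range_is_readable`, `kHeaderSize` and both helper functions are hypothetical and only model the convention; this is not the HotSpot implementation, and the readable region is simulated with plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a half-open readability check: every byte in
// [from, to) must lie inside the readable region [region_begin, region_end).
inline bool range_is_readable(std::uintptr_t from, std::uintptr_t to,
                              std::uintptr_t region_begin,
                              std::uintptr_t region_end) {
  assert(from <= to);  // precondition of this sketch
  return from >= region_begin && to <= region_end;
}

// Hypothetical header size, chosen only for this illustration.
constexpr std::size_t kHeaderSize = 16;

// Covers all kHeaderSize bytes: the exclusive end is here + size.
inline bool header_fully_checked(std::uintptr_t here,
                                 std::uintptr_t region_begin,
                                 std::uintptr_t region_end) {
  return range_is_readable(here, here + kHeaderSize, region_begin, region_end);
}

// Passing "size - 1" as an exclusive end leaves the last byte unchecked.
inline bool header_checked_with_minus_one(std::uintptr_t here,
                                          std::uintptr_t region_begin,
                                          std::uintptr_t region_end) {
  return range_is_readable(here, here + kHeaderSize - 1, region_begin, region_end);
}
```

With a readable region that is one byte too short for the header, the `- 1` variant still reports the header as readable, which is the case the comment is concerned about.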

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1380694414

From kbarrett at openjdk.org Thu Nov 2 19:48:04 2023
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 2 Nov 2023 19:48:04 GMT
Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v8]
In-Reply-To:
References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com>
Message-ID:

On Thu, 2 Nov 2023 09:49:28 GMT, Johan Sjölen wrote:

>> My point is that the old code was fine. Copy-assigning the fill argument
>> (either defaulted or passed in) had to work (E can't be non-copyable), because
>> at_put_grow requires E to be copy-assignable in order to install the new value
>> at index i.
>
>>at_put_grow requires E to be copy-assignable in order to install the new value
> at index i.
>
> With the new code it doesn't seem to me that `at_put_grow` requires this, correct? I'd like to get rid of this requirement.

See line#430: `this->_data[i] = elem;`. That requires E to be copy-assignable.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1380695612

From shade at openjdk.org Thu Nov 2 21:00:16 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 2 Nov 2023 21:00:16 GMT
Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3]
In-Reply-To:
References:
Message-ID:

> See the symptoms, reproducer and analysis in the bug.
>
> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want.
>
> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments.
The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. > > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: - Touchups - More comments work - Tight up the comments - Rework to a single atomic counter per cell ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/ca88eb74..dfafbf3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=01-02 Stats: 180 lines in 2 files changed: 67 ins; 40 del; 73 mod Patch: 
https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From rehn at openjdk.org Thu Nov 2 21:00:18 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 2 Nov 2023 21:00:18 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 08:19:35 GMT, Aleksey Shipilev wrote: > @robehn, you might be interested in this :) Yepp, thanks for pinging me in! I'll have a look as when you are ready! Regarding the bug, accounting for context switches in every 'sub-state' usually gets you at least one time. That is what the: Atomic::add(&_barrier_threads, 1);, did, "no marine left behind" :) And while on this topic: Note that there is an optimization that can be done on Linux also. Waking all threads via futex the VM thread gets high runtime, so two safepoint very close to each other is slow. Meaning if there are a new safepoint op depending, VM thread often will context switched out just after waking all threads. Secondly the futex wake, involving visiting all run queues, can be parallelized similar to this by just waking a few thread and let them wake the rest. Intel actually did a draft of that just before meltdown/spectre, so it got lost. I think that draft did wake like 6 or 8 at the time, one large system ~128 cores you could get the time to full utilization down by almost 50% (but you lose some latency for the JavaThreads doing the second round of wakening). 
I mention this since you have setup measurements and graphs, so maybe you like to continue on this code :) (no jira issue for this) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1790668558 From shade at openjdk.org Thu Nov 2 21:00:21 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 Nov 2023 21:00:21 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: References: Message-ID: <2eXNHrpyHgdJQSGKW0fMQMCi0cVzd6hzaOTo5lLmFpg=.ee20bd05-3b39-4719-9d7e-4f7a54c78e81@github.com> On Thu, 2 Nov 2023 11:03:16 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
>> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Tigthen up memory ordering even more conservatively I think this is good for review. The reproducer that used to hang/fail on assert is now passing. `tier1 tier2 tier3` are all passing. I am running more tests overnight. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1791527483 From shade at openjdk.org Thu Nov 2 21:00:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 2 Nov 2023 21:00:19 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 12:50:56 GMT, Robbin Ehn wrote: > Regarding the bug, accounting for context switches in every 'sub-state' usually gets you at least one time. That is what the: Atomic::add(&_barrier_threads, 1);, did, "no marine left behind" :) Yes. I think most of the race condition mess comes from juggling several counters at once, *plus* depending on `barrier_tag`. The new version fuses the important counters together and melds in the arm/disarm status into the counter. Which allows to manage things more easily but CAS-ing one counter only. I think that resolves race conditions I saw in previous implementation. It yields same performance. Now running it through functional testing. 
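
The idea of fusing the waiter count and the arm/disarm status into one atomic word can be sketched roughly as follows. This is only an illustration under assumptions: the class `FusedCell`, its bit layout and its method names are hypothetical and are not taken from the PR; the actual `GenericWaitBarrier` code also has to handle semaphores and wakeups, which are omitted here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical cell: the top bit holds the armed status, the remaining bits
// count the threads currently registered as waiters. Keeping both in one
// word means every transition is a single atomic operation.
class FusedCell {
  static constexpr std::uint32_t kArmedBit = 1u << 31;
  std::atomic<std::uint32_t> _state{0};

public:
  // Arm only when the cell is idle (disarmed and no waiters left behind).
  bool try_arm() {
    std::uint32_t expected = 0;
    return _state.compare_exchange_strong(expected, kArmedBit);
  }

  // Register as a waiter, but only while the cell is still armed.
  bool try_enter() {
    std::uint32_t cur = _state.load();
    while ((cur & kArmedBit) != 0) {
      if (_state.compare_exchange_weak(cur, cur + 1)) {
        return true;
      }
      // On failure 'cur' was refreshed; the loop re-checks the armed bit.
    }
    return false;
  }

  // Clear the armed bit; returns how many waiters were registered.
  std::uint32_t disarm() {
    std::uint32_t prev = _state.fetch_and(~kArmedBit);
    return prev & ~kArmedBit;
  }

  // A waiter reports that it has left the cell.
  void leave() {
    std::uint32_t prev = _state.fetch_sub(1);
    assert((prev & ~kArmedBit) != 0);  // there must have been a waiter
    (void)prev;
  }

  bool idle() const { return _state.load() == 0; }
};
```

In this sketch a cell can be re-armed only after the last waiter has called `leave()`, so a late thread cannot observe a half-updated pair of counters.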
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1791021354 From cslucas at openjdk.org Thu Nov 2 22:36:20 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 2 Nov 2023 22:36:20 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission Message-ID: # Description Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. # Help Needed for Testing I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. ------------- Commit messages: - Reuse same C2_MacroAssembler object to emit instructions. Changes: https://git.openjdk.org/jdk/pull/16484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16484&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8241503 Stats: 2665 lines in 60 files changed: 112 ins; 429 del; 2124 mod Patch: https://git.openjdk.org/jdk/pull/16484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16484/head:pull/16484 PR: https://git.openjdk.org/jdk/pull/16484 From mdoerr at openjdk.org Thu Nov 2 23:24:05 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 2 Nov 2023 23:24:05 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 22:17:43 GMT, Cesar Soares Lucas wrote: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. > > Overall, the change is pretty simple. 
However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. > > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. PPC64 runs into assert(masm->inst_mark() == nullptr) failed: should be. V [libjvm.so+0x1648528] PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*)+0x10c8 (output.cpp:1812) V [libjvm.so+0x164b35c] PhaseOutput::Output()+0xd5c (output.cpp:362) V [libjvm.so+0x958f9c] Compile::Code_Gen()+0x4ec (compile.cpp:2989) V [libjvm.so+0x95e484] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1a84 (compile.cpp:887) V [libjvm.so+0x718f58] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x198 (c2compiler.cpp:119) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1791694904 From cslucas at openjdk.org Thu Nov 2 23:39:02 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 2 Nov 2023 23:39:02 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: <4LqTlmsaVA4gGTi5WQNjhn9KCN9SotYddp9SpdIpv-g=.294ccf81-e719-4621-8384-10242d2d7e95@github.com> On Thu, 2 Nov 2023 23:21:23 GMT, Martin Doerr wrote: >> # Description >> >> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. >> >> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. >> >> # Help Needed for Testing >> >> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. 
> > PPC64 runs into assert(masm->inst_mark() == nullptr) failed: should be. > V [libjvm.so+0x1648528] PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*)+0x10c8 (output.cpp:1812) > V [libjvm.so+0x164b35c] PhaseOutput::Output()+0xd5c (output.cpp:362) > V [libjvm.so+0x958f9c] Compile::Code_Gen()+0x4ec (compile.cpp:2989) > V [libjvm.so+0x95e484] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1a84 (compile.cpp:887) > V [libjvm.so+0x718f58] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x198 (c2compiler.cpp:119) @TheRealMDoerr - this is likely because of some missing `clear_inst_mark` call on my part in the PPC ad, I'll take a look into it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1791705658 From dlong at openjdk.org Fri Nov 3 00:36:08 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 Nov 2023 00:36:08 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:27:35 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. 
But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add some assertion to IR test that check for compilation and deoptimization We should be able to do better on the `has_monitors` issue. 
Loom uses this flag to short-circuit extra work when there are no ScopeDesc objects that contain monitors, so why not give Loom exactly what it wants and track this in `DebugInformationRecorder`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1791743413 From dlong at openjdk.org Fri Nov 3 00:39:05 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 Nov 2023 00:39:05 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:27:35 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. 
When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add some assertion to IR test that check for compilation and deoptimization src/hotspot/share/opto/parse1.cpp line 427: > 425: > 426: if (parse_method->is_synchronized() || parse_method->has_monitor_bytecodes()) { > 427: C->set_has_monitors(true); Rather than being pessimistic, I suggest tracking this in DebugInformationRecorder. 

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1380924437

From jvernee at openjdk.org Fri Nov 3 01:08:11 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 3 Nov 2023 01:08:11 GMT
Subject: RFR: 8267532: Try/catch block not optimized as expected [v3]
In-Reply-To:
References:
Message-ID: <3mqMsA2K5hzEQOCVOn1M4biTv0sg0mzFpLxHTvdPX9Y=.e4f9b540-1a48-4a80-8394-c4055678e261@github.com>

On Fri, 3 Nov 2023 00:33:10 GMT, Dean Long wrote:

> We should be able to do better on the `has_monitors` issue. Loom uses this flag to short-circuit extra work when there are no ScopeDesc objects that contain monitors, so why not give Loom exactly what it wants and track this in `DebugInformationRecorder`?

I discussed the issue I ran into with Ron as well. The `has_monitors` flag is _only_ used by that one assertion (see `ContinuationHelper::CompiledFrame::is_owning_locks`). I'm not sure there is too much value in speeding up an assertion?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1791763564

From fjiang at openjdk.org Fri Nov 3 01:21:03 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Fri, 3 Nov 2023 01:21:03 GMT
Subject: RFR: 8318218: RISC-V: C2 CompressBits
In-Reply-To:
References:
Message-ID:

On Thu, 2 Nov 2023 16:50:50 GMT, Hamlin Li wrote:

> Hi,
> Can you review the change to add intrinsic for CompressBits for Long & Integer?
> Thanks!
>
> ## Test
> pass jtreg test:
> test/jdk/java/lang/CompressExpand*.java

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4425:

> 4423: }
> 4424:
> 4425: void MacroAssembler::compress_bits(Register dst, Register src, Register mask, Register tmp, bool is_long) {

Since `compress_bits` uses RVV instructions, I think we should add the `_v` suffix.
Suggestion: void MacroAssembler::compress_bits_v(Register dst, Register src, Register mask, Register tmp, bool is_long) { src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1403: > 1401: > 1402: public: > 1403: // compress bits, i.e. j.l.Long::compress. All `compress_bits` methods are only used for C2, maybe we could move them into `C2_MacroAssembler`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1380960438 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1380960814 From fjiang at openjdk.org Fri Nov 3 01:36:00 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 3 Nov 2023 01:36:00 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 17:31:22 GMT, Hamlin Li wrote: > I made a mistake, UseRVVForCompressBitsIntrinsics is only defined in riscv global.hpp. I think I can resolve the issue by defining it in global global.hpp, but seems it's not a good idea either. Any suggestions? Maybe `bool Matcher::match_rule_supported(int opcode) {` in `riscv.ad` is a good place: https://github.com/openjdk/jdk/blob/c788160f8acea7b58b54ad857b601bb7ffb53f8e/src/hotspot/cpu/riscv/riscv.ad#L1896-L1897 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1791779175 From dlong at openjdk.org Fri Nov 3 01:47:02 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 Nov 2023 01:47:02 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 22:17:43 GMT, Cesar Soares Lucas wrote: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. > > Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. 
> > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. src/hotspot/cpu/x86/gc/z/z_x86_64.ad line 37: > 35: #include "gc/z/zBarrierSetAssembler.hpp" > 36: > 37: static void z_color(MacroAssembler* masm, const MachNode* node, Register ref) { For files already using MacroAssembler& _masm, the only change needed is this at the top: undef __ #define __ _masm. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16484#discussion_r1381015121 From dlong at openjdk.org Fri Nov 3 02:07:10 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 Nov 2023 02:07:10 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 01:44:42 GMT, Dean Long wrote: >> # Description >> >> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. >> >> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. >> >> # Help Needed for Testing >> >> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. > > src/hotspot/cpu/x86/gc/z/z_x86_64.ad line 37: > >> 35: #include "gc/z/zBarrierSetAssembler.hpp" >> 36: >> 37: static void z_color(MacroAssembler* masm, const MachNode* node, Register ref) { > > For files already using MacroAssembler& _masm, the only change needed is this at the top: > > undef __ > #define __ _masm. I guess that doesn't work because different files are concatenated together, causing a conflict if some files expect MacroAssembler *masm. To reduce the number of changes, couldn't we use MacroAssembler& _masm everywhere? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16484#discussion_r1381098489 From dlong at openjdk.org Fri Nov 3 02:26:08 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 3 Nov 2023 02:26:08 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:27:35 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. 
After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets.
>>
>> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`:
>>
>> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0),
>> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int...
>
> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>
> add some assertion to IR test that check for compilation and deoptimization

You're right, it's probably not worth the trouble.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1791820669

From haosun at openjdk.org Fri Nov 3 02:57:28 2023
From: haosun at openjdk.org (Hao Sun)
Date: Fri, 3 Nov 2023 02:57:28 GMT
Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning
Message-ID:

The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before.

GCC toolchain:
From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue was not exposed by the GCC toolchain before.

Besides, I suppose platforms ppc and risc-v may have the same issue.
[1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html ------------- Commit messages: - 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning Changes: https://git.openjdk.org/jdk/pull/16486/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16486&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319233 Stats: 35 lines in 7 files changed: 13 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16486.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16486/head:pull/16486 PR: https://git.openjdk.org/jdk/pull/16486 From haosun at openjdk.org Fri Nov 3 03:01:03 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 3 Nov 2023 03:01:03 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue was not exposed by the GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html I think ppc and risc-v have the same issue. However, I don't have the corresponding hardware. I wonder if @TheRealMDoerr and @RealFYang could help verify: 1. whether ppc and risc-v have the build failure with clang toolchain (if building ppc/riscv with clang is supported) 2. if so, whether this patch could fix the build failure. Thanks in advance.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1791836740 From qamai at openjdk.org Fri Nov 3 03:28:01 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Nov 2023 03:28:01 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:27:35 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resources construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit.
After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add some assertion to IR test that check for compilation and deoptimization May I ask if it is possible/preferable to prune code not only when the handler is not taken, but also when it is rarely taken? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1791850213 From lmesnik at openjdk.org Fri Nov 3 03:44:31 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 3 Nov 2023 03:44:31 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions [v2] In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: <9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> > The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions.
There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has testcases for platform and virtual threads. So, there is no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need some common approach to dealing with the UncaughtExceptionHandler in jtreg. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: Replaced System.exit() with exception. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16369/files - new: https://git.openjdk.org/jdk/pull/16369/files/8fbb2798..b0878f35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16369&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16369&range=00-01 Stats: 48 lines in 1 file changed: 37 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16369.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16369/head:pull/16369 PR: https://git.openjdk.org/jdk/pull/16369 From lmesnik at openjdk.org Fri Nov 3 03:49:04 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 3 Nov 2023 03:49:04 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions [v2] In-Reply-To: <9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com>
<9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> Message-ID: On Fri, 3 Nov 2023 03:44:31 GMT, Leonid Mesnik wrote: >> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. >> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. >> >> A few tests start failing. >> >> The test >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java >> has testcases for platform and virtual threads. So, there is no need to run it with the thread factory. >> >> The test >> java/lang/Thread/virtual/ThreadAPI.java >> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. >> >> Test >> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. >> >> Probably, we need some common approach to dealing with the UncaughtExceptionHandler in jtreg. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > Replaced System.exit() with exception. I updated the test thread factory to check if there are unhandled exceptions after the test is completed. It now works similarly to what I plan to fix in jtreg. So the purpose of this fix is to catch all exceptions with the factory enabled and also to "pre-test" the jtreg fix in some cases.
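For illustration only (this is not the code from the PR): one way to "check for unhandled exceptions after the test is completed" is to wrap a thread factory so that every thread it creates gets its own handler that records the first uncaught exception, and then to rethrow that exception once the test thread has been joined. The class name `UncaughtRecorder` and its methods are hypothetical, and the sketch uses plain platform threads via `Thread::new`; a virtual-thread factory could presumably be passed in the same way.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper for illustration; not the Virtual.java from the PR. */
final class UncaughtRecorder {
    // First uncaught exception seen on any thread created through wrap(); later ones are ignored.
    private final AtomicReference<Throwable> first = new AtomicReference<>();

    /** Wraps a factory so that each new thread gets its own per-thread handler. */
    ThreadFactory wrap(ThreadFactory delegate) {
        return task -> {
            Thread t = delegate.newThread(task);
            t.setUncaughtExceptionHandler((thread, ex) -> first.compareAndSet(null, ex));
            return t;
        };
    }

    /** Rethrows the recorded exception, if any, wrapped in a RuntimeException. */
    void check() {
        Throwable ex = first.get();
        if (ex != null) {
            throw new RuntimeException("uncaught exception in test thread", ex);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        UncaughtRecorder recorder = new UncaughtRecorder();
        ThreadFactory factory = recorder.wrap(Thread::new);
        Thread t = factory.newThread(() -> { throw new IllegalStateException("boom"); });
        t.start();
        t.join();   // the handler runs on the dying thread, so it has finished by now
        try {
            recorder.check();
        } catch (RuntimeException e) {
            System.out.println("recorded: " + e.getCause());
        }
    }
}
```

Because the handler is installed per thread, it does not depend on the default handler or on the thread belonging to a particular ThreadGroup; whether that fits the jtreg design is a separate question.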
------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1791859353 From sspitsyn at openjdk.org Fri Nov 3 04:18:26 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 Nov 2023 04:18:26 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v2] In-Reply-To: References: Message-ID: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes three new classes: > - `JvmtiHandshake`, `JvmtiUnifiedHandshakeClosure`, VM_HandshakeUnmountedVirtualThread > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
> > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: - address review: remove fix in libGetStackTraceSuspendedStress.cpp - addressed initial minor review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/218d439f..7ef7dbbc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=00-01 Stats: 64 lines in 4 files changed: 33 ins; 29 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From sspitsyn at openjdk.org Fri Nov 3 04:29:14 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 3 Nov 2023 04:29:14 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v3] In-Reply-To: References: Message-ID: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. 
> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes three new classes: > - `JvmtiHandshake`, `JvmtiUnifiedHandshakeClosure`, VM_HandshakeUnmountedVirtualThread > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. > > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: remove unneeded ResourceMark from JVMTI GetStackTrace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/7ef7dbbc..720c9c7e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From fyang at openjdk.org Fri Nov 3 04:49:01 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 3 Nov 2023 04:49:01 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 22:17:43 GMT, Cesar Soares Lucas wrote: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. > > Overall, the change is pretty simple. 
However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. > > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. Hello, I guess you might want to merge latest jdk master and add more changes. I witnessed some build errors when building the latest jdk master with this patch on linux-riscv64: /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UdivINode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2412:30: error: 'cbuf' was not declared in this scope 2412 | C2_MacroAssembler _masm(&cbuf); | ^~~~ /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UdivLNode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2427:30: error: 'cbuf' was not declared in this scope 2427 | C2_MacroAssembler _masm(&cbuf); | ^~~~ /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UmodINode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2442:30: error: 'cbuf' was not declared in this scope 2442 | C2_MacroAssembler _masm(&cbuf); | ^~~~ /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UmodLNode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2457:30: error: 'cbuf' was not declared in this scope 2457 | C2_MacroAssembler _masm(&cbuf); ------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1791888196 From dholmes at openjdk.org Fri Nov 3 05:47:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 3 Nov 2023 05:47:04 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions [v2] In-Reply-To: 
<9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> <9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> Message-ID: On Fri, 3 Nov 2023 03:44:31 GMT, Leonid Mesnik wrote: >> The jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. >> Catching such issues for virtual threads is important because they are not included in any groups. So this fix implements the handler for the test thread factory. >> >> A few tests start failing. >> >> The test >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java >> has testcases for platform and virtual threads. So, there is no need to run it with the thread factory. >> >> The test >> java/lang/Thread/virtual/ThreadAPI.java >> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. >> >> Test >> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. >> >> Probably, we need some common approach to dealing with the UncaughtExceptionHandler in jtreg. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > Replaced System.exit() with exception. Changes requested by dholmes (Reviewer). test/jtreg_test_thread_factory/src/share/classes/Virtual.java line 76: > 74: uncaughtException = null; > 75: Thread.setDefaultUncaughtExceptionHandler(null); > 76: } I don't understand what this is trying to do. If any virtual thread has an uncaught exception then other virtual threads will throw it wrapped in a RuntimeException.
But the first virtual thread that completes (normally or by throwing) will clear the default UEH that you set, so uncaught exceptions from other virtual threads won't do anything. AFAICS setting the UEH for a virtual thread works fine, so why are you not setting a per-thread handler? ------------- PR Review: https://git.openjdk.org/jdk/pull/16369#pullrequestreview-1711859772 PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1381179648 From alanb at openjdk.org Fri Nov 3 07:14:05 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 3 Nov 2023 07:14:05 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions [v2] In-Reply-To: References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> <9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> Message-ID: On Fri, 3 Nov 2023 05:43:38 GMT, David Holmes wrote: > I don't understand what this is trying to do. If any virtual thread has an uncaught exception then other virtual threads will throw it wrapped in a RuntimeException. But the first virtual thread that completes (normally or by throwing) will clear the default UEH that you set, so uncaught exceptions from other virtual threads won't do anything. > > AFAICS setting the UEH for a virtual thread works fine, so why are you not setting a per-thread handler? My reading is that it sets default UHE so it means that any thread (not just virtual threads) can potentially execute it. There will be tests that set a per thread UHE that doesn't delegate. There will also be platform threads in a jtreg ThreadGroup (a UHE) so the exceptions will be handled there (AFAIK, the jtreg TG/UGE does not delegate to the default UHE). I think the patch is confusing because uncaughtException may be set several times, last one wins. If virtual "main" completes without an exception then it looks at uncaughtException to see if an exception is recorded by another thread. 
It does wrap/propagate it as a runtime exception and I think the bit we aren't seeing is that this is handled by the real main on a platform thread in the jtreg TG. If the virtual "main" completes with an exception (meaning task throws), then it resets the default UHE and exits with an uncaught handle. As with the runtime exception case, I think we aren't seeing this being handled by real main. So I think it is working but confusing to read. I wonder if it might be better to just put the effort into helping CODETOOLS-7903526 instead of a workaround. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16369#discussion_r1381220891 From gcao at openjdk.org Fri Nov 3 07:38:17 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 3 Nov 2023 07:38:17 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v6] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 15:14:35 GMT, Matias Saavedra Silva wrote: >> Hi, >> @RealFYang and I have finished the RISC-V part, tier1-3 and hotspot:tier4 tested on hifive unmatched board. >> Please help us to add the RISC-V part, thanks a lot! >> [15455-riscv-port.diff.txt](https://github.com/openjdk/jdk/files/13214653/15455-riscv-port.diff.txt) > >> Hi, @RealFYang and I have finished the RISC-V part, tier1-3 and hotspot:tier4 tested on hifive unmatched board. Please help us to add the RISC-V part, thanks a lot! [15455-riscv-port.diff.txt](https://github.com/openjdk/jdk/files/13214653/15455-riscv-port.diff.txt) > > Thank you for the help! @matias9927 Some extra change to keep the riscv part up to date, please help us to add the RISC-V part. 
[15455-riscv-port-v2.diff.txt](https://github.com/openjdk/jdk/files/13247611/15455-riscv-port-v2.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1791989954 From rehn at openjdk.org Fri Nov 3 08:22:02 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 3 Nov 2023 08:22:02 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:50:50 GMT, Hamlin Li wrote: > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! > > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Personally I'm against these 'micro' options, e.g. UseRVVForXX. Even if it was possible to test out all combinations of options, no end-user will actually do it. I'm perfectly fine with just removing those and just having RVV in these cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1792030041 From aph at openjdk.org Fri Nov 3 08:45:01 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 Nov 2023 08:45:01 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 08:19:28 GMT, Robbin Ehn wrote: > Personally I'm against these 'micro' options, e.g. UseRVVForXX. Even if it was possible to test out all combinations of options, no end-user will actually do it. I'm perfectly fine with just removing those and just having RVV in these cases. Indeed. We already can turn on and off individual intrinsics.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1792056182 From aph at openjdk.org Fri Nov 3 10:08:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 Nov 2023 10:08:45 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v19] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits: - Merge - Fix header - Move IEE subnormal check to globalDefinitions - Remove accidental include - Duh - Remove dead code - Merge from head - s/Denormal/Subnormal/g - Review feedback - Merge branch 'JDK-8295159' of https://github.com/theRealAph/jdk into JDK-8295159 - ... 
and 32 more: https://git.openjdk.org/jdk/compare/f875163c...80ce877b ------------- Changes: https://git.openjdk.org/jdk/pull/10661/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=18 Stats: 278 lines in 11 files changed: 276 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From ayang at openjdk.org Fri Nov 3 10:19:10 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Nov 2023 10:19:10 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v8] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 15:51:35 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. 
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Add documentation about why and how we handle pinned regions in the young/old generation. src/hotspot/share/gc/g1/g1FullGCPrepareTask.inline.hpp line 76: > 74: } > 75: > 76: inline bool G1DetermineCompactionQueueClosure::has_pinned_objects(HeapRegion* hr) const { Could this be a static-local function so that it doesn't appear in the header file? (Its name is the same as the public API in heap-region.) 
src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 482: > 480: } > 481: > 482: double G1GCPhaseTimes::print_post_evacuate_collection_set(bool evacuation_retained) const { Why the renaming here? src/hotspot/share/gc/g1/g1GCPhaseTimes.hpp line 150: > 148: > 149: enum RestoreRetainedRegionsWorkItems { > 150: RestoreRetainedRegionsEvacFailedNum, // How many regions experienced an evacuation failure (pinned or allocation failure) Kind of a preexisting issue. "retained" here seems to mean evac-fail, not "kept in retain-list". src/hotspot/share/gc/g1/g1HeapRegionAttr.hpp line 43: > 41: remset_is_tracked_t _remset_is_tracked; > 42: region_type_t _type; > 43: bool _is_pinned; Maybe `uint8_t` as documented above? src/hotspot/share/gc/g1/g1Policy.cpp line 547: > 545: } > 546: > 547: log_trace(gc, ergo, heap)("Selected %u of %u retained candidates (unreclaimable %u) taking %1.3fms additional time", I actually think calling it "pinned", instead of "unreclaimable", is more informative (to users/dev). (And other places when it is shown in logs.) src/hotspot/share/gc/g1/g1YoungCollector.cpp line 1102: > 1100: jtm.report_pause_type(collector_state()->young_gc_pause_type(_concurrent_operation_is_full_mark)); > 1101: > 1102: policy()->record_young_collection_end(_concurrent_operation_is_full_mark, evacuation_alloc_failed()); The arg name (where this method is defined) should be updated to sth like `evac_alloc_failed` from `evacuation_failure`. 
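As a reading aid for the "age out" behaviour described in the quoted PR text (pinned candidates are skipped, and dropped from the candidates once they have been pinned for `G1NumCollectionsKeepUnreclaimable` collections), here is a toy model. It is not G1 code: the class, field and method names are hypothetical, the threshold value 3 is an arbitrary stand-in, and treating every unpinned candidate as evacuated in the same collection is a simplification.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Toy model only; names and the threshold value are hypothetical, not G1 code. */
final class PinnedCandidateAging {
    static final class Region {
        final String name;
        int pinCount;           // > 0 while native code holds a critical reference
        int unreclaimableGcs;   // collections during which the region was found pinned

        Region(String name, int pinCount) {
            this.name = name;
            this.pinCount = pinCount;
        }
    }

    /** Stand-in for G1NumCollectionsKeepUnreclaimable; the real default is not given in the thread. */
    static final int KEEP_UNRECLAIMABLE = 3;

    /**
     * Models one collection: unpinned candidates are evacuated and removed; pinned ones
     * are skipped, and dropped once they have been skipped KEEP_UNRECLAIMABLE times.
     * Returns the regions dropped from the candidates in this collection.
     */
    static List<Region> collect(List<Region> candidates) {
        List<Region> dropped = new ArrayList<>();
        for (Iterator<Region> it = candidates.iterator(); it.hasNext(); ) {
            Region r = it.next();
            if (r.pinCount == 0) {
                it.remove();            // reclaimable: evacuated in this collection
            } else if (++r.unreclaimableGcs >= KEEP_UNRECLAIMABLE) {
                it.remove();            // aged out: left for the next marking to pick up
                dropped.add(r);
            }
        }
        return dropped;
    }
}
```

In this model a region that stays pinned costs at most `KEEP_UNRECLAIMABLE` skipped attempts before it stops being considered, which is the trade-off the description calls a cop-out.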
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381415671 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381418678 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381440184 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381419525 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381423626 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381433739 From ayang at openjdk.org Fri Nov 3 10:19:12 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Nov 2023 10:19:12 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v8] In-Reply-To: References: Message-ID: <-W9sGrhwC6dMCJZrvR8cYU_YzEoE23CMbsU9WGXSocs=.4771371a-5cf9-4694-b32e-f173a14dcb7c@github.com> On Thu, 2 Nov 2023 15:52:09 GMT, Thomas Schatzl wrote: >> I (still) do not think it is possible after some more re-testing. There are the following situations I can think of: >> >> * string deduplication is a need-to-be-supported case where only the C code may have a reference to a pinned object: thread A critical sections a string, gets the char array address, locking the region containing the char array. Then string dedup goes ahead and replaces the original char array with something else. Now the C code has the only reference to that char array. >> There is no API to convert a raw array pointer back to a Java object so destroying the header is fine; unpinning does not need the header. >> >> * there is some other case I can think of that could be problematic, but is actually a spec violation: the array is critical-locked by thread A, then shared with other C code (not critical-unlocked), resumes with Java code that forgets that reference. At some point other C code accesses that locked memory and (hopefully) critically-unlocks it. 
>> Again, there is no API to convert a raw array pointer back to a Java object so destroying the header is fine. >> >> In all other cases I can think of there is always a reference to the encapsulating java object either from the stack frame (when passing in the object into the JNI function they are part of the oop maps) or if you create a new array object (via `NewArray` and lock it, the VM will add a handle to it. >> >> There is also no API to inspect the array header using the raw pointer (e.g. passing the raw pointer to `GetArrayLength` - doesn't compile as it expects a `jarray`, and in debug VMs there is actually a check that the passed argument is something that resembles a handle), so modifications are already invalid, and the change is fine imo. >> >> hth, >> Thomas > > Here is some example (pseudo-) code for the first case mentioned above that should be valid JNI code: > > > Java code: > > String x = ...; > native_f1(x); > [ some java code, x.chars gets deduplicated, its char array pointing to somewhere else now. Now native code is the only one having a reference to the old char array ] > native_f2(); > > ----------- sample native code: > > void native_f1(jobject jstring) { > global_string = NewGlobalRef(jstring); > global_raw_chars = GetStringChars(global_string); > } > > void native_f2() { > ReleaseStringChars(global_string, global_raw_chars); > DeleteGlobalRef(global_string); > } > string deduplication is a need-to-be-supported case... OK, so this is the only valid scenario where a type-array should be kept live even though it's not reachable from GC's perspective. Could you describe it in the comment? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381436094 From tschatzl at openjdk.org Fri Nov 3 10:59:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Nov 2023 10:59:06 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v8] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 09:56:43 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Add documentation about why and how we handle pinned regions in the young/old generation. > > src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp line 482: > >> 480: } >> 481: >> 482: double G1GCPhaseTimes::print_post_evacuate_collection_set(bool evacuation_retained) const { > > Why the renaming here? Probably forgot to undo the rename. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381501087 From mbaesken at openjdk.org Fri Nov 3 11:28:04 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 3 Nov 2023 11:28:04 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v6] In-Reply-To: References: Message-ID: <2jS2KogdxxjTHne4-zljO2yczXsAnHnvVSwPM-qhN0s=.72d79b55-d9d0-4d42-9606-4a961f4366e7@github.com> On Thu, 2 Nov 2023 16:30:32 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with one additional commit since the last revision: > > Updated test to reflect field name changes With the updated test file, the jtreg test error is gone. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16211#issuecomment-1792272259 From mdoerr at openjdk.org Fri Nov 3 11:45:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 3 Nov 2023 11:45:02 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue was not exposed by the GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html PPC64 only supports clang on AIX. @jkern: Can you answer the 2 questions? Sorry, that was the wrong jkern. @JoKern65: Can you answer the 2 questions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1792291250 PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1792292738 From mli at openjdk.org Fri Nov 3 12:17:39 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 3 Nov 2023 12:17:39 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v2] In-Reply-To: References: Message-ID: > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks!
> > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: remove the new vm option, using Matcher::match_rule_supported instead; move code to riscv_v.ad and C2_MacroAssembler ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/4cf39ada..2380d6ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=00-01 Stats: 200 lines in 9 files changed: 99 ins; 99 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Fri Nov 3 12:20:04 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 3 Nov 2023 12:20:04 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 01:33:43 GMT, Feilong Jiang wrote: >> I made a mistake, UseRVVForCompressBitsIntrinsics is only defined in riscv global.hpp. >> I think I can resolve the issue by defining it in global global.hpp, but seems it's not a good idea either. >> Any suggestions? > >> I made a mistake, UseRVVForCompressBitsIntrinsics is only defined in riscv global.hpp. I think I can resolve the issue by defining it in global global.hpp, but seems it's not a good idea either. Any suggestions? > > Maybe `bool Matcher::match_rule_supported(int opcode) {` in `riscv.ad` is a good place, and just returning `UseRVV` would be enough for `Op_CompressBits`?: > https://github.com/openjdk/jdk/blob/c788160f8acea7b58b54ad857b601bb7ffb53f8e/src/hotspot/cpu/riscv/riscv.ad#L1896-L1897 Thanks @feilongjiang for pointing out the position.
@robehn @theRealAph I agree, thanks for discussion ------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1792336266 From mli at openjdk.org Fri Nov 3 12:20:08 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 3 Nov 2023 12:20:08 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v2] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 01:18:00 GMT, Feilong Jiang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> remove the new vm option, using Matcher::match_rule_supported instead; move code to riscv_v.ad and C2_MacroAssembler > > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 1403: > >> 1401: >> 1402: public: >> 1403: // compress bits, i.e. j.l.Long::compress. > > All `compress_bits` methods are only used for C2, maybe we could move them into `C2_MacroAssembler`. Yes, it makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1381589405 From stuefe at openjdk.org Fri Nov 3 12:23:31 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 Nov 2023 12:23:31 GMT Subject: RFR: JDK-8319307: DCmds should not assert on truncation and should report truncation Message-ID: `bufferedStream` was originally intended to provide intermediate output buffering; the buffer was supposed to drain upon flushing. However, in many cases, `flush()` is a no-op, so the buffer never gets drained. To prevent infinitely raising buffer sizes for non-flushing `bufferedStream`, [JDK-8220394](https://bugs.openjdk.org/browse/JDK-8220394) introduced the notion of a "maximum reasonable cap". Upon reaching this threshold, we assert in debug VMs since we assumed this to be a condition worth analyzing. In release VMs, we silently truncate. 
But DCmds - one of the primary users of `bufferedStream` - can reach the maximum cap under normal conditions; one example would be printing the list of dynamic libraries on Linux (just prints the process memory map) - this can get very large. Similarly, NMT detail reports and VM.info output can get just as large. Therefore, neither asserting nor silent truncation is optimal. Instead, we should truncate the output, print a visible truncation marker, and - if possible - interrupt the printing. --- The patch is minimally invasive to simplify review. Like most stream classes, `bufferedStream` would benefit from an overhaul, but I'd like to leave that to a future RFE. Testing: Tested manually with a number of commands with artificially increased output size. GHAs (Windows test errors unrelated). ------------- Commit messages: - fix mac builds - JDK-8319307-DCmds-should-not-assert-on-truncation-and-should-report-truncation Changes: https://git.openjdk.org/jdk/pull/16474/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16474&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319307 Stats: 36 lines in 9 files changed: 20 ins; 6 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16474.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16474/head:pull/16474 PR: https://git.openjdk.org/jdk/pull/16474 From stuefe at openjdk.org Fri Nov 3 12:24:13 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 3 Nov 2023 12:24:13 GMT Subject: RFR: JDK-8319318: bufferedStream fixed case can be removed Message-ID: Trivial change to remove an unused constructor / use case for bufferedStream Tests: GHAs (Windows error unrelated) ------------- Commit messages: - JDK-8319318-bufferedStream-fixed-case-can-be-removed Changes: https://git.openjdk.org/jdk/pull/16475/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16475&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319318 Stats: 68 lines in 3 files changed: 4 ins; 42 del; 22 mod Patch: 
https://git.openjdk.org/jdk/pull/16475.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16475/head:pull/16475 PR: https://git.openjdk.org/jdk/pull/16475 From kbarrett at openjdk.org Fri Nov 3 12:25:09 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 3 Nov 2023 12:25:09 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue was not exposed by the GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html Shared changes look good. Platform-specific changes also look good to me, subject to verification by the respective platform maintainers. Happy to see the elimination of the `va_list dummy;` stuff. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16486#pullrequestreview-1712525071 From tschatzl at openjdk.org Fri Nov 3 12:32:02 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Nov 2023 12:32:02 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v9] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there.
JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was the restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However, assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e.
pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review - renamings + documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/5ae05e4c..8342b80b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=07-08 Stats: 79 lines in 17 files changed: 8 ins; 3 del; 68 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Fri Nov 3 12:32:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Nov 2023 12:32:06 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v8] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 15:51:35 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. 
>> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Add documentation about why and how we handle pinned regions in the young/old generation. Fwiw, recent changes (without the most recent renamings) passed tier1-5 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16342#issuecomment-1792352540 From ayang at openjdk.org Fri Nov 3 12:53:08 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Nov 2023 12:53:08 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v9] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 12:32:02 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. 
They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > ayang review - renamings + documentation src/hotspot/share/gc/g1/g1Policy.hpp line 275: > 273: double start, > 274: double end, > 275: bool alloocation_failure = false); Typo. src/hotspot/share/gc/g1/g1Policy.hpp line 314: > 312: // Record the start and end of the actual collection part of the evacuation pause. > 313: void record_young_collection_start(); > 314: void record_young_collection_end(bool concurrent_operation_is_full_mark, bool alllocation_failure); Typo. src/hotspot/share/gc/g1/g1YoungCollector.cpp line 87: > 85: GCCause::to_string(_pause_cause), > 86: _collector->evacuation_pinned() ? " (Pinned)" : "", > 87: _collector->evacuation_alloc_failed() ? 
" (Allocation Failure)" : ""); > GC(6) Pause Young (Normal) (Pinned) (Evacuation Failure) I wonder if the last two can be merged into one `()`, sth like `(Pinned / ...)`, because they are on the same abstraction level. src/hotspot/share/gc/g1/g1_globals.hpp line 327: > 325: range(1, 256) \ > 326: \ > 327: product(uint, G1NumCollectionsKeepPinned, 8, DIAGNOSTIC, \ Any particular reason this is not `EXPERIMENTAL`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381627320 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381627700 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381628625 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381624492 From tschatzl at openjdk.org Fri Nov 3 13:46:51 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Nov 2023 13:46:51 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v10] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. 
The problem is that previously there was the restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However, assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P...
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/8342b80b..f9735539 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Fri Nov 3 13:50:09 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Nov 2023 13:50:09 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v9] In-Reply-To: References: Message-ID: <4uKST6lml9Okm18TVjp2hgBUQHBPH0FP_Uv13Pr7CLE=.dab77bbb-eec6-41d7-820d-3ed89779feb0@github.com> On Fri, 3 Nov 2023 12:41:05 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review - renamings + documentation > > src/hotspot/share/gc/g1/g1_globals.hpp line 327: > >> 325: range(1, 256) \ >> 326: \ >> 327: product(uint, G1NumCollectionsKeepPinned, 8, DIAGNOSTIC, \ > > Any particular reason this is not `EXPERIMENTAL`? Changing this does not in any way enable risky/experimental code not fit for production. This knob is for helping diagnose performance issues. G1 does have its fair share of experimental options, but all/most of these were from the initial import where G1 as a whole had been experimental (unstable) for some time. 
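To make the role of this knob concrete, here is a minimal C++ sketch of the "age out" behaviour from the PR description: a retained candidate that stays pinned is dropped once it has been pinned for the configured number of collections in total. This is a toy model, not HotSpot code. `Candidate`, `age_out_pinned`, `kNumCollectionsKeepPinned`, and the choice of `>=` as the drop condition are assumptions made for illustration; only the default of 8 is taken from the `G1NumCollectionsKeepPinned` declaration quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a retained collection set candidate (hypothetical type).
struct Candidate {
  uint32_t region_index;
  uint32_t pin_count;        // non-zero while native code keeps the region pinned
  uint32_t pinned_gc_count;  // collections, in total, during which the region was pinned
};

// Assumed default, mirroring the value 8 in the quoted flag declaration.
constexpr uint32_t kNumCollectionsKeepPinned = 8;

// Intended to run once per collection. Bumps the counter of every candidate
// that is still pinned, then drops the ones that reached the limit.
// Returns how many candidates were dropped.
inline std::size_t age_out_pinned(std::vector<Candidate>& retained,
                                  uint32_t keep_limit = kNumCollectionsKeepPinned) {
  assert(keep_limit >= 1 && "limit must allow at least one collection");
  const std::size_t before = retained.size();
  for (Candidate& c : retained) {
    if (c.pin_count > 0) {
      c.pinned_gc_count++;
    }
  }
  retained.erase(std::remove_if(retained.begin(), retained.end(),
                                [keep_limit](const Candidate& c) {
                                  return c.pin_count > 0 && c.pinned_gc_count >= keep_limit;
                                }),
                 retained.end());
  return before - retained.size();
}
```

In this model an unpinned candidate is never aged out, and a dropped region would only become a candidate again after the next marking, which matches the "wait until the next marking happens" wording in the description.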
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381706769 From jvernee at openjdk.org Fri Nov 3 14:13:06 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 3 Nov 2023 14:13:06 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 03:25:18 GMT, Quan Anh Mai wrote: > May I ask if it is possible/preferable to prune code not only when the handler is not taken, but also when it is rarely taken. Thanks. I spent some time thinking about this during the design process as well. First off, we need to define what 'rarely taken' means. For exception handlers, I think that means they are rarely entered when compared to the `try` block they cover. So, we could theoretically have 2 counters for each exception handler: one for the `try` block, and one for the exception handler itself, then profile both and figure out the ratio of each block being taken, to which we could apply a heuristic to define 'rarely taken' However, profiling `try` blocks is tricky. 'Regular' profiling happens based on a particular bytecode. e.g. when we interpret a `goto` we profile using JumpData. However, the entry of a `try` block could be any bytecode. So, we'd either have to do a dynamic lookup in some table when interpreting any bytecode to see if it's the start of a try block, and then do the profiling. Or, we'd not be able to do the profiling in the interpreter, and only in C1 compiled code when we see that an instruction lies at the start of a `try` block. Of course we'd have to adjust the profiling of catch blocks as well, since the 2 counters are used in tandem, and be careful that they don't go out of sync, since the profiling happens in separate locations. In other words: implementing this seems not trivial. The benefit is also not clear to me: In which cases do we expect an exception to be thrown only sometimes? 
Is it better to deoptimize in that case, or should we just always generate the exception handler? On the other hand, we already have a real-world use case on our hands (FFM API) which we can test against, that is addressed sufficiently by the simpler approach. Given the complexity & uncertainty around the benefits of using a frequency-based heuristic, I backed off on that idea, and went with the simpler approach implemented by this patch. I feel like this is a good sweet spot to be in, in terms of implementation complexity & benefit tradeoff. We can always expand the profiling later when we see real world use cases that would benefit from that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1792503426 From ayang at openjdk.org Fri Nov 3 14:17:07 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Nov 2023 14:17:07 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v10] In-Reply-To: References: Message-ID: <1-i3-5OmZbuCNUlpfv31Kr3eiBXEd4Si8F5gsbPHuBQ=.1d97dcac-4662-4482-842c-ce86315ba61a@github.com> On Fri, 3 Nov 2023 13:46:51 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. 
These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > typos Marked as reviewed by ayang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16342#pullrequestreview-1712794411 From tschatzl at openjdk.org Fri Nov 3 14:17:11 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 3 Nov 2023 14:17:11 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v9] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 12:44:10 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> ayang review - renamings + documentation > > src/hotspot/share/gc/g1/g1YoungCollector.cpp line 87: > >> 85: GCCause::to_string(_pause_cause), >> 86: _collector->evacuation_pinned() ? " (Pinned)" : "", >> 87: _collector->evacuation_alloc_failed() ? " (Allocation Failure)" : ""); > >> GC(6) Pause Young (Normal) (Pinned) (Evacuation Failure) > > I wonder if the last two can be merged into one `()`, sth like `(Pinned / ...)`, because they are on the same abstraction level. Parsing the separate components is easier :) Not sure if these tags in any way ever indicated some level of abstraction. I do not have a strong opinion here. The combinations (Pinned) (Allocation Failure) (Pinned + Allocation Failure) // or the other way around, or some other symbol for "+" or no symbol at all? are fine with me (and I thought about doing something more elaborate here), but my concern has been that any complicated string makes it less unique (e.g. `(Allocation Failure)` vs. "Allocation Failure") and adds code both to implement and parse the result. Much more disrupting is likely that there is no "Evacuation Failure" string any more. But log messages are not part of the external interface, and we should not want to change them just because. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381716129 From ayang at openjdk.org Fri Nov 3 14:17:13 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 3 Nov 2023 14:17:13 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v9] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 13:53:35 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1YoungCollector.cpp line 87: >> >>> 85: GCCause::to_string(_pause_cause), >>> 86: _collector->evacuation_pinned() ? " (Pinned)" : "", >>> 87: _collector->evacuation_alloc_failed() ? " (Allocation Failure)" : ""); >> >>> GC(6) Pause Young (Normal) (Pinned) (Evacuation Failure) >> >> I wonder if the last two can be merged into one `()`, sth like `(Pinned / ...)`, because they are on the same abstraction level. > > Parsing the separate components is easier :) Not sure if these tags in any way ever indicated some level of abstraction. > > I do not have a strong opinion here. The combinations > > (Pinned) > (Allocation Failure) > (Pinned + Allocation Failure) // or the other way around, or some other symbol for "+" or no symbol at all? > > are fine with me (and I thought about doing something more elaborate here), but my concern has been that any complicated string makes it less unique (e.g. `(Allocation Failure)` vs. "Allocation Failure") and adds code both to implement and parse the result. > > Much more disrupting is likely that there is no "Evacuation Failure" string any more. But log messages are not part of the external interface, and we should not want to change them just because. The example looks good to me. >> src/hotspot/share/gc/g1/g1_globals.hpp line 327: >> >>> 325: range(1, 256) \ >>> 326: \ >>> 327: product(uint, G1NumCollectionsKeepPinned, 8, DIAGNOSTIC, \ >> >> Any particular reason this is not `EXPERIMENTAL`? > > Changing this does not in any way enable risky/experimental code not fit for production. 
This knob is for helping diagnose performance issues. > > G1 does have its fair share of experimental options, but all/most of these were from the initial import where G1 as a whole had been experimental (unstable) for some time. This flag conceptually related (or similar) to `G1RetainRegionLiveThresholdPercent`, which is an exp, so I thought they should be the same category. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381748512 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1381747902 From aph at openjdk.org Fri Nov 3 14:46:39 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 3 Nov 2023 14:46:39 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v20] In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. 
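A minimal C++ sketch of the save-and-restore idea described above, not the code in the PR. It assumes the standard `<cfenv>` functions `fegetenv`/`fesetenv` are an acceptable way to capture and restore the floating-point environment. `with_saved_fenv` and `misbehaving_loader` are hypothetical names, and the stand-in loader changes the rounding mode rather than the denormal-handling bits, because the rounding mode is the part of the environment that can be checked portably.

```cpp
#include <cfenv>
#include <functional>

// Hypothetical helper: run `load` (for example a dlopen() call) and then put
// the floating-point environment back the way it was before the call.
// Returns false if the environment could not be saved or restored.
inline bool with_saved_fenv(const std::function<void()>& load) {
  std::fenv_t saved;
  if (std::fegetenv(&saved) != 0) {
    return false;                      // could not capture the environment
  }
  load();                              // may alter rounding mode, FTZ/DAZ, ...
  return std::fesetenv(&saved) == 0;   // undo whatever the loader changed
}

// Stand-in for a library initializer that tampers with floating-point state.
inline void misbehaving_loader() {
  std::fesetround(FE_UPWARD);
}
```

As the description notes, a wrapper like this only covers the one call site it wraps; a library that loads further libraries later would bypass it.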
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Delete src/hotspot/os/linux/.#os_linux.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10661/files - new: https://git.openjdk.org/jdk/pull/10661/files/80ce877b..b1412626 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10661&range=18-19 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/10661.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10661/head:pull/10661 PR: https://git.openjdk.org/jdk/pull/10661 From jkern at openjdk.org Fri Nov 3 15:16:05 2023 From: jkern at openjdk.org (Joachim Kern) Date: Fri, 3 Nov 2023 15:16:05 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue was not exposed by the GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html During my switch to xlc17 I came across "error: format string is not a string literal" in several places. To fix this, I inserted lines like "DISABLED_WARNINGS_clang_aix_os_posix.cpp := format-nonliteral," in various places in the gmk files. So in principle we have the problem too, but the compiler didn't report an error in os_aix_ppc.cpp or vmError.cpp.
We will test your PR overnight in our test suite to check for regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1792622883 From eastigeevich at openjdk.org Fri Nov 3 15:23:04 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 3 Nov 2023 15:23:04 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue was not exposed by the GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html lgtm ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/16486#pullrequestreview-1712952964 From mli at openjdk.org Fri Nov 3 16:18:13 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 3 Nov 2023 16:18:13 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations Message-ID: Hi, Can you review the change to add intrinsic for CmpU3 and CmpUL3? Thanks! The current logic does not check whether (MaxVectorSize < 16) after the assignment `MaxVectorSize = _initial_vector_length;` in the following situations:
a) if FLAG_IS_DEFAULT(MaxVectorSize) == true b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/16498/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16498&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319408 Stats: 16 lines in 1 file changed: 7 ins; 7 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16498/head:pull/16498 PR: https://git.openjdk.org/jdk/pull/16498 From cslucas at openjdk.org Fri Nov 3 16:27:06 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 3 Nov 2023 16:27:06 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: <_Qvj6Msy_P6hIWUc4IwGPLKXLK7391GMCEI4XlNYQrY=.1e03e561-071f-48ae-b322-d944c7b1ff36@github.com> On Fri, 3 Nov 2023 04:44:40 GMT, Fei Yang wrote: >> # Description >> >> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. >> >> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. >> >> # Help Needed for Testing >> >> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. > > Hello, I guess you might want to merge latest jdk master and add more changes. 
> I witnessed some build errors when building the latest jdk master with this patch on linux-riscv64: > > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UdivINode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2412:30: error: 'cbuf' was not declared in this scope > 2412 | C2_MacroAssembler _masm(&cbuf); > | ^~~~ > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UdivLNode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2427:30: error: 'cbuf' was not declared in this scope > 2427 | C2_MacroAssembler _masm(&cbuf); > | ^~~~ > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UmodINode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2442:30: error: 'cbuf' was not declared in this scope > 2442 | C2_MacroAssembler _masm(&cbuf); > | ^~~~ > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad: In member function 'virtual void UmodLNode::emit(C2_MacroAssembler*, PhaseRegAlloc*) const': > /home/fyang/jdk/src/hotspot/cpu/riscv/riscv.ad:2457:30: error: 'cbuf' was not declared in this scope > 2457 | C2_MacroAssembler _masm(&cbuf); @RealFYang - Thanks for the note. I'll do that and update the PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1792746837 From lmesnik at openjdk.org Fri Nov 3 16:39:18 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 3 Nov 2023 16:39:18 GMT Subject: RFR: 8318839: Update test thread factory to catch all exceptions [v2] In-Reply-To: <9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> <9S4kuhxUayZ6r1SjV2F7fbJzclmYevQ-WysyHOcnq_4=.44149a06-c65b-42b5-9b2d-ad9ee430308a@github.com> Message-ID: <_kF5hp2T1M6e_pCi6Meiz5hcqbJw4wg_i0Rjeb8rX0c=.096f2213-4e6c-4a03-b051-433b8449b6f7@github.com> On Fri, 3 Nov 2023 03:44:31 GMT, Leonid Mesnik wrote: >> jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. >> Catching such issues for virtual threads is important because they are not included in any group. So this fix implements the handler for the test thread factory. >> >> A few tests start failing. >> >> The test >> serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java >> has test cases for platform and virtual threads. So, there is no need to run it with the thread factory. >> >> The test >> java/lang/Thread/virtual/ThreadAPI.java >> tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. >> >> Test >> test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. >> >> Probably, we need a common approach to dealing with the UncaughtExceptionHandler in jtreg. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > Replaced System.exit() with exception.
OK, it seems there is no good way to process exceptions here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16369#issuecomment-1792774893 From lmesnik at openjdk.org Fri Nov 3 16:39:20 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 3 Nov 2023 16:39:20 GMT Subject: Withdrawn: 8318839: Update test thread factory to catch all exceptions In-Reply-To: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> References: <19jJ-G8gCQEMp6p4bp_FQX82RQKBNbvCDSJbFt6EPkM=.a16102c1-afc7-46ec-8d0c-46db236c6510@github.com> Message-ID: <7N3MQZ3asiMcuw2eEp9AX9nIcRh_en_R9paR_BJOqIk=.f27ae87d-326c-4fb7-86cf-0befd196e9c9@github.com> On Wed, 25 Oct 2023 21:08:01 GMT, Leonid Mesnik wrote: > jtreg starts the main thread in a separate ThreadGroup and checks unhandled exceptions for this group. However, it doesn't catch all unhandled exceptions. There is a jtreg issue for this https://bugs.openjdk.org/browse/CODETOOLS-7903526. > Catching such issues for virtual threads is important because they are not included in any group. So this fix implements the handler for the test thread factory. > > A few tests start failing. > > The test > serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java > has test cases for platform and virtual threads. So, there is no need to run it with the thread factory. > > The test > java/lang/Thread/virtual/ThreadAPI.java > tests UncaughtExceptionHandler and virtual threads. No need to run it with a thread factory. > > Test > test/jdk/java/util/concurrent/tck/ThreadTest.java is updated to not check the default empty handler. > > Probably, we need a common approach to dealing with the UncaughtExceptionHandler in jtreg. This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/16369 From shade at openjdk.org Fri Nov 3 18:35:23 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 3 Nov 2023 18:35:23 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates Message-ID: Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. For example, sample branch profiling hunk from C1 tier3 on x86_64: Before: 0x00007f269065ed02: test %edx,%edx 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ? 0x00007f269065ed0e: movabs $0x138,%rsi ? 0x00007f269065ed18: je 0x00007f269065ed24 ? 0x00007f269065ed1a: movabs $0x148,%rsi ? 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi 0x00007f269065ed28: lea 0x1(%rdi),%rdi 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1) 0x00007f269065ed30: je 0x00007f269065ed4e After: 0x00007f1370dcd782: test %edx,%edx 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ? 0x00007f1370dcd78e: mov $0x138,%esi ? 0x00007f1370dcd793: je 0x00007f1370dcd79a ? 0x00007f1370dcd795: mov $0x148,%esi ? 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1) 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4 We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes. This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`. 
# Before nmethod code size : 430328 bytes nmethod code size : 467032 bytes nmethod code size : 908936 bytes nmethod code size : 1267816 bytes # After nmethod code size : 429616 bytes (-0.1%) nmethod code size : 466344 bytes (-0.1%) nmethod code size : 897144 bytes (-1.3%) nmethod code size : 1256216 bytes (-0.9%) There are two wrinkles: 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this patch work. Note that x86_64 does not actually define `movslq reg64, imm32`, this is a regular `mov reg64, imm32`. It matches our current `movq(Register, int32_t)`. 2. There is at least one place in Hotspot -- IC calls -- that expects the synthetic `movptr` to always have the same length, because it would be used as IC slot. I had to introduce a special method in `MacroAssembler` to handle it. I looked through other uses of `movptr(Register, intptr_t)`, and no other are suspicious. (I don't quite like the name "mov_ptrslot" all that much, suggestions welcome.) Additional testing: - [ ] Linux x86_64 server fastdebug, `tier1 tier2 tier3 tier4` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/16497/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319406 Stats: 26 lines in 3 files changed: 19 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16497/head:pull/16497 PR: https://git.openjdk.org/jdk/pull/16497 From duke at openjdk.org Fri Nov 3 19:21:18 2023 From: duke at openjdk.org (duke) Date: Fri, 3 Nov 2023 19:21:18 GMT Subject: Withdrawn: 8311500: StackWalker.getCallerClass() throws UOE if invoked reflectively In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 11:45:59 GMT, Volker Simonis wrote: > As the included jtreg test demonstrates, `StackWalker.getCallerClass()` can throw an `UnsupportedOperationException` if called reflectively. 
Currently this only happens if we invoke `StackWalker.getCallerClass()` recursively reflectively, but this issue will become more prominent once we fix [JDK-8285447](https://bugs.openjdk.org/browse/JDK-8285447). The gory details follow below: > > The protocol between the Java API and the JVM for `StackWalker.getCallerClass()/walk()` is as follows: > - On the Java side, `StackWalker` calls into `StackStreamFactory` for the real work. > - For `StackWalker.getCallerClass()` `StackStreamFactory` basically creates a `Class[]` which will be passed down and filled in the JVM. For `StackWalker.walk()` it will normally be a `StackFrameInfo[]` (or a `LiveStackFrameInfo[]` if the internal `ExtendedOption.LOCALS_AND_OPERANDS` option was used). > - The default size of this array is currently `StackStreamFactory.SMALL_BATCH` which is 8 (but see [JDK-8285447](https://bugs.openjdk.org/browse/JDK-8285447)). > - `StackStreamFactory` then calls `AbstractStackWalker.callStackWalk()` which is natively implemented in the VM by `JVM_CallStackWalk()`. > - `JVM_CallStackWalk()` calls `StackWalk::walk()` which calls `StackWalk::fetchFirstBatch()` which calls `StackWalk::fill_in_frames()` which walks the stack and fills in the available class/stackframe slots in the passed-in array until the array is full or there are no more stack frames. > - Once `StackWalk::fill_in_frames()` returns, `StackWalk::fetchFirstBatch()` calls back to Java by invoking `AbstractStackWalker::doStackWalk()` to consume the result. > - `AbstractStackWalker::doStackWalk()` calls `consumeFrames()` (which is overridden depending on whether we initially called `getCallerClass()` or `walk()`) which consumes the frames until it either finishes (e.g. finds the caller class) or until there are no more frames. > - In the latter case `consumeFrames()` will call into the VM again by calling `AbstractStackWalker.fetchStackFrames()` to fetch additional frames from the stack.
> - `AbstractStackWalker.fetchStackFrames()` is implemented by `JVM_MoreStackWalk()` which calls `StackWalk::fetchNextBatch()` which calls `StackWalk::fill_in_frames()` (the same method that already fetched the initial batch of frames). > > Following is a stacktrace of what I've explained so far: > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x143a96a] StackWalk::fill_in_frames... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14773 From matsaave at openjdk.org Fri Nov 3 19:52:41 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 3 Nov 2023 19:52:41 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v9] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2.
Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64 Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: RISCV port update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/7addccd6..6950709c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=07-08 Stats: 20 lines in 2 files changed: 0 ins; 3 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From qamai at openjdk.org Fri Nov 3 19:53:06 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 3 Nov 2023 19:53:06 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 16:00:08 GMT, Aleksey Shipilev wrote: > Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. > > Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. > > For example, sample branch profiling hunk from C1 tier3 on x86_64: > > > Before: > 0x00007f269065ed02: test %edx,%edx > 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ? > 0x00007f269065ed0e: movabs $0x138,%rsi > ? 
0x00007f269065ed18: je 0x00007f269065ed24 > ? 0x00007f269065ed1a: movabs $0x148,%rsi > ? 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi > 0x00007f269065ed28: lea 0x1(%rdi),%rdi > 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1) > 0x00007f269065ed30: je 0x00007f269065ed4e > > After: > 0x00007f1370dcd782: test %edx,%edx > 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ? > 0x00007f1370dcd78e: mov $0x138,%esi > ? 0x00007f1370dcd793: je 0x00007f1370dcd79a > ? 0x00007f1370dcd795: mov $0x148,%esi > ? 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi > 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi > 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1) > 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4 > > > We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes. > > This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`. > > > # Before > nmethod code size : 430328 bytes > nmethod code size : 467032 bytes > nmethod code size : 908936 bytes > nmethod code size : 1267816 bytes > > # After > nmethod code size : 429616 bytes (-0.1%) > nmethod code size : 466344 bytes (-0.1%) > nmethod code size : 897144 bytes (-1.3%) > nmethod code size : 1256216 bytes (-0.9%) > > > There are two wrinkles: > 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this patch work. Note that x86_64 does not actually define `movslq reg64, imm32`, this is a regular `mov... Can we create `MacroAssembler::mov64` that does the branching instead, I think it is more natural there. And things that need 8-byte immediates will call into `Assembler::mov64`. Thanks. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2576: > 2574: #ifdef _LP64 > 2575: if (is_simm32(src)) { > 2576: movslq(dst, checked_cast(src)); Why not just `movq`? there is no `movslq r, i` so this is kind of confusing. 
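The size argument behind this review can be sketched in a few lines of standalone C++. This is an illustration only, not HotSpot code: `fits_simm32`, `fits_uimm32` and `mov_imm_bytes` are hypothetical names, and the byte counts assume the usual x86-64 encodings for the low registers rax..rdi (a 32-bit `mov r32, imm32` that zero-extends takes 5 bytes, a sign-extending `mov r64, imm32` takes 7 bytes, and `movabs r64, imm64` takes 10 bytes).

```cpp
#include <cstdint>

// True if `v` can be encoded as a sign-extended 32-bit immediate.
constexpr bool fits_simm32(int64_t v) {
  return v >= INT32_MIN && v <= INT32_MAX;
}

// True if `v` can be produced by a 32-bit move, which zero-extends into
// the upper half of the 64-bit destination register.
constexpr bool fits_uimm32(int64_t v) {
  return v >= 0 && v <= UINT32_MAX;
}

// Hypothetical size model for loading a constant into rax..rdi.
constexpr int mov_imm_bytes(int64_t v) {
  return fits_uimm32(v) ? 5     // mov r32, imm32
       : fits_simm32(v) ? 7     // mov r64, simm32 (REX.W prefix)
       : 10;                    // movabs r64, imm64
}
```

Under this model the `0x138` constant in the quoted hunk costs 5 bytes instead of 10, which agrees with the instruction addresses in the before/after listings, while the metadata pointer still needs the full `movabs`. A call site that relies on a fixed instruction length, such as the IC slot mentioned in the description, would have to bypass this choice and always use the 10-byte form.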
src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1818: > 1816: void mov_metadata(Address dst, Metadata* obj, Register rscratch); > 1817: > 1818: void mov_ptrslot(Register dst, intptr_t val); I believe the convention here would be `movptr_imm64` ------------- PR Review: https://git.openjdk.org/jdk/pull/16497#pullrequestreview-1713429561 PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1382136858 PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1382138403 From iwalulya at openjdk.org Fri Nov 3 21:25:13 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 3 Nov 2023 21:25:13 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v10] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 13:46:51 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. 
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > typos LGTM! 
Nits: src/hotspot/share/gc/g1/g1FullCollector.cpp line 465: > 463: continue; > 464: } else if (is_compaction_target(region_index)) { > 465: assert(!hr->has_pinned_objects(), "pinned objects should not be compaction targets"); Suggestion: assert(!hr->has_pinned_objects(), "pinned regions should not be compaction targets"); src/hotspot/share/gc/g1/g1YoungCollector.cpp line 430: > 428: _claimer(_g1h->workers()->active_workers()), > 429: _humongous_total(0), > 430: _humongous_candidates(0) { } Suggestion: _humongous_candidates(0) { } ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16342#pullrequestreview-1707910606 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1382164253 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1382135577 From iwalulya at openjdk.org Fri Nov 3 21:25:16 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Fri, 3 Nov 2023 21:25:16 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v5] In-Reply-To: References: Message-ID: On Tue, 31 Oct 2023 19:14:13 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. >> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. 
These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... 
> > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Fix compilation src/hotspot/share/gc/g1/g1CollectionSet.cpp line 355: > 353: move_pinned_marking_to_retained(&pinned_marking_regions); > 354: // Drop pinned retained regions to make progress with retained regions. Regions > 355: // in that list have must have been pinned for at least Suggestion: // in that list must have been pinned for at least ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1378608614 From msheppar at openjdk.org Fri Nov 3 22:01:11 2023 From: msheppar at openjdk.org (Mark Sheppard) Date: Fri, 3 Nov 2023 22:01:11 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() In-Reply-To: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 1 Nov 2023 00:06:35 GMT, Leonid Mesnik wrote: > Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. > > Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). > > Testing tier1-5. test/lib/jdk/test/lib/process/ProcessTools.java line 451: > 449: * @return The ProcessBuilder instance representing the java command. > 450: */ > 451: private static ProcessBuilder createJavaProcessBuilder(boolean isLimited, String... 
command) { addThreadFactory might be a more appropriate name. However, Stefan has a more SOLID suggestion ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16442#discussion_r1382236083 From duke at openjdk.org Fri Nov 3 22:44:17 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Fri, 3 Nov 2023 22:44:17 GMT Subject: RFR: 8319429: Don't zero out mxcsr flag bits on ECore Message-ID: Improves vector rounding on ECore about 10x (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms ------------- Commit messages: - Don't zero out mxcsr flag bits on ECore Changes: https://git.openjdk.org/jdk/pull/16504/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16504&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319429 Stats: 18 lines in 5 files changed: 9 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16504/head:pull/16504 PR: https://git.openjdk.org/jdk/pull/16504 From jkern at openjdk.org Sat Nov 4 07:20:06 2023 From: jkern at openjdk.org (Joachim Kern) Date: Sat, 4 Nov 2023 07:20:06 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue is not exposed by GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. 
> > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html We consumed your PR overnight. The AIX ppc64 based on clang was fine and without regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1793368307 From fyang at openjdk.org Sat Nov 4 07:28:09 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 4 Nov 2023 07:28:09 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 15:13:45 GMT, Joachim Kern wrote: > We will test your PR overnight in our test suite to check for regressions. @shqking : Sorry, I don't have a clang for linux-riscv at hand. I guess @VladimirKempik might want to give it a try :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1793369662 From vkempik at openjdk.org Sat Nov 4 07:46:07 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 4 Nov 2023 07:46:07 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue is not exposed by GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. 
> > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html I'll try after the weekend ------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1793373071 From aph at openjdk.org Sat Nov 4 09:20:38 2023 From: aph at openjdk.org (Andrew Haley) Date: Sat, 4 Nov 2023 09:20:38 GMT Subject: Integrated: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic In-Reply-To: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Tue, 11 Oct 2022 16:02:41 GMT, Andrew Haley wrote: > A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics. > > The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 > > One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. > > However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. This pull request has now been integrated. 
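The "save and restore the floating-point control word" idea from the quoted description can be illustrated with a small standalone sketch. It assumes the standard `<cfenv>` functions `fegetenv`/`fesetenv` are an adequate stand-in for whatever the integrated change actually does; the helper name `call_preserving_fenv` is hypothetical and is not a HotSpot API. As the description itself notes, a wrapper like this cannot help when a library loads another library later on.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper (not HotSpot code): run `load`, then put the
// floating-point environment back to what it was before the call.
// `load` stands in for something like a dlopen() of a DSO whose
// initializers may change the FP control state. It is assumed to
// return a value (for example a library handle).
template <typename LoadFn>
auto call_preserving_fenv(LoadFn load) -> decltype(load()) {
  std::fenv_t saved;
  int rc = std::fegetenv(&saved);   // snapshot the control/status state
  assert(rc == 0);
  (void)rc;

  auto result = load();             // may clobber the FP environment

  rc = std::fesetenv(&saved);       // restore the snapshot
  assert(rc == 0);
  (void)rc;
  return result;
}
```

A caller would wrap only the loading step, e.g. `call_preserving_fenv([&] { return load_library(path); })`, where `load_library` is again a placeholder name.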
Changeset: df599dbb Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/df599dbb9b0f0ee96d1ec767ac8821f164ab075d Stats: 277 lines in 10 files changed: 275 ins; 0 del; 2 mod 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic Reviewed-by: ihse, dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/10661 From vkempik at openjdk.org Sat Nov 4 15:50:07 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 4 Nov 2023 15:50:07 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: <1bQk4WJCdM6vnLF10HuvAl-u2hoEpLPE8BFaweXlqLw=.3270b714-dad3-4dd6-8442-d0de314d2a6c@github.com> On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue is not exposed by GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html Looks better on risc-v now. We still have an issue with "register storage specifier is deprecated and incompatible with c++17" at vm_version_linux_riscv.cpp line 77, but it's unrelated to this PR. 
These errors are gone: os_linux_riscv.cpp:255:62: error: format string is not a string literal [-Werror,-Wformat-nonliteral] 255 | VMError::report_and_die(thread, uc, nullptr, 0, msg, detail_msg, va_dummy); | ^~~~~~~~~~ os_linux_riscv.cpp:255:74: error: variable 'va_dummy' is uninitialized when used here [-Werror,-Wuninitialized] 255 | VMError::report_and_die(thread, uc, nullptr, 0, msg, detail_msg, va_dummy); ------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1793480523 From haosun at openjdk.org Mon Nov 6 01:08:07 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 6 Nov 2023 01:08:07 GMT Subject: RFR: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: On Sat, 4 Nov 2023 07:17:25 GMT, Joachim Kern wrote: >> The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. >> >> GCC toolchain: >> From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue is not exposed by GCC toolchain before. >> >> Besides, I suppose platforms ppc and risc-v may have the same issue. >> >> [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html >> [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html > > We consumed your PR overnight. The AIX ppc64 based on clang was fine and without regressions. Thanks a lot for your verification on ppc and risc-v, @JoKern65 and @VladimirKempik. I also tested the JDK build with gcc and clang on linux/aarch64, linux/x86 and macos/aarch64. Hence, I think this PR is ready to go. I will integrate it tomorrow if there are no more comments. 
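For readers following the -Wformat-nonliteral discussion, here is a minimal standalone sketch of the "variadic function next to a va_list function" pattern. It is not the HotSpot code: `vformat_report`, `format_report` and `report_detail` are hypothetical names, and the `format` attribute syntax assumes GCC or Clang. The point it tries to show is that the caller passes a literal `"%s"` plus the message as data, instead of passing a non-literal message as the format together with a dummy `va_list`.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// va_list flavour. format(printf, 1, 0): parameter 1 is a printf-style
// format string and the arguments arrive as a va_list, so the compiler
// does not flag the non-literal `fmt` inside the body.
std::string vformat_report(const char* fmt, va_list args)
    __attribute__((format(printf, 1, 0)));

// Variadic flavour. format(printf, 1, 2): parameter 1 is the format and
// the variadic arguments start at position 2, so call sites are checked.
std::string format_report(const char* fmt, ...)
    __attribute__((format(printf, 1, 2)));

std::string vformat_report(const char* fmt, va_list args) {
  char buf[256];
  std::vsnprintf(buf, sizeof(buf), fmt, args);  // truncates if too long
  return std::string(buf);
}

std::string format_report(const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);                          // properly initialized
  std::string out = vformat_report(fmt, args);
  va_end(args);
  return out;
}

// Call site: the runtime-provided message is an argument, never the format.
std::string report_detail(const char* detail_msg) {
  assert(detail_msg != nullptr);
  return format_report("%s", detail_msg);
}
```

With this shape a message that happens to contain `%` characters is printed verbatim, and no uninitialized `va_list` is ever handed to the formatting routine.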
------------- PR Comment: https://git.openjdk.org/jdk/pull/16486#issuecomment-1793922862 From luhenry at openjdk.org Mon Nov 6 01:27:10 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 6 Nov 2023 01:27:10 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v2] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 12:17:39 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > remove the new vm option, using Matcher::match_rule_supported instead; move code to riscv_v.ad and C2_MacroAssembler src/hotspot/cpu/riscv/riscv.ad line 1897: > 1895: > 1896: case Op_CompressBits: > 1897: return UseRVV && (MaxVectorSize >= 16); Isn't it guaranteed that `MaxVectorSize >= 16` if `UseRVV` is true? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1382702413 From dholmes at openjdk.org Mon Nov 6 05:04:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 6 Nov 2023 05:04:13 GMT Subject: RFR: 8319429: Don't zero out mxcsr flag bits on ECore In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 22:32:44 GMT, Volodymyr Paprotski wrote: > Improves vector rounding on ECore about 10x > > (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms > (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms Changes requested by dholmes (Reviewer). src/hotspot/cpu/x86/vm_version_x86.cpp line 864: > 862: (_model == 0x97 || _model == 0xAC || _model == 0xAF)) { > 863: FLAG_SET_DEFAULT(DoEcoreOpt, true); > 864: } And what should happen if the flag is set true by the user and there is no Ecore? What effect will that have? Should it be allowed? 
src/hotspot/share/runtime/globals.hpp line 574: > 572: product(bool, DoEcoreOpt, false, DIAGNOSTIC, \ > 573: "Perform Ecore Optimization") \ > 574: \ I think this should be a CPU specific flag in ./cpu/x86/globals_x86.hpp (similar to how we have linux specific flags in ./os/linux/globals_linux.hpp). Also the description should clarify that the default is actually true for Ecore systems, and false elsewhere. ------------- PR Review: https://git.openjdk.org/jdk/pull/16504#pullrequestreview-1714266950 PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1382784036 PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1382783715 From fyang at openjdk.org Mon Nov 6 07:08:09 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 Nov 2023 07:08:09 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 16:11:15 GMT, Hamlin Li wrote: > Hi, > Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? > Thanks! > > Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. > a) if FLAG_IS_DEFAULT(MaxVectorSize) == true > b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) > > And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 315: > 313: warning("RVV does not support vector length less than 16 bytes. Disabling RVV."); > 314: UseRVV = false; > 315: FLAG_SET_DEFAULT(MaxVectorSize, 0); I think setting `UseRVV` to false would be enough in this case. Why bother resetting MaxVectorSize to 0? 
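The ordering problem described in the quoted PR text (the `< 16` check not being applied after `MaxVectorSize` is reassigned from the hardware vector length) can be illustrated with a small model. This is only a sketch under assumptions: `VectorConfig` and `resolve_vector_config` are hypothetical names, the non-power-of-two case is left out as a precondition, and the actual patch in vm_version_riscv.cpp may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two flags under discussion.
struct VectorConfig {
  bool use_rvv;
  uint32_t max_vector_size;  // in bytes; 0 when RVV is disabled
};

static bool is_power_of_2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Resolve the effective vector size first, then apply the minimum-size
// check last, so it also covers the values taken from the hardware length.
VectorConfig resolve_vector_config(bool flag_is_default,
                                   uint32_t requested,
                                   uint32_t hw_vector_length) {
  // Assumption: an explicitly requested size was already validated.
  assert(flag_is_default || is_power_of_2(requested));

  uint32_t size = requested;
  if (flag_is_default || requested > hw_vector_length) {
    size = hw_vector_length;   // situations (a) and (b) from the PR text
  }
  if (size < 16) {
    return {false, 0};         // disable RVV and reset the size together
  }
  return {true, size};
}
```

In this model `use_rvv == true` always implies `max_vector_size >= 16`, which is the guarantee the CompressBits review comment asks about.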
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16498#discussion_r1382854272 From thartmann at openjdk.org Mon Nov 6 08:14:15 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 6 Nov 2023 08:14:15 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 13:58:49 GMT, Jorn Vernee wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add some assertion to IR test that check for compilation and deoptimization > > src/hotspot/share/runtime/sharedRuntime.cpp line 681: > >> 679: // for given exception >> 680: // Note that the implementation of this method assumes it's only called when an exception has actually occured >> 681: address SharedRuntime::compute_compiled_exc_handler(CompiledMethod* cm, address ret_pc, Handle& exception, > > One thing of note for this function: We don't look up the exception handler bci for JVMCI compiled methods, so I've not added any profiling in the case of JVMCI (see the `#if INCLUDE_JVMCI` block at the start of the function). > > This means that when using a JVMCI compiler, exception handlers might appear as untaken, when they are actually taken. This should be fine since the profiling information is currently only used by C2. But, if a JVMCI compiler wants to start using the profiling information e.g. to prune dead exception handlers as well, then profiling needs to be implement first. > > Please let me know if this is okay. I think that's fine (@dougxc, fyi). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382907246 From mli at openjdk.org Mon Nov 6 08:50:11 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 6 Nov 2023 08:50:11 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 07:05:14 GMT, Fei Yang wrote: >> Hi, >> Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? >> Thanks! >> >> Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. >> a) if FLAG_IS_DEFAULT(MaxVectorSize) == true >> b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) >> >> And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 315: > >> 313: warning("RVV does not support vector length less than 16 bytes. Disabling RVV."); >> 314: UseRVV = false; >> 315: FLAG_SET_DEFAULT(MaxVectorSize, 0); > > I think setting `UseRVV` to false would be enough in this case. Why bother resetting MaxVectorSize to 0? This is trying to be consistent with the code at LINE 295 in this file. Or should I also remove L294-296? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16498#discussion_r1382946319 From mli at openjdk.org Mon Nov 6 08:52:10 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 6 Nov 2023 08:52:10 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v2] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 01:24:37 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> remove the new vm option, using Matcher::match_rule_supported instead; move code to riscv_v.ad and C2_MacroAssembler > > src/hotspot/cpu/riscv/riscv.ad line 1897: > >> 1895: >> 1896: case Op_CompressBits: >> 1897: return UseRVV && (MaxVectorSize >= 16); > > Isn't it guaranteed that `MaxVectorSize >= 16` if `UseRVV` is true? After https://github.com/openjdk/jdk/pull/16498, it should be guaranteed `MaxVectorSize >= 16`. Let me remove this condition after pr #16498 is pushed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1382949011 From thartmann at openjdk.org Mon Nov 6 08:55:18 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 6 Nov 2023 08:55:18 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:27:35 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. 
>> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... 
> > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > add some assertion to IR test that check for compilation and deoptimization Great work, Jorn! The changes look good to me. `-XX:+StressPrunedExceptionHandlers` and probably also `-XX:-ProfileExceptionHandlers` should be added to our testing, at least to [JDK-8295559](https://bugs.openjdk.org/browse/JDK-8295559) (internal). src/hotspot/share/ci/ciMethodData.hpp line 525: > 523: > 524: // pointers to sections in _data > 525: // NOTE: these may be called before ciMethodData::load_data (is that a bug?). I don't think we should add a "is that a bug?" comment here but either investigate right away or file a follow-up RFE to keep track of that. I don't think it's a bug though. test/hotspot/jtreg/compiler/c2/TestExHandlerTrap.java line 37: > 35: * -Xbatch > 36: * -Xlog:deoptimization=trace > 37: * -XX:CompileCommand=PrintCompilation,compiler.c2.TestExHandlerTrap::payload Should the logging/printing be removed? test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1536: > 1534: public static final String UNREACHED_TRAP = PREFIX + "UNREACHED_TRAP" + POSTFIX; > 1535: static { > 1536: trapNodes(UNREACHED_TRAP,"unreached"); Suggestion: trapNodes(UNREACHED_TRAP, "unreached"); ------------- Marked as reviewed by thartmann (Reviewer). 
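The "one extra bit of profiling data per exception handler" scheme from the quoted PR description can be modelled in a few lines. The sketch below is a toy illustration only: `HandlerProfile`, `mark_entered` and `can_prune` are hypothetical names, and it does not reflect the real `MethodData` layout or how C2 emits the uncommon trap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: one "entered" bit per exception handler of a method.
class HandlerProfile {
 public:
  explicit HandlerProfile(std::size_t handler_count)
      : entered_(handler_count, false) {}

  // Runtime side: called when a thrown exception is dispatched to a handler.
  void mark_entered(std::size_t handler_index) {
    assert(handler_index < entered_.size());
    entered_[handler_index] = true;
  }

  // Compiler side: a handler that was never entered is a candidate for
  // being replaced by a trap instead of being parsed and compiled.
  bool can_prune(std::size_t handler_index) const {
    assert(handler_index < entered_.size());
    return !entered_[handler_index];
  }

 private:
  std::vector<bool> entered_;
};
```

If a pruned handler is later reached, the compiled code would have to deoptimize and record the bit so that a recompilation keeps the handler; that part is outside this sketch.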
PR Review: https://git.openjdk.org/jdk/pull/16416#pullrequestreview-1714480033 PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382910906 PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382943362 PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382945696 From thartmann at openjdk.org Mon Nov 6 08:55:19 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 6 Nov 2023 08:55:19 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 14:08:13 GMT, Jorn Vernee wrote: > Given the complexity & uncertainty around the benefits of using a frequency-based heuristic, I backed off on that idea, and went with the simpler approach implemented by this patch. I agree. If we find a real world use case for the rarely taken case, we can still re-evaluate and potentially add the additional complexity. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1794339136 From thartmann at openjdk.org Mon Nov 6 08:55:21 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 6 Nov 2023 08:55:21 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 08:46:32 GMT, Tobias Hartmann wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add some assertion to IR test that check for compilation and deoptimization > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1536: > >> 1534: public static final String UNREACHED_TRAP = PREFIX + "UNREACHED_TRAP" + POSTFIX; >> 1535: static { >> 1536: trapNodes(UNREACHED_TRAP,"unreached"); > > Suggestion: > > trapNodes(UNREACHED_TRAP, "unreached"); Same in above code (could be fixed as well). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382946354 From dnsimon at openjdk.org Mon Nov 6 09:07:17 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 6 Nov 2023 09:07:17 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 08:11:01 GMT, Tobias Hartmann wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 681: >> >>> 679: // for given exception >>> 680: // Note that the implementation of this method assumes it's only called when an exception has actually occured >>> 681: address SharedRuntime::compute_compiled_exc_handler(CompiledMethod* cm, address ret_pc, Handle& exception, >> >> One thing of note for this function: We don't look up the exception handler bci for JVMCI compiled methods, so I've not added any profiling in the case of JVMCI (see the `#if INCLUDE_JVMCI` block at the start of the function). >> >> This means that when using a JVMCI compiler, exception handlers might appear as untaken, when they are actually taken. This should be fine since the profiling information is currently only used by C2. But, if a JVMCI compiler wants to start using the profiling information e.g. to prune dead exception handlers as well, then profiling needs to be implement first. >> >> Please let me know if this is okay. > > I think that's fine (@dougxc, fyi). I also think it's fine but would like @tkrodriguez to confirm as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382969752 From fyang at openjdk.org Mon Nov 6 09:25:11 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 Nov 2023 09:25:11 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 16:11:15 GMT, Hamlin Li wrote: > Hi, > Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? > Thanks! 
> > Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. > a) if FLAG_IS_DEFAULT(MaxVectorSize) == true > b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) > > And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16498#pullrequestreview-1714609633 From fyang at openjdk.org Mon Nov 6 09:25:14 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 Nov 2023 09:25:14 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 08:47:01 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 315: >> >>> 313: warning("RVV does not support vector length less than 16 bytes. Disabling RVV."); >>> 314: UseRVV = false; >>> 315: FLAG_SET_DEFAULT(MaxVectorSize, 0); >> >> I think setting `UseRVV` to false would be enough in this case. Why bother resetting MaxVectorSize to 0? > > This is trying to be consistent with the code at LINE 295 in this file. > Or should I also remove L294-296? I see. Let's keep it for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16498#discussion_r1382990653 From yzheng at openjdk.org Mon Nov 6 09:31:14 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 6 Nov 2023 09:31:14 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 09:04:37 GMT, Doug Simon wrote: >> I think that's fine (@dougxc, fyi). 
> > I also think it's fine but would like @tkrodriguez to confirm as well. I think it should be straightforward to add this profiling for the JVMCI compiler at line 693 with:
    if (t->bci() != -1) { // did we find a handler in this method?
      sd->method()->set_ex_handler_entered(t->bci()); // profile
    }
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1382998917 From stefank at openjdk.org Mon Nov 6 09:53:35 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 09:53:35 GMT Subject: RFR: 8318757 ObjectMonitor::deflate_monitor fails "assert(prev == old_value) failed: unexpected prev owner=0x0000000000000002, expected=0x0000000000000000" Message-ID: A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... 
------------- Commit messages: - Add patricio's test - Rework ObjectMonitorsHashtable for thread dumping Changes: https://git.openjdk.org/jdk/pull/16519/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318757 Stats: 429 lines in 9 files changed: 216 ins; 147 del; 66 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Mon Nov 6 09:53:36 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 09:53:36 GMT Subject: RFR: 8318757 ObjectMonitor::deflate_monitor fails "assert(prev == old_value) failed: unexpected prev owner=0x0000000000000002, expected=0x0000000000000000" In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 09:46:11 GMT, Stefan Karlsson wrote: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... There are only two paths to run a safepointed monitor deflation pass: 1) VM_ThreadDump when it collects monitors and looks for deadlocks 2) In the "final audit" when we print information about interesting monitors in the system This PR deals with (1), which was the cause for the asserts in the bug report. 
The following describes what is happening and how we hit the assert:

The stack trace of the asserting safepointed monitor deflation code:

    ObjectMonitor::deflate_monitor
    ObjectSynchronizer::deflate_monitor_list
    ObjectSynchronizer::deflate_idle_monitors
    VM_ThreadDump::doit
    VM_Operation::evaluate
    VMThread::evaluate_operation
    VMThread::inner_execute

And the stack trace of the MonitorDeflationThread performing the paused async monitor deflation:

    SafepointSynchronize::block
    SafepointMechanism::process
    SafepointMechanism::process_if_requested
    ThreadBlockInVMPreprocess::~ThreadBlockInVMPreprocess
    ThreadBlockInVM::~ThreadBlockInVM
    ObjectSynchronizer::chk_for_block_req
    ObjectSynchronizer::deflate_monitor_list
    ObjectSynchronizer::deflate_idle_monitors
    MonitorDeflationThread::monitor_deflation_thread_entry

The MonitorDeflationThread is inside the loop that is deflating object monitors on the `_in_use_list` list, but it hasn't reached the point where it tries to unlink the deflated monitors. In the deflation loop we have checks to see if it is time to safepoint (`chk_for_block_req`). At some point it *is* time to block (the stack trace above), and the VMThread starts to run the VM_ThreadDump safepoint operation, which also tries to deflate monitors on the `_in_use_list`. This is what then causes the assert to fail.

The MonitorDeflationThread sets the `_owner` to `DEFLATER_MARKER` (value 0x2) and releases the `_object` OopHandle. Then when the VMThread finds this ObjectMonitor it sees that `_object` is "cleared" and expects `_owner` to be nullptr and not 0x2. The assert comes from this path:

    if (obj == nullptr) {
      // If the object died, we can recycle the monitor without racing with
      // Java threads. The GC already broke the association with the object.
      set_owner_from(nullptr, DEFLATER_MARKER);

My proposed fix for this is to stop deflating monitors from the safepointed VM_ThreadDump operation. This has the following benefits:

1) It solves this bug!
2) It removes work from a pause
3) It decouples the thread-dumping code from the Synchronizer code
4) It renames ObjectMonitorHashtable

FWIW, we are currently experimenting with a patch to put ObjectMonitors in a hashtable in preparation for Lilliput. So, I had already written most of the patch to support (2, 3, 4).

The patch removes the call to monitor_deflation from the VM_ThreadDump thread and instead it walks the in-use list of ObjectMonitors and only collects the much more limited set of ObjectMonitors that have an owner.

Some arguments have been made that by deflating the monitors we can make the walk over the ObjectMonitors faster and that it helps other operations that have to visit them. In the included test (provided by Patricio) I can see that visiting the monitor lists takes a couple of tens of milliseconds when the list starts to grow into the millions. To not regress this situation I added a call to poke the MonitorDeflationThread if the thread-dumping code sees that we have more than 100000 ObjectMonitors. This number is arbitrarily chosen, but it's around the point where walking the list started to take more than a couple of milliseconds on my machine. Does this make sense? Do you want a named constant for this somewhere?

One thing to note from this patch is how the thread-dumping code is put inside the VM_ThreadDump operation, which is placed inside the vmOperations.hpp/cpp files, and this code grew when I moved some of the existing code. An alternative to this could be to move VM_ThreadDump to threadServices.hpp/cpp. If we want to make that change, I would prefer to do it as a patch that goes in either before or after this fix.

I've tested this with the provided jtreg test and with the test that originally reproduced the assert. I'm going to run this through our CI pipeline, but I'd like to get this PR out before that has been completed.
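
For readers following the named-constant question, here is a minimal sketch of the threshold check described above. It is not the code from the PR: the namespace, the constant name `kDeflationRequestThreshold`, and the function `should_request_deflation` are all hypothetical, and only the value 100000 comes from the message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for illustration; these are not HotSpot declarations.
namespace sketch {

// Named constant in place of a bare literal. 100000 is the arbitrarily chosen
// value mentioned in the message above.
constexpr std::size_t kDeflationRequestThreshold = 100000;

// Returns true when a thread dump that observed `in_use_monitor_count`
// monitors should ask the async deflation thread to run.
inline bool should_request_deflation(
    std::size_t in_use_monitor_count,
    std::size_t threshold = kDeflationRequestThreshold) {
  // A zero threshold would request deflation on almost every dump.
  assert(threshold > 0);
  return in_use_monitor_count > threshold;
}

}  // namespace sketch
```

The comparison is strict, so a list of exactly 100000 monitors does not trigger a request; whether the real check is strict is taken from the quoted `monitors_count > 100000` line in the review further down.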
I've been measuring the performance of this fix with this little patch to extract the times of the thread-dumping code: https://github.com/stefank/jdk/commit/e9086ac856602dc0957b8157ee568d0891e3402b ------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1794438582 From shade at openjdk.org Mon Nov 6 10:27:11 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Nov 2023 10:27:11 GMT Subject: RFR: 8318757 ObjectMonitor::deflate_monitor fails "assert(prev == old_value) failed: unexpected prev owner=0x0000000000000002, expected=0x0000000000000000" In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 09:46:11 GMT, Stefan Karlsson wrote: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... Could you please rename the bug and PR to something that captures the essence of the bug, rather than the assert message? ------------- Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1714746354 From stuefe at openjdk.org Mon Nov 6 10:50:20 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 6 Nov 2023 10:50:20 GMT Subject: RFR: JDK-8319437: NMT should show library names in call stacks Message-ID: With this tiny enhancement, NMT shows library names in callstacks. 
------------- Commit messages: - start Changes: https://git.openjdk.org/jdk/pull/16508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16508&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319437 Stats: 21 lines in 1 file changed: 15 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16508/head:pull/16508 PR: https://git.openjdk.org/jdk/pull/16508 From shade at openjdk.org Mon Nov 6 10:54:06 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Nov 2023 10:54:06 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: <2eXNHrpyHgdJQSGKW0fMQMCi0cVzd6hzaOTo5lLmFpg=.ee20bd05-3b39-4719-9d7e-4f7a54c78e81@github.com> References: <2eXNHrpyHgdJQSGKW0fMQMCi0cVzd6hzaOTo5lLmFpg=.ee20bd05-3b39-4719-9d7e-4f7a54c78e81@github.com> Message-ID: On Thu, 2 Nov 2023 20:57:31 GMT, Aleksey Shipilev wrote: > I think this is good for review. The reproducer that used to hang/fail on assert is now passing. `tier1 tier2 tier3` are all passing. I am running more tests overnight. Testing seems all good. I'll leave the `Linux` -> `Generic` switch in this PR, until the very last moment before integration to keep testing more easily. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1794549935 From sjohanss at openjdk.org Mon Nov 6 12:14:21 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 6 Nov 2023 12:14:21 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: References: <6tngC-Jwyx8e25LGT8dAwKbaPb9qb_w5ONctnFieH3o=.61b013b2-beb9-4053-8c06-86a700208d77@github.com> Message-ID: <3iJqYOiXeO6bwtUmjzlO3tyFR9Uc28YAJ8aQbKbqKJM=.fda95001-86f3-4f09-8a74-e15be1987c4f@github.com> On Thu, 2 Nov 2023 01:20:59 GMT, Jonathan Joo wrote: > Could you elaborate a bit on what you were thinking of here? 
If we are assuming something like a thread that updates all other threads, I think this implementation could get a bit complicated. > > There are two main issues that we can see with a generic thread approach: > > 1. We would have to figure out how often to pull metrics from the various gc threads from the central thread, and possibly determine this frequency separately for every thread. Instead with our current implementation, we can manually trigger publishes based on when the GC thread is done doing work. > > 2. We would still need to tag each thread we want to track somewhere, and keep track of a mapping from thread to its counter name, etc. which doesn't seem to simplify things too much. (I imagine we will still need to touch numerous files to "tag" each thread with whether we want to track it or not?) I agree, I was not thinking about having a separate thread, more about trying to group this information in a way that it would be easier, for example, to provide periodic JFR events for the CPU times collected. Having something like a `CollectorCPUTimeCounters` (we already have `CollectorCounters`). Such a class could keep the different counters making it easier to get an overview of which CPU time counters are present. But this would also require a mapping between thread and counter (but it might be as simple as having an enum). I played around a bit instead of trying to explain what I mean and this is not very polished, but I was thinking something like this: https://github.com/openjdk/jdk/compare/pr/15082...kstefanj:jdk:pull/15082-idea What do you think? This way we don't add things to `CollectedHeap` as well, which is usually good unless really needed. 
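
To make the "enum plus one holder class" idea concrete, here is a rough sketch under stated assumptions. It is not the code in the linked branch: the group names in `CpuTimeGroup`, the class name `CollectorCpuTimeCounters`, and the nanosecond unit are hypothetical, and the sketch ignores thread safety and the hsperf plumbing entirely.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Hypothetical set of GC thread groups whose CPU time is tracked.
enum class CpuTimeGroup : std::size_t {
  kParallelWorkers,
  kConcurrentMark,
  kConcurrentRefine,
  kServiceThread,
  kCount  // number of groups; keep last
};

// Holds one accumulated CPU time value per group, so a consumer (for example
// a periodic event) can read all of them from a single place.
// Not thread-safe; a real version would need atomic updates or a lock.
class CollectorCpuTimeCounters {
 public:
  // Adds a CPU time delta in nanoseconds to the counter for `group`.
  void add(CpuTimeGroup group, std::int64_t delta_ns) {
    counters_[index(group)] += delta_ns;
  }

  // Returns the accumulated CPU time for `group`.
  std::int64_t get(CpuTimeGroup group) const {
    return counters_[index(group)];
  }

  // Returns the sum over all groups.
  std::int64_t total() const {
    std::int64_t sum = 0;
    for (std::int64_t value : counters_) sum += value;
    return sum;
  }

 private:
  static constexpr std::size_t kNumGroups =
      static_cast<std::size_t>(CpuTimeGroup::kCount);

  // Maps a group to its array slot; kCount itself is not a valid group.
  static std::size_t index(CpuTimeGroup group) {
    const std::size_t i = static_cast<std::size_t>(group);
    assert(i < kNumGroups);
    return i;
  }

  std::array<std::int64_t, kNumGroups> counters_{};  // zero-initialized
};

}  // namespace sketch
```

The enum is the whole thread-to-counter mapping here: each GC thread would only need to know which group it reports under, which is the simplification the message hints at.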
------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1794686146 From sjohanss at openjdk.org Mon Nov 6 12:20:21 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 6 Nov 2023 12:20:21 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v37] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 23:56:25 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - revert gitignore change > - Attempt to fix broken test > The existing sun.management:type=HotspotThreading MBean approach (discussed [here](https://mail.openjdk.org/pipermail/core-libs-dev/2023-September/111397.html)) could be another general way to track CPU. However, the discussion concludes that it is an internal API, and discourages users from using it. > @kstefanj , it is a pity that the `sun.management:type=HotspotThreading` MBean is not exported any more. If we move that or a similar functionality to a new MBean under `com.sun.management` (as proposed in the [cited discussion](https://mail.openjdk.org/pipermail/core-libs-dev/2023-September/111397.html)) then we might reuse these new hsperf counters in the same way this is already done by some other MBeans which already use hsperf counters as their information source. I think logging or JFR functionality could also easily be implemented on top of the new hsperf counters. I haven't looked at the details around this, but extracting some useful information from the internal bean and making it public sound reasonable to me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1794693970 From aboldtch at openjdk.org Mon Nov 6 12:53:13 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 6 Nov 2023 12:53:13 GMT Subject: RFR: 8318757: 8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor In-Reply-To: References: Message-ID: <_riE4qmrYNKLgzlHUBpRUlmrSobLlB1JEBySQX84S9Y=.44881eb0-c090-4548-b439-b60cf0a65de4@github.com> On Mon, 6 Nov 2023 09:46:11 GMT, Stefan Karlsson wrote: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... Looks good, just had a few thoughts / comments. src/hotspot/share/runtime/synchronizer.hpp line 135: > 133: > 134: // Iterate owned ObjectMonitors > 135: static void monitors_iterate(MonitorClosure* closure); Should the preconditions described in the implementation be documented in the header as well. Or at least have a note along the lines of `See ObjectSynchronizer::monitors_iterate_filtered for details on the contract for using this.` src/hotspot/share/runtime/vmOperations.cpp line 398: > 396: // to take more then a few milliseconds. > 397: size_t monitors_count = ObjectSynchronizer::in_use_list_count(); > 398: if (monitors_count > 100000) { Maybe use a named constant. test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 41: > 39: > 40: public class ConcurrentDeflation { > 41: public static final int TOTAL_RUN_TIME = 10 * 1000; Given that this test always runs for at least 10 seconds, maybe it should be excluded from tier1. See `test/hotspot/jtreg/TEST.groups`. Unsure what the praxis is here. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1714840232 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383278380 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383132222 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383270353 From jvernee at openjdk.org Mon Nov 6 12:58:17 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 12:58:17 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 08:14:55 GMT, Tobias Hartmann wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add some assertion to IR test that check for compilation and deoptimization > > src/hotspot/share/ci/ciMethodData.hpp line 525: > >> 523: >> 524: // pointers to sections in _data >> 525: // NOTE: these may be called before ciMethodData::load_data (is that a bug?). > > I don't think we should add a "is that a bug?" comment here but either investigate right away or file a follow-up RFE to keep track of that. I don't think it's a bug though. Right. This is a leftover personal note that I added during development. Will remove. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383286238 From jvernee at openjdk.org Mon Nov 6 13:15:18 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 13:15:18 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: <4kP-cNv7NVLPCLIvtNtZodu7KqWukhs_tMlG8pQugF0=.6c18c065-da2d-4c87-82e7-0dbc257ac7f1@github.com> On Mon, 6 Nov 2023 08:44:27 GMT, Tobias Hartmann wrote: >> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: >> >> add some assertion to IR test that check for compilation and deoptimization > > test/hotspot/jtreg/compiler/c2/TestExHandlerTrap.java line 37: > >> 35: * -Xbatch >> 36: * -Xlog:deoptimization=trace >> 37: * -XX:CompileCommand=PrintCompilation,compiler.c2.TestExHandlerTrap::payload > > Should the logging/printing be removed? I've left it since the output is pretty minimal, and it seems like it might be useful in case the test ever fails in CI? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383310959 From stefank at openjdk.org Mon Nov 6 13:14:44 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 13:14:44 GMT Subject: RFR: 8318757: 8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor [v2] In-Reply-To: <_riE4qmrYNKLgzlHUBpRUlmrSobLlB1JEBySQX84S9Y=.44881eb0-c090-4548-b439-b60cf0a65de4@github.com> References: <_riE4qmrYNKLgzlHUBpRUlmrSobLlB1JEBySQX84S9Y=.44881eb0-c090-4548-b439-b60cf0a65de4@github.com> Message-ID: On Mon, 6 Nov 2023 12:49:06 GMT, Axel Boldt-Christmas wrote: >> Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: >> >> - Move ConcurrentDeflation.java out of tier1 >> - Update comments >> - Use a named constant > > src/hotspot/share/runtime/synchronizer.hpp line 135: > >> 133: >> 134: // Iterate owned ObjectMonitors >> 135: static void monitors_iterate(MonitorClosure* closure); > > Should the preconditions described in the implementation be documented in the header as well. Or at least have a note along the lines of `See ObjectSynchronizer::monitors_iterate_filtered for details on the contract for using this.` I've unified and updated the comments both files. > src/hotspot/share/runtime/vmOperations.cpp line 398: > >> 396: // to take more then a few milliseconds. >> 397: size_t monitors_count = ObjectSynchronizer::in_use_list_count(); >> 398: if (monitors_count > 100000) { > > Maybe use a named constant. Done > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 41: > >> 39: >> 40: public class ConcurrentDeflation { >> 41: public static final int TOTAL_RUN_TIME = 10 * 1000; > > Given that this test always runs for at least 10 seconds, maybe it should be excluded from tier1. See `test/hotspot/jtreg/TEST.groups`. > > Unsure what the praxis is here. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383308219 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383308671 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383307219 From jvernee at openjdk.org Mon Nov 6 13:19:12 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 13:19:12 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: <_khhREnL3iU3Z8r98FCLip8QhHmvKM1Ie8f82DAi98k=.d503abc8-087e-4b4c-b930-0966a3be6bd3@github.com> On Mon, 6 Nov 2023 09:28:38 GMT, Yudi Zheng wrote: >> I also think it's fine but would like @tkrodriguez to confirm as well. > > I think it should be straightforward to add this profile for JVMCI compiler at line 693 with > > if (t->bci() != -1) { // did we find a handler in this method? > sd->method()->set_ex_handler_entered(t->bci()); // profile > } Note that we don't have an `sd` at that point in the code. `cm->method()` only points at the top-level method, so we'd have to do the same method/handler bci lookup as we do for C2 in the big `while` loop below the JVMCI code. Given that we don't do that already, I figured JVMCI exception handler routing just worked differently, and there was another point (on the Java side maybe) where the method + bci is looked up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383314504 From stefank at openjdk.org Mon Nov 6 13:14:42 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 13:14:42 GMT Subject: RFR: 8318757: 8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor [v2] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. 
@pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: - Move ConcurrentDeflation.java out of tier1 - Update comments - Use a named constant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/345e5214..591bd110 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=00-01 Stats: 8 lines in 4 files changed: 3 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From jvernee at openjdk.org Mon Nov 6 14:39:54 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 14:39:54 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 08:50:13 GMT, Tobias Hartmann wrote: > `-XX:+StressPrunedExceptionHandlers` and probably also `-XX:-ProfileExceptionHandlers` should be added to our testing, at least to [JDK-8295559](https://bugs.openjdk.org/browse/JDK-8295559) (internal). I've added two more runs of `TestExHandlerTrap` with these flags turned on as a basic smoke test. I agree it would be useful to do broader stress testing as well. Do note that a handful of compiler tests fail with `-XX:+StressPrunedExceptionHandlers` since they test for a very particular sequence of compilation and deoptimization, and the stress option introduces more deoptimizations.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1794971361 From jvernee at openjdk.org Mon Nov 6 14:39:54 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 14:39:54 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resources construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit.
After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed as the total data size minus the parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: - remove leftover comment - Add smoke tests for -XX:+StressPrunedExceptionHandlers and -XX:-ProfileExceptionHandlers - Add missing spaces to IRNode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/3ad93fdd..261cdb0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=02-03 Stats: 42 lines in 3 files changed: 30 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From tschatzl at openjdk.org Mon Nov 6 14:51:49 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 6 Nov 2023 14:51:49 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v11] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only
covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was the restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over.
That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and to wasted processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge tag 'jdk-22+21' into 8318706-implementation-of-region-pinning-in-g1 Added tag jdk-22+21 for changeset d96f38b8 - iwalulya review - typos - ayang review - renamings + documentation - Add documentation about why and how we handle pinned regions in the young/old generation. - Renamings to (almost) consistently use the following nomenclature for evacuation failure and types of it: * evacuation failure is the general concept. It includes * pinned regions * allocation failure One region can both be pinned and experience an allocation failure. G1 GC messages use tags "(Pinned)" and "(Allocation Failure)" now instead of "(Evacuation Failure)" Did not rename the G1EvacFailureInjector since this adds a lot of noise. - NULL -> nullptr - Fix compilation - Improve TestPinnedOldObjectsEvacuation test - Move tests into gc.g1.pinnedobjs package - ...
and 5 more: https://git.openjdk.org/jdk/compare/24b37ee3...251f4d38 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/f9735539..251f4d38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=09-10 Stats: 4795 lines in 166 files changed: 2660 ins; 1554 del; 581 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From duke at openjdk.org Mon Nov 6 14:53:39 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Mon, 6 Nov 2023 14:53:39 GMT Subject: RFR: 8306561: Possible out of bounds access in print_pointer_information [v6] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 19:44:12 GMT, Dean Long wrote: >> Thomas Obermeier has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8306561' of https://github.com/TOatGithub/jdk into JDK-8306561 >> - 8306561: test range instead of endpoints before casting > > src/hotspot/share/nmt/mallocTracker.cpp line 215: > >> 213: for (; here >= end; here -= smallest_possible_alignment) { >> 214: // JDK-8306561: cast to a MallocHeader needs to guarantee it can reside in readable memory >> 215: if (!os::is_readable_range(here, here + sizeof(MallocHeader) - 1)) { > > Sorry I noticed this late, but the " - 1" looks wrong here, because is_readable_range() checks for < `to`, not <= `to`. Hi Dean, thanks for finding this. I opened https://bugs.openjdk.org/browse/JDK-8319542 to address this and will fix it in a timely manner. 
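
The off-by-one discussed in this exchange can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not HotSpot code: `is_readable_range` here is a stand-in that simply treats a fixed address window as readable, with the half-open `[from, to)` semantics that the review comment attributes to the real function, and `kHeaderSize` is a made-up header size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Toy memory model: only addresses in [kReadableBegin, kReadableEnd) are readable.
constexpr std::uintptr_t kReadableBegin = 0x1000;
constexpr std::uintptr_t kReadableEnd = 0x2000;

// Hypothetical header size, for illustration only.
constexpr std::size_t kHeaderSize = 16;

// Stand-in with half-open semantics: true when every byte in [from, to)
// is readable. `to` itself is NOT checked.
inline bool is_readable_range(std::uintptr_t from, std::uintptr_t to) {
  assert(from <= to);
  return from >= kReadableBegin && to <= kReadableEnd;
}

// Passes one-past-the-end, so all kHeaderSize bytes are covered.
inline bool header_readable(std::uintptr_t here) {
  return is_readable_range(here, here + kHeaderSize);
}

// With the extra "- 1" the last byte of the header is never checked.
inline bool header_readable_off_by_one(std::uintptr_t here) {
  return is_readable_range(here, here + kHeaderSize - 1);
}

}  // namespace sketch
```

For a header starting at `0x1FF1`, the last byte sits at `0x2000`, just outside the readable window: `header_readable` rejects it, while the `- 1` variant accepts it, which is the kind of miss the review comment points at.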
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16381#discussion_r1383459624 From never at openjdk.org Mon Nov 6 15:43:12 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 6 Nov 2023 15:43:12 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: <_khhREnL3iU3Z8r98FCLip8QhHmvKM1Ie8f82DAi98k=.d503abc8-087e-4b4c-b930-0966a3be6bd3@github.com> References: <_khhREnL3iU3Z8r98FCLip8QhHmvKM1Ie8f82DAi98k=.d503abc8-087e-4b4c-b930-0966a3be6bd3@github.com> Message-ID: <5ENQf2G5VvA_nTy1rWcZPg6xDuqcgBBxf88tFcWOd90=.fea26ba2-c681-4ad3-b7da-2834070bf4a0@github.com> On Mon, 6 Nov 2023 13:15:36 GMT, Jorn Vernee wrote: >> I think it should be straightforward to add this profile for JVMCI compiler at line 693 with >> >> if (t->bci() != -1) { // did we find a handler in this method? >> sd->method()->set_ex_handler_entered(t->bci()); // profile >> } > > Note that we don't have an `sd` at that point in the code. `cm->method()` only points at the top-level method, so we'd have to do the same method/handler bci lookup as we do for C2 in the big `while` loop below the JVMCI code. Given that we don't do that already, I figured JVMCI exception handler routing just worked differently, and there was another point (on the Java side maybe) where the method + bci is looked up. I think this is fine for now as well. The code as written just can't work for JVMCI since we don't even enter that part of the logic. We do a straight lookup based on the pc offset and dispatch there directly. The generated code actually does the exception dispatch for the inlined scopes so it's not visible to the runtime which handler actually services the request.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383539526 From jsjolen at openjdk.org Mon Nov 6 15:44:39 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 Nov 2023 15:44:39 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v10] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
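
As an illustration of the idea in the description above (growing an array by handing each new slot's raw memory to a caller-supplied function), here is a small self-contained sketch. It is not `GrowableArray` and does not reflect the PR's actual signatures: `RawArray`, `grow_with`, and `NonCopyable` are hypothetical names, and the sketch uses a separately named member function rather than the SFINAE-based overloading the description mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

namespace sketch {

// A type that cannot be copied, so "fill new slots with a copy of this
// value" is not available for it.
struct NonCopyable {
  int value;
  explicit NonCopyable(int v) : value(v) {}
  NonCopyable(const NonCopyable&) = delete;
  NonCopyable& operator=(const NonCopyable&) = delete;
};

// Fixed-capacity array over raw storage. New elements are constructed by a
// caller-supplied initializer that receives a pointer to uninitialized memory.
template <typename T>
class RawArray {
 public:
  explicit RawArray(std::size_t capacity)
      : data_(static_cast<T*>(::operator new(capacity * sizeof(T)))),
        capacity_(capacity) {}

  RawArray(const RawArray&) = delete;
  RawArray& operator=(const RawArray&) = delete;

  ~RawArray() {
    // Destroy only the slots that were actually constructed.
    for (std::size_t i = 0; i < len_; i++) data_[i].~T();
    ::operator delete(data_);
  }

  // Grows the length to `new_len`. For every new slot, `init` is called with
  // a pointer to raw memory and must construct a T there (placement new).
  template <typename Init>
  void grow_with(std::size_t new_len, Init init) {
    assert(new_len <= capacity_);
    for (; len_ < new_len; len_++) init(data_ + len_);
  }

  T& at(std::size_t i) {
    assert(i < len_);
    return data_[i];
  }

  std::size_t length() const { return len_; }

 private:
  T* data_;
  std::size_t capacity_;
  std::size_t len_ = 0;
};

}  // namespace sketch
```

A caller would pass something like `[](sketch::NonCopyable* slot) { ::new (slot) sketch::NonCopyable(7); }` as the initializer. Using a distinct function name sidesteps the `nullptr` overload ambiguity that the description gives as the reason for SFINAE.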
Johan Sjölen has updated the pull request incrementally with four additional commits since the last revision: - Add another test - Reintroduce the functional variant - Remove unnecessary - 1 space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/535c5c9d..7365593d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=08-09 Stats: 115 lines in 4 files changed: 66 ins; 40 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From duke at openjdk.org Mon Nov 6 15:49:13 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 6 Nov 2023 15:49:13 GMT Subject: RFR: 8319429: Don't zero out mxcsr flag bits on ECore In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 05:01:33 GMT, David Holmes wrote: >> Improves vector rounding on ECore about 10x >> >> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms >> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms > > src/hotspot/cpu/x86/vm_version_x86.cpp line 864: > >> 862: (_model == 0x97 || _model == 0xAC || _model == 0xAF)) { >> 863: FLAG_SET_DEFAULT(DoEcoreOpt, true); >> 864: } > > And what should happen if the flag is set true by the user and there is no Ecore? What effect will that have? Should it be allowed? From an ISA point of view, they are the same, so if the user flips the flag on purpose, the code will still be correct.
It's also helpful to test Ecore-optimized code on a Pcore (I have some more patches coming in under this option soon) > src/hotspot/share/runtime/globals.hpp line 574: > >> 572: product(bool, DoEcoreOpt, false, DIAGNOSTIC, \ >> 573: "Perform Ecore Optimization") \ >> 574: \ > > I think this should be a CPU specific flag in ./cpu/x86/globals_x86.hpp (similar to how we have linux specific flags in ./os/linux/globals_linux.hpp). Also the description should clarify that the default is actually true for Ecore systems, and false elsewhere. Thanks, didn't know about that file, will try. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1383548150 PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1383547163 From never at openjdk.org Mon Nov 6 15:50:19 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 6 Nov 2023 15:50:19 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 14:39:54 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resources construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping.
But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int...
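To make the try-with-resources point in the quoted description concrete, here is a minimal sketch of the two `close()` call sites. It only approximates what javac generates, and every name in it (`TryWithResourcesSketch`, `Counter`, `work`) is hypothetical and not taken from the PR.

```java
// Hypothetical illustration; none of these names come from the PR.
final class TryWithResourcesSketch {

    // A trivial resource that counts how often close() runs.
    static final class Counter implements AutoCloseable {
        int closed;

        @Override
        public void close() {
            closed++;
        }
    }

    // Stand-in for the body of the try block.
    static int work(boolean fail) {
        if (fail) {
            throw new IllegalStateException("boom");
        }
        return 42;
    }

    // What the programmer writes.
    static int sugared(Counter resource, boolean fail) {
        try (Counter r = resource) {
            return work(fail);
        }
    }

    // Roughly what javac generates: two separate close() call sites.
    static int desugared(Counter resource, boolean fail) {
        Counter r = resource;
        int result;
        try {
            result = work(fail);
        } catch (Throwable t) {
            // Hidden exception handler: rarely entered, yet it holds a close() call.
            try {
                r.close();
            } catch (Throwable suppressed) {
                t.addSuppressed(suppressed);
            }
            throw t;
        }
        r.close(); // normal path
        return result;
    }
}
```

On the normal path only the second `close()` runs; the call inside the handler is the one that, per the description, stays un-inlined and lets the receiver escape unless the handler block is pruned.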
> > Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: > > - remove leftover comment > - Add smoke tests for -XX:+StressPrunedExceptionHandlers and -XX:-ProfileExceptionHandlers > - Add missing spaces to IRNode src/hotspot/share/runtime/sharedRuntime.cpp line 784: > 782: > 783: if (handler_bci != -1) { // did we find a handler in this method? > 784: sd->method()->set_ex_handler_entered(handler_bci); // profile Are you sure this handles inlined exception handlers properly for C2? My recollection is that C2 generates single level tables and performs the inlined exception dispatch in the generated code. If you notice the flag top_frame_only stops the walk through the callers and OptoRuntime::handle_exception_C always passes true for top_frame_only. So it seems like sd will always be the ScopeDesc of the top frame. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383551195 From jsjolen at openjdk.org Mon Nov 6 15:52:52 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 Nov 2023 15:52:52 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v11] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Add resource marks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/7365593d..382773a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=09-10 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From dnsimon at openjdk.org Mon Nov 6 16:05:12 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 6 Nov 2023 16:05:12 GMT Subject: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany wrote: > This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16023#pullrequestreview-1715574295 From never at openjdk.org Mon Nov 6 16:05:12 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 6 Nov 2023 16:05:12 GMT Subject: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany wrote: > This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures. Marked as reviewed by never (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/16023#pullrequestreview-1715579342 From jvernee at openjdk.org Mon Nov 6 16:21:15 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 16:21:15 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 15:47:54 GMT, Tom Rodriguez wrote: >> Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: >> >> - remove leftover comment >> - Add smoke tests for -XX:+StressPrunedExceptionHandlers and -XX:-ProfileExceptionHandlers >> - Add missing spaces to IRNode > > src/hotspot/share/runtime/sharedRuntime.cpp line 784: > >> 782: >> 783: if (handler_bci != -1) { // did we find a handler in this method? >> 784: sd->method()->set_ex_handler_entered(handler_bci); // profile > > Are you sure this handles inlined exception handlers properly for c2? My recollection is that C2 generates single level tables and performs the inlined exception dispatch in the generated code. If you notice the flag top_frame_only stops the walk through the callers and OptoRuntime::handle_exception_C always passes true for top_frame_only. So it seems like sd will always be the ScopeDesc of the top frame. Hmm, you're right. The current code doesn't seem to work for exceptions thrown in an inlinee and caught in a caller. The other way around works. I thought we only saw `-1` when we had to unwind a frame, and then we'd do the exception handler lookup again. But, looking now, there doesn't seem to be a following lookup. I'll dig into this some more. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383596969 From rriggs at openjdk.org Mon Nov 6 16:28:22 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 6 Nov 2023 16:28:22 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs Message-ID: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. The changes include: - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate if the input array is modified before the constructor returns. The resulting string may contain any combination of characters sampled from the input array. - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. 
Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. ------------- Commit messages: - Cleanup javadoc, whitespace, and formatting in the JMH benchmark - Update RiscV implementation of intrinsic for java.lang.StringUTF16.compress - Javadoc formatting - 8311906: Improve robustness of String constructors with mutable array arguments Changes: https://git.openjdk.org/jdk/pull/16425/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311906 Stats: 1057 lines in 11 files changed: 859 ins; 82 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From jpai at openjdk.org Mon Nov 6 16:28:22 2023 From: jpai at openjdk.org (Jaikiran Pai) Date: Mon, 6 Nov 2023 16:28:22 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. 
In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Hello Roger, it looks like there are some whitespace related issues in the changes which jcheck has caught https://github.com/openjdk/jdk/pull/16425/checks?check_run_id=18357062638 and thus hasn't created a RFR for this yet. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1793738211 From duke at openjdk.org Mon Nov 6 16:28:25 2023 From: duke at openjdk.org (ExE Boss) Date: Mon, 6 Nov 2023 16:28:25 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. src/java.base/share/classes/java/lang/String.java line 566: > 564: } > 565: // Decode with a stable copy, to be the result if the decoded length is the same > 566: byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length); This has to be moved before the `if (dp == length) { ... }` check, as that also does a copy: // Decode with a stable copy, to be the result if the decoded length is the same byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length); int dp = StringCoding.countPositives(latin1, offset, length); if (dp == length) { this.value = latin1; this.coder = LATIN1; return; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1382576891 From rriggs at openjdk.org Mon Nov 6 16:28:25 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 6 Nov 2023 16:28:25 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <8XE-J0reitsGrDsDulsRZfbzrDEKGNFpGKHoC0xdZC4=.ccd97d94-7453-44ae-ba25-c2a5d41affab@github.com> On Sun, 5 Nov 2023 13:32:20 GMT, ExE Boss wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers.
>> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
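The invariant in the quoted description (a string stored as non-latin1 must contain at least one non-latin1 character, even if the input array is mutated during construction) can be sketched as follows. This is a simplified illustration, not the code in the PR: it always takes a private copy, which the actual change avoids for performance reasons, and all names (`MutableInputSketch`, `Encoded`, `finish`) are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from java.lang.String.
final class MutableInputSketch {

    // Result of encoding: the bytes plus which representation was chosen.
    static final class Encoded {
        final byte[] value;
        final boolean latin1;

        Encoded(byte[] value, boolean latin1) {
            this.value = value;
            this.latin1 = latin1;
        }
    }

    // Index of the first char above 0xFF, or chars.length if there is none.
    static int firstNonLatin1(char[] chars) {
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] > 0xFF) {
                return i;
            }
        }
        return chars.length;
    }

    // One byte per char; only valid when every char is <= 0xFF.
    static byte[] toLatin1(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) chars[i];
        }
        return out;
    }

    // Two bytes per char, high byte first.
    static byte[] toUtf16(char[] chars) {
        byte[] out = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            out[2 * i] = (byte) (chars[i] >> 8);
            out[2 * i + 1] = (byte) chars[i];
        }
        return out;
    }

    // Chooses the representation from a private copy. 'index' is where the
    // earlier scan of the shared input saw its first non-latin1 char.
    static Encoded finish(char[] copy, int index) {
        if (index < copy.length && copy[index] > 0xFF) {
            return new Encoded(toUtf16(copy), false); // verified at the same index
        }
        // The scan result cannot be trusted for this copy: re-scan the copy,
        // which nobody else can modify, and let that scan decide.
        return firstNonLatin1(copy) == copy.length
                ? new Encoded(toLatin1(copy), true)
                : new Encoded(toUtf16(copy), false);
    }

    static Encoded encode(char[] input) {
        int index = firstNonLatin1(input);                // scan the shared array
        char[] copy = Arrays.copyOf(input, input.length); // private snapshot
        return finish(copy, index);
    }
}
```

Calling `finish` with an index whose character is latin1 in the copy simulates a concurrent modification between the scan and the copy; the re-scan then decides the representation instead of throwing.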
> > src/java.base/share/classes/java/lang/String.java line 566: > >> 564: } >> 565: // Decode with a stable copy, to be the result if the decoded length is the same >> 566: byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length); > > This has to be moved before the `if (dp == length) { ... }` check, as that also does a copy: > > // Decode with a stable copy, to be the result if the decoded length is the same > byte[] latin1 = Arrays.copyOfRange(bytes, offset, offset + length); > int dp = StringCoding.countPositives(latin1, offset, length); > if (dp == length) { > this.value = latin1; > this.coder = LATIN1; > return; > } That may look like an improvement, to share common code, but it results in a performance hit in the normal case. The best performing case is to copy and return immediately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1383539522 From liach at openjdk.org Mon Nov 6 16:28:27 2023 From: liach at openjdk.org (Chen Liang) Date: Mon, 6 Nov 2023 16:28:27 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <6Z8gWGyohCB0dp1gfe7-K-HDFpYHtn7jjwTsNt0XujY=.d756475b-6a44-44fc-854c-a5bd5290eb1c@github.com> On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings.
In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
src/java.base/share/classes/java/lang/StringUTF16.java line 202: > 200: @ForceInline > 201: public static byte[] compress(final char[] val, final int off, final int count) { > 202: byte[] latin1 = new byte[count]; Will this redundant array allocation be costly if we are working with mostly-utf16 strings, such as CJK strings with no latin characters? I suggest we can use a heuristic to read the initial char; if it's utf16 then we skip the latin-1 process altogether (and we can assign the utf16 value to the initial index to ensure it is not latin-1 compressible). src/java.base/share/classes/java/lang/StringUTF16.java line 411: > 409: return 2; > 410: } else > 411: throw new IllegalArgumentException(Integer.toString(codePoint)); `toHexString` might be more informative. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1382296222 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1382304769 From rriggs at openjdk.org Mon Nov 6 16:28:28 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 6 Nov 2023 16:28:28 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6Z8gWGyohCB0dp1gfe7-K-HDFpYHtn7jjwTsNt0XujY=.d756475b-6a44-44fc-854c-a5bd5290eb1c@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <6Z8gWGyohCB0dp1gfe7-K-HDFpYHtn7jjwTsNt0XujY=.d756475b-6a44-44fc-854c-a5bd5290eb1c@github.com> Message-ID: On Sat, 4 Nov 2023 00:07:33 GMT, Chen Liang wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings.
In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
> > src/java.base/share/classes/java/lang/StringUTF16.java line 202: > >> 200: @ForceInline >> 201: public static byte[] compress(final char[] val, final int off, final int count) { >> 202: byte[] latin1 = new byte[count]; > > Will this redundant array allocation be costly if we are working with mostly-utf16 strings, such as CJK strings with no latin characters? > > I suggest we can use a heuristic to read the initial char; if it's utf16 then we skip the latin-1 process altogether (and we can assign the utf16 value to the initial index to ensure it is not latin-1 compressible). We can reconsider this design as a separate PR. Every additional check has a performance impact and in this bug the goal is to avoid any regression. We'll need to gain some insight into the distribution of strings when used in a non-latin1 application: how many of the strings are latin1 vs non-latin1, what the distribution of string lengths is, and which APIs are in use in the applications. The implementation is already pretty good about working with strings of different coders but there may be some different choices when converting between char arrays and int arrays and strings. > src/java.base/share/classes/java/lang/StringUTF16.java line 411: > >> 409: return 2; >> 410: } else >> 411: throw new IllegalArgumentException(Integer.toString(codePoint)); > > `toHexString` might be more informative. Perhaps, but changing the exception text is out of scope for this PR; it has been decimal since JDK 9 (2015).
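For reference, the two message formats discussed above differ only in radix. A small sketch, assuming a hypothetical helper class `CodePointMessageSketch` (not part of the JDK) and an invalid code point just above `Character.MAX_CODE_POINT`:

```java
// Hypothetical helper; only Integer.toString and Integer.toHexString are JDK APIs.
final class CodePointMessageSketch {

    // Current style of the exception text: decimal.
    static String decimal(int codePoint) {
        return Integer.toString(codePoint);
    }

    // Suggested style: hexadecimal, which matches the usual U+XXXX notation.
    static String hex(int codePoint) {
        return Integer.toHexString(codePoint);
    }

    public static void main(String[] args) {
        int invalid = Character.MAX_CODE_POINT + 1; // 0x110000, not a valid code point
        System.out.println(decimal(invalid));
        System.out.println(hex(invalid));
    }
}
```

The hexadecimal form is easier to compare against Unicode charts, while the decimal form keeps the exception text unchanged for existing users.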
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1383521445 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1383527621 From jsjolen at openjdk.org Mon Nov 6 16:41:22 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 Nov 2023 16:41:22 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v12] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Another test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/382773a0..5ab96639 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=10-11 Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From never at openjdk.org Mon Nov 6 16:42:14 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 6 Nov 2023 16:42:14 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 16:17:48 GMT, Jorn Vernee wrote: >> src/hotspot/share/runtime/sharedRuntime.cpp line 784: >> >>> 782: >>> 783: if (handler_bci != -1) { // did we find a handler in this method? >>> 784: sd->method()->set_ex_handler_entered(handler_bci); // profile >> >> Are you sure this handles inlined exception handlers properly for c2? My recollection is that C2 generates single level tables and performs the inlined exception dispatch in the generated code. If you notice the flag top_frame_only stops the walk through the callers and OptoRuntime::handle_exception_C always passes true for top_frame_only. So it seems like sd will always be the ScopeDesc of the top frame. > > Hmm, you're right. The current code doesn't seem to work for exceptions thrown in an inlinee and caught in a caller. > > I thought we only saw `-1` when we had to unwind a frame, and then we'd do the exception handler lookup again. But, looking now, there doesn't seem to be a following lookup. > > I'll dig into this some more.
If the expectation is that unhandled exception entry points will be deopts, then you may not actually have to do anything since the exception handling would end up going through the interpreter. Or maybe Deoptimization::uncommon_trap_inner could be augmented to update the MDO at the right point? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383631735 From jsjolen at openjdk.org Mon Nov 6 16:48:11 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Mon, 6 Nov 2023 16:48:11 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v12] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Mon, 6 Nov 2023 16:41:22 GMT, Johan Sjölen wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Another test Hi, I've reworked this PR a bit. I've introduced two new methods with a `_with` suffix (any ideas for better names?). These methods allow for supplying a functor/lambda for constructing the object, similar to what was in my original PR. The original API now uses these two methods. There are also new tests, along with a toy example showing how this new API could be useful.
The `at_put_grow_with` lambda takes two arguments, the latter being a boolean flag indicating whether or not this is the last element. It's perhaps not the most elegant, but I don't think having two templates would do us any good here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1795452779 From jvernee at openjdk.org Mon Nov 6 17:02:11 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 17:02:11 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 16:39:08 GMT, Tom Rodriguez wrote: >> Hmm, you're right. The current code doesn't seem to work for exceptions thrown in an inlinee and caught in a caller. >> >> I thought we only saw `-1` when we had to unwind a frame, and then we'd do the exception handler lookup again. But, looking now, there doesn't seem to be a following lookup. >> >> I'll dig into this some more. > > If the expectation is that unhandled exception entry points will be deopts, then you may not actually have to do anything since the exception handling would end up going through the interpreter. Or maybe Deoptimization::uncommon_trap_inner could be augmented to update the MDO at the right point? Thinking more about this: marking the handler as entered is mostly useful for C2 so that when we deoptimize and compile in C2 again, we don't emit an uncommon trap again. However, that would also require not seeing any exception in the interpreter or in tier 1-3. (Maybe not tracking C2 handler enters is actually desirable). I think we can mark the handler as entered in the deoptimization code that handles the uncommon trap that replaces the ex. handler instead, for C2 (and maybe that also works for JVMCI?) P.S. 
Ah, I didn't see your last comment :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383659987 From never at openjdk.org Mon Nov 6 17:10:13 2023 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 6 Nov 2023 17:10:13 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: <-nU3hhjFJeM_7wf6e7U3ShHHCQbKKHHHwMhR5LENFZs=.f6977636-c0ec-49cf-a949-e93bcc7a5625@github.com> On Mon, 6 Nov 2023 17:03:23 GMT, Jorn Vernee wrote: >> Thinking more about this: marking the handler as entered is mostly useful for C2 so that when we deoptimize and compile in C2 again, we don't emit an uncommon trap again. However, that would also require not seeing any exception in the interpreter or in tier 1-3. (Maybe not tracking C2 handler enters is actually desirable). >> >> I think we can mark the handler as entered in the deoptimization code that handles the uncommon trap that replaces the ex. handler instead, for C2 (and maybe that also works for JVMCI?) >> >> P.S. Ah, I didn't see your last comment :-) > >> If the expectation is that unhandled exception entry points will be deopts, then you may not actually have to do anything since the exception handling would end up going through the interpreter. > > The uncommon trap is located in the exception handler itself, so after the deopt we start interpreting in the exception handler, and don't go through the interpreter's exception dispatch code where we do the profiling (`InterpreterRuntime::exception_handler_for_exception`). > > I think handling this in `Deoptimization::uncommon_trap_inner` can work though. Yes I think a solution like that would work for JVMCI as well. C2 and JVMCI exception dispatch are very similar. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383680688 From jvernee at openjdk.org Mon Nov 6 17:10:13 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 6 Nov 2023 17:10:13 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 16:56:38 GMT, Jorn Vernee wrote: >> If the expectation is that unhandled exception entry points will be deopts, then you may not actually have to do anything since the exception handling would end up going through the interpreter. Or maybe Deoptimization::uncommon_trap_inner could be augmented to update the MDO at the right point? > > Thinking more about this: marking the handler as entered is mostly useful for C2 so that when we deoptimize and compile in C2 again, we don't emit an uncommon trap again. However, that would also require not seeing any exception in the interpreter or in tier 1-3. (Maybe not tracking C2 handler enters is actually desirable). > > I think we can mark the handler as entered in the deoptimization code that handles the uncommon trap that replaces the ex. handler instead, for C2 (and maybe that also works for JVMCI?) > > P.S. Ah, I didn't see your last comment :-) > If the expectation is that unhandled exception entry points will be deopts, then you may not actually have to do anything since the exception handling would end up going through the interpreter. The uncommon trap is located in the exception handler itself, so after the deopt we start interpreting in the exception handler, and don't go through the interpreter's exception dispatch code where we do the profiling (`InterpreterRuntime::exception_handler_for_exception`). I think handling this in `Deoptimization::uncommon_trap_inner` can work though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1383671561 From tschatzl at openjdk.org Mon Nov 6 17:20:39 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 6 Nov 2023 17:20:39 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v12] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. 
However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge tag 'jdk-22+22' into 8318706-implementation-of-region-pinning-in-g1 Added tag jdk-22+22 for changeset d354141a - Merge tag 'jdk-22+21' into 8318706-implementation-of-region-pinning-in-g1 Added tag jdk-22+21 for changeset d96f38b8 - iwalulya review - typos - ayang review - renamings + documentation - Add documentation about why and how we handle pinned regions in the young/old generation. - Renamings to (almost) consistently use the following nomenclature for evacuation failure and types of it: * evacuation failure is the general concept. It includes * pinned regions * allocation failure One region can both be pinned and experience an allocation failure. 
G1 GC messages use tags "(Pinned)" and "(Allocation Failure)" now instead of "(Evacuation Failure)" Did not rename the G1EvacFailureInjector since this adds a lot of noise. - NULL -> nullptr - Fix compilation - Improve TestPinnedOldObjectsEvacuation test - ... and 6 more: https://git.openjdk.org/jdk/compare/3af918f1...2ad39680 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/251f4d38..2ad39680 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=10-11 Stats: 27458 lines in 1170 files changed: 14959 ins; 4156 del; 8343 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From pchilanomate at openjdk.org Mon Nov 6 19:40:29 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 6 Nov 2023 19:40:29 GMT Subject: RFR: 8318757: 8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor [v2] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 13:14:42 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: > > - Move ConcurrentDeflation.java out of tier1 > - Update comments > - Use a named constant Looks good to me. I see this monitor traversal in VM_ThreadDump is just to get the monitors locked through JNI. 
But we now have the _jni_monitor_count counter so all this traversal could be made conditional, which most of the times will mean we won't have to do it. test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 35: > 33: * @test > 34: * @bug 8318757 > 35: * @summary Test concurrent monitor deflation by MonitorDeflationThread and VMThread No concurrent deflation anymore in VM_ThreadDump by the VMThread. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1716091260 PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1796176604 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383871753 From shade at openjdk.org Mon Nov 6 19:56:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Nov 2023 19:56:27 GMT Subject: RFR: 8318757: 8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor [v2] In-Reply-To: References: Message-ID: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> On Mon, 6 Nov 2023 13:14:42 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: > > - Move ConcurrentDeflation.java out of tier1 > - Update comments > - Use a named constant I wanted to have a closer look today, but ran out of time, sorry. But I have one comment already: src/hotspot/share/runtime/vmOperations.cpp line 400: > 398: const int DeflateRequestLimit = 100000; > 399: if (monitors_count > DeflateRequestLimit) { > 400: ObjectSynchronizer::request_deflate_idle_monitors(); Not sure about this. 
Arguably, the async deflation policy should re-evaluate the conditions for deflation and then decide to act. Otherwise, this effectively backdoors the heuristics, and does so with the hardcoded threshold. On the other hand, the old code effectively did the same with threshold of `0`. So, I would rather keep old behavior and just request deflation without a threshold here. ------------- PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1716117901 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383886830 From stefank at openjdk.org Mon Nov 6 20:09:35 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 20:09:35 GMT Subject: RFR: 8318757: 8318757 VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor [v2] In-Reply-To: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> References: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> Message-ID: On Mon, 6 Nov 2023 19:47:58 GMT, Aleksey Shipilev wrote: >> Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: >> >> - Move ConcurrentDeflation.java out of tier1 >> - Update comments >> - Use a named constant > > src/hotspot/share/runtime/vmOperations.cpp line 400: > >> 398: const int DeflateRequestLimit = 100000; >> 399: if (monitors_count > DeflateRequestLimit) { >> 400: ObjectSynchronizer::request_deflate_idle_monitors(); > > Not sure about this. Arguably, the async deflation policy should re-evaluate the conditions for deflation and then decide to act. Otherwise, this effectively backdoors the heuristics, and does so with the hardcoded threshold. On the other hand, the old code effectively did the same with threshold of `0`. > > So, I would rather keep old behavior and just request deflation without a threshold here. Thanks for the feedback. 
It is unclear to me if the old behavior of deflating monitors for every single thread dump is beneficial or not, but I also wouldn't mind changing this to use your suggestion if others agree that it is the preferred way forward. I'm going to at least wait for @dcubed-ojdk to get some time to give his input on this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383914943 From stefank at openjdk.org Mon Nov 6 20:18:07 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 20:18:07 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v3] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Update comment in ConcurrentDeflation.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/591bd110..1185e9b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Mon Nov 6 20:18:07 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 20:18:07 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v2] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 19:35:59 GMT, Patricio Chilano Mateo wrote: >> Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: >> >> - Move ConcurrentDeflation.java out of tier1 >> - Update comments >> - Use a named constant > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 35: > >> 33: * @test >> 34: * @bug 8318757 >> 35: * @summary Test concurrent monitor deflation by MonitorDeflationThread and VMThread > > No concurrent deflation anymore in VM_ThreadDump by the VMThread. Thanks. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1383925570 From stefank at openjdk.org Mon Nov 6 20:21:33 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 6 Nov 2023 20:21:33 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v2] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 19:38:06 GMT, Patricio Chilano Mateo wrote: > I see this monitor traversal in VM_ThreadDump is just to get the monitors locked through JNI. But we now have the _jni_monitor_count counter so all this traversal could be made conditional, which most of the times will mean we won't have to do it. Good point. If we need to optimize this in the future, this could be a good thing to implement. Right now I'm not sure it is worth it, given that the iteration is fairly quick, unless there is a brutal amount of monitors, but then it might be better to look into the deflation mechanisms instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1796366135 From dcubed at openjdk.org Mon Nov 6 21:22:28 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 6 Nov 2023 21:22:28 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: References: <2eXNHrpyHgdJQSGKW0fMQMCi0cVzd6hzaOTo5lLmFpg=.ee20bd05-3b39-4719-9d7e-4f7a54c78e81@github.com> Message-ID: On Mon, 6 Nov 2023 10:50:58 GMT, Aleksey Shipilev wrote: >> I think this is good for review. The reproducer that used to hang/fail on assert is now passing. `tier1 tier2 tier3` are all passing. I am running more tests overnight. > >> I think this is good for review. The reproducer that used to hang/fail on assert is now passing. `tier1 tier2 tier3` are all passing. I am running more tests overnight. > > Testing seems all good. I'll leave the `Linux` -> `Generic` switch in this PR, until the very last moment before integration to keep testing more easily. 
@shipilev - I'm glad that: vmTestbase/nsk/monitoring/ThreadInfo/isSuspended/issuspended002.java has proven to be useful. I had been thinking about removing it from my weekly stress kit runs since it has been a long time since I've seen a failure flushed out by that test running the stress config. I think I'll keep it around for longer... ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1796507913 From Matthew.Carter at microsoft.com Mon Nov 6 22:19:26 2023 From: Matthew.Carter at microsoft.com (Mat Carter) Date: Mon, 6 Nov 2023 22:19:26 +0000 Subject: RFR: 8317562: [JFR] Compilation queue statistics [v6] In-Reply-To: <2jS2KogdxxjTHne4-zljO2yczXsAnHnvVSwPM-qhN0s=.72d79b55-d9d0-4d42-9606-4a961f4366e7@github.com> References: <2jS2KogdxxjTHne4-zljO2yczXsAnHnvVSwPM-qhN0s=.72d79b55-d9d0-4d42-9606-4a961f4366e7@github.com> Message-ID: Thank you all for the reviews and testing. Are there any further changes or questions regarding the implementation or intent of the event? Are there any additional steps required before this PR can be sponsored? Should we wait until after the Nov 28 code freeze for the January PSU? Thanks in advance Mat From: hotspot-jfr-dev on behalf of Matthias Baesken Date: Friday, November 3, 2023 at 4:28 AM To: hotspot-dev at openjdk.org , hotspot-jfr-dev at openjdk.org Subject: Re: RFR: 8317562: [JFR] Compilation queue statistics [v6] On Thu, 2 Nov 2023 16:30:32 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) >> >> Passes tier1 on linux (x86) and mac (aarch64) > > Mat Carter has updated the pull request incrementally with one additional commit since the last revision: > > Updated test to reflect field name changes With the updated test file, the jtreg test error is gone.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16211#issuecomment-1792272259 From cslucas at openjdk.org Mon Nov 6 22:22:30 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 6 Nov 2023 22:22:30 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: <_1LI3cbeE9XR-7E7gD22M7xNueLRhlAtHWopiTccb1Y=.a26dde00-d6b8-4c98-a84d-143d65ef251c@github.com> On Fri, 3 Nov 2023 02:04:13 GMT, Dean Long wrote: >> src/hotspot/cpu/x86/gc/z/z_x86_64.ad line 37: >> >>> 35: #include "gc/z/zBarrierSetAssembler.hpp" >>> 36: >>> 37: static void z_color(MacroAssembler* masm, const MachNode* node, Register ref) { >> >> For files already using MacroAssembler& _masm, the only change needed is this at the top: >> >> undef __ >> #define __ _masm. > > I guess that doesn't work because different files are concatenated together, causing a conflict if some files expect MacroAssembler *masm. To reduce the number of changes, couldn't we use MacroAssembler& _masm everywhere? Because some places were using `if (cbuf)` I ended up opting to make the parameter also a pointer instead of a reference. > For files already using MacroAssembler& _masm, the only change needed is this at the top: My opinion would be to use Reference or Pointer everywhere and not mix the two - to prevent confusion. But if you folks think it's best to go that way, I'm fine with it.
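As a side note on the reference-versus-pointer question discussed above, here is a tiny standalone sketch of how the `__` shorthand macro can be bound to either form. `ToyAssembler`, `emit_two_ref` and `emit_two_ptr` are hypothetical names for this illustration only. The real `MacroAssembler` and the `.ad` file machinery are not modelled here.

```cpp
// Toy assembler that only counts emitted instructions (hypothetical stand-in).
struct ToyAssembler {
  int emitted = 0;
  void nop() { emitted++; }
};

// Reference style: the shorthand expands to "_masm." and the parameter is a reference.
#define __ _masm.
static void emit_two_ref(ToyAssembler& _masm) {
  __ nop();
  __ nop();
}
#undef __

// Pointer style: the shorthand expands to "masm->" and the parameter is a pointer,
// which also allows a null check on the parameter where one is needed.
#define __ masm->
static void emit_two_ptr(ToyAssembler* masm) {
  if (masm == nullptr) return;
  __ nop();
  __ nop();
}
#undef __
```

Because the macro is textual, code fragments that end up concatenated into one translation unit have to agree on which definition is active at each point, which is in line with the concern raised in the quoted review comment.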
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16484#discussion_r1384095916 From sspitsyn at openjdk.org Mon Nov 6 23:22:03 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 6 Nov 2023 23:22:03 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: Message-ID: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes three new classes: > - `JvmtiHandshake`, `JvmtiUnifiedHandshakeClosure`, VM_HandshakeUnmountedVirtualThread > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
> > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: get rid of the VM_HandshakeUnmountedVirtualThread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/720c9c7e..ca2fbb98 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=02-03 Stats: 38 lines in 4 files changed: 0 ins; 29 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From sspitsyn at openjdk.org Mon Nov 6 23:31:58 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 6 Nov 2023 23:31:58 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: On Mon, 6 Nov 2023 23:22:03 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. 
The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. >> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: get rid of the VM_HandshakeUnmountedVirtualThread I've pushed an update which removes the newly introduced VM_op and its use: `VM_HandshakeUnmountedVirtualThread`. Patricio convinced me that it has to be handshake-safe to execute a `HandshakeClosure` callback on the current (handshake requesting) thread when the target thread is an unmounted virtual thread. At an earlier development stage I saw various intermittent crashes and concluded it is not handshake-safe. That is why there was a decision to use the `VM_HandshakeUnmountedVirtualThread`. I do not see these crashes anymore after a full testing cycle.
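To make the shape of the unified closure described above a little more concrete, here is a small standalone model. All names prefixed with `Toy` are hypothetical, and the routing rule (run `do_vthread()` directly on the requesting thread only for an unmounted virtual thread, otherwise go through `do_thread()`) is an assumption drawn from this message, not a copy of the actual `JvmtiHandshake` code.

```cpp
// Toy model of a target thread (hypothetical; not a HotSpot type).
struct ToyThread {
  bool is_virtual;
  bool is_mounted;    // only meaningful when is_virtual is true
  int  frame_count;
};

// One closure type with two callbacks, mirroring the do_thread()/do_vthread() split.
class ToyUnifiedHandshakeClosure {
 public:
  virtual ~ToyUnifiedHandshakeClosure() = default;
  virtual void do_thread(ToyThread& target) = 0;   // platform thread, or carrier of a mounted vthread
  virtual void do_vthread(ToyThread& target) = 0;  // virtual thread without a carrier
};

enum class ToyRoute { HandshakeWithTarget, DirectOnRequester };

// Assumed routing: an unmounted virtual thread has no carrier to handshake with,
// so the callback runs on the requesting thread itself.
inline ToyRoute toy_execute(ToyUnifiedHandshakeClosure& op, ToyThread& target) {
  if (target.is_virtual && !target.is_mounted) {
    op.do_vthread(target);
    return ToyRoute::DirectOnRequester;
  }
  op.do_thread(target);
  return ToyRoute::HandshakeWithTarget;
}

// Example operation in the spirit of GetFrameCount: a single closure serves both kinds.
class ToyGetFrameCount : public ToyUnifiedHandshakeClosure {
  int _result = -1;
 public:
  void do_thread(ToyThread& t) override  { _result = t.frame_count; }
  void do_vthread(ToyThread& t) override { _result = t.frame_count; }
  int result() const { return _result; }
};
```

The point of the sketch is only that a caller defines one operation and the dispatcher picks the callback, so no separate VM operation is needed for the unmounted case.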
------------- PR Comment: https://git.openjdk.org/jdk/pull/16460#issuecomment-1797026470 From duke at openjdk.org Mon Nov 6 23:55:59 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 6 Nov 2023 23:55:59 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: > Improves vector rounding on ECore about 10x > > (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms > (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: move option to x86-specific section ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16504/files - new: https://git.openjdk.org/jdk/pull/16504/files/010a993f..f4c8c36e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16504&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16504&range=00-01 Stats: 7 lines in 2 files changed: 4 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16504/head:pull/16504 PR: https://git.openjdk.org/jdk/pull/16504 From jjoo at openjdk.org Tue Nov 7 00:52:07 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 7 Nov 2023 00:52:07 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v38] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Attempt to fix duplicate name error in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/9fb36a9e..ac780c5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=36-37 Stats: 8 lines in 1 file changed: 0 ins; 5 del; 
3 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From haosun at openjdk.org Tue Nov 7 01:04:41 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 7 Nov 2023 01:04:41 GMT Subject: Integrated: 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning In-Reply-To: References: Message-ID: <32FwKUqOQs8a2TozJ9BP6UIIRyg00a-n4ehw28cIQLI=.821ca17f-f717-4fc3-985e-cb31d5ecc65a@github.com> On Fri, 3 Nov 2023 02:50:22 GMT, Hao Sun wrote: > The root cause is that an incorrect variant of function VMError::report_and_die() is used. We should introduce another variadic function, just as macos_aarch64 did before. > > GCC toolchain: > From [1][2], GCC differs from Clang in flag -Wformat-nonliteral slightly, i.e. GCC may **not** raise a warning if "the format function takes its format arguments as a va_list". That's why this issue is not exposed by GCC toolchain before. > > Besides, I suppose platforms ppc and risc-v may have the same issue. > > [1] https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Warning-Options.html > [2] https://releases.llvm.org/14.0.0/tools/clang/docs/DiagnosticsReference.html This pull request has now been integrated.
Changeset: 439ed046 Author: Hao Sun URL: https://git.openjdk.org/jdk/commit/439ed046e451fc41a875993819a6d4335a0efad5 Stats: 35 lines in 7 files changed: 13 ins; 17 del; 5 mod 8319233: AArch64: Build failure with clang due to -Wformat-nonliteral warning Reviewed-by: kbarrett, eastigeevich ------------- PR: https://git.openjdk.org/jdk/pull/16486 From manc at openjdk.org Tue Nov 7 01:08:33 2023 From: manc at openjdk.org (Man Cao) Date: Tue, 7 Nov 2023 01:08:33 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: <3iJqYOiXeO6bwtUmjzlO3tyFR9Uc28YAJ8aQbKbqKJM=.fda95001-86f3-4f09-8a74-e15be1987c4f@github.com> References: <6tngC-Jwyx8e25LGT8dAwKbaPb9qb_w5ONctnFieH3o=.61b013b2-beb9-4053-8c06-86a700208d77@github.com> <3iJqYOiXeO6bwtUmjzlO3tyFR9Uc28YAJ8aQbKbqKJM=.fda95001-86f3-4f09-8a74-e15be1987c4f@github.com> Message-ID: On Mon, 6 Nov 2023 12:11:47 GMT, Stefan Johansson wrote: > I played around a bit instead of trying to explain what I mean and this is not very polished, but I was thinking something like this: https://github.com/openjdk/jdk/compare/pr/15082...kstefanj:jdk:pull/15082-idea > > What do you think? This way we don't add things to CollectedHeap as well, which is usually good unless really needed. I think it looks great. It is mainly refactoring that consolidates the declarations/definitions of the hsperf counters in to a single file. Would it be better to name that class `CPUTimeCounters`, so we could move `sun.threads.cpu_time.vm` and `sun.threads.cpu_time.conc_dedup`, and future JIT thread CPU counters to that class? Then we could also change the constructor of `ThreadTotalCPUTimeClosure` to `ThreadTotalCPUTimeClosure(CPUTimeCounters* counters, CPUTimeGroups::Name name)`, then it could set `_update_gc_counters` based on `name`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1797109487 From luhenry at openjdk.org Tue Nov 7 01:39:28 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 7 Nov 2023 01:39:28 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 16:11:15 GMT, Hamlin Li wrote: > Hi, > Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? > Thanks! > > Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. > a) if FLAG_IS_DEFAULT(MaxVectorSize) == true > b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) > > And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. 
src/hotspot/cpu/riscv/vm_version_riscv.cpp line 306: > 304: MaxVectorSize = _initial_vector_length; > 305: } else if (!is_power_of_2(MaxVectorSize)) { > 306: vm_exit_during_initialization(err_msg("Unsupported MaxVectorSize: %d", (int)MaxVectorSize)); you can add an explanation, like the following: Suggestion: vm_exit_during_initialization(err_msg("Unsupported MaxVectorSize: %d, must be a power of 2", (int)MaxVectorSize)); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16498#discussion_r1382701444 From jbhateja at openjdk.org Tue Nov 7 02:55:28 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 02:55:28 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: <77IDvc-Y0a2nzd_TXxunntrFiCfLqllESAmPxWATAlM=.479f53ce-e06e-4b50-9e04-20de996ba3eb@github.com> On Mon, 6 Nov 2023 23:55:59 GMT, Volodymyr Paprotski wrote: >> Improves vector rounding on ECore about 10x >> >> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms >> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > move option to x86-specific section src/hotspot/cpu/x86/vm_version_x86.cpp line 862: > 860: // Check if processor has Intel Ecore > 861: if (FLAG_IS_DEFAULT(DoEcoreOpt) && is_intel() && cpu_family() == 6 && > 862: (_model == 0x97 || _model == 0xAC || _model == 0xAF)) { Model 0x97 corresponds to ADL Hybrid Core converged ISA target where DoEcoreOpt enabling will depend on the core over which VM initialization thread executes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384290564 From duke at openjdk.org Tue Nov 7 03:00:35 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 7 Nov 2023 03:00:35 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: <77IDvc-Y0a2nzd_TXxunntrFiCfLqllESAmPxWATAlM=.479f53ce-e06e-4b50-9e04-20de996ba3eb@github.com> References: <77IDvc-Y0a2nzd_TXxunntrFiCfLqllESAmPxWATAlM=.479f53ce-e06e-4b50-9e04-20de996ba3eb@github.com> Message-ID: On Tue, 7 Nov 2023 02:52:46 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> move option to x86-specific section > > src/hotspot/cpu/x86/vm_version_x86.cpp line 862: > >> 860: // Check if processor has Intel Ecore >> 861: if (FLAG_IS_DEFAULT(DoEcoreOpt) && is_intel() && cpu_family() == 6 && >> 862: (_model == 0x97 || _model == 0xAC || _model == 0xAF)) { > > Model 0x97 corresponds to ADL Hybrid Core converged ISA target where DoEcoreOpt enabling will depend on the core over which VM initialization thread executes. On hybrid platforms, it will be enabled on both Pcore and Ecore. Performance is essentially unchanged on Pcore ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384293314 From jbhateja at openjdk.org Tue Nov 7 03:04:30 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 03:04:30 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 23:55:59 GMT, Volodymyr Paprotski wrote: >> Improves vector rounding on ECore about 10x >> >> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms >> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 
0.727 ops/ms > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > move option to x86-specific section src/hotspot/cpu/x86/globals_x86.hpp line 218: > 216: \ > 217: /* Autodetected, see vm_version_x86.cpp */ \ > 218: product(bool, DoEcoreOpt, false, DIAGNOSTIC, \ Change to name to DoEcoreOpt -> DoEcoreOpts or EnableX86ECoreOpts src/hotspot/cpu/x86/x86.ad line 7443: > 7441: ins_encode %{ > 7442: int vlen_enc = vector_length_encoding(this); > 7443: InternalAddress new_mxcsr = $constantaddress((jint)(DoEcoreOpt ? 0x3FBF : 0x3F80)); You can define a preprocessor macro for conditional selection pattern ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384293766 PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384292566 From jbhateja at openjdk.org Tue Nov 7 03:08:29 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 03:08:29 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: <77IDvc-Y0a2nzd_TXxunntrFiCfLqllESAmPxWATAlM=.479f53ce-e06e-4b50-9e04-20de996ba3eb@github.com> Message-ID: On Tue, 7 Nov 2023 02:57:55 GMT, Volodymyr Paprotski wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> move option to x86-specific section > > src/hotspot/cpu/x86/vm_version_x86.cpp line 862: > >> 860: // Check if processor has Intel Ecore >> 861: if (FLAG_IS_DEFAULT(DoEcoreOpt) && is_intel() && cpu_family() == 6 && >> 862: (_model == 0x97 || _model == 0xAC || _model == 0xAF)) { > > On hybrid platforms, it will be enabled on both Pcore and Ecore. Performance is essentially unchanged on Pcore Correct, CPUID is CPU specific not core specific, so settings should apply to both. 
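For readers following the MXCSR discussion in this thread, here is a minimal sketch of how the two constants from the quoted `x86.ad` line (`0x3FBF` and `0x3F80`) relate to each other. The bit-field names and the `select_mxcsr` helper are illustrative assumptions, not HotSpot identifiers; the bit positions follow the architectural MXCSR layout (bits 0..5 are the sticky exception status flags, bits 7..12 the exception masks, bits 13..14 the rounding control).

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the quoted x86.ad snippet.
constexpr uint32_t kMxcsrPlain = 0x3F80;
constexpr uint32_t kMxcsrEcore = 0x3FBF;

// Architectural MXCSR fields (names are illustrative, not HotSpot's).
constexpr uint32_t kStatusFlags   = 0x003F;  // bits 0..5: IE DE ZE OE UE PE
constexpr uint32_t kExceptionMask = 0x1F80;  // bits 7..12: all exceptions masked
constexpr uint32_t kRoundDown     = 0x2000;  // bits 13..14 = 01: round toward -inf

// Hypothetical helper mirroring the `DoEcoreOpt ? 0x3FBF : 0x3F80` selection.
constexpr uint32_t select_mxcsr(bool ecore_opt) {
  return ecore_opt ? kMxcsrEcore : kMxcsrPlain;
}

// Both values mask every exception and select round-down; they differ only
// in whether the six sticky status flags are pre-set.
static_assert(kMxcsrPlain == (kExceptionMask | kRoundDown), "plain value");
static_assert(kMxcsrEcore == (kMxcsrPlain | kStatusFlags), "ecore value");
static_assert((kMxcsrPlain ^ kMxcsrEcore) == kStatusFlags, "only flags differ");
```

This matches the later remark in the thread that the patch sets the lower 6 bits of MXCSR: the rounding mode and exception masking are identical in both cases.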
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384297045 From duke at openjdk.org Tue Nov 7 03:19:32 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 7 Nov 2023 03:19:32 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 02:56:47 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> move option to x86-specific section > > src/hotspot/cpu/x86/x86.ad line 7443: > >> 7441: ins_encode %{ >> 7442: int vlen_enc = vector_length_encoding(this); >> 7443: InternalAddress new_mxcsr = $constantaddress((jint)(DoEcoreOpt ? 0x3FBF : 0x3F80)); > > You can define a preprocessor macro for conditional selection pattern Not convinced that it makes it cleaner; while reading, means one extra file lookup to see what the macro does, vs how it immediately is clear now (ctags working half the time..). Macros seem to be paired better here with conditional compilation, don't see them paired with dev options.. I can perhaps be convinced, it _is_ repeated 6 times in this PR. Perhaps `globals_x86.hpp` might be an acceptable place, but doesn't appear to have a precedent there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384323086 From jbhateja at openjdk.org Tue Nov 7 03:40:31 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 03:40:31 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: <9oh5727exwDNjqdB2bUs6dauA0m5myiTBYMPw-_wb2I=.1b8d7500-39b9-4b68-a8af-d8d4fc85a479@github.com> On Mon, 6 Nov 2023 23:55:59 GMT, Volodymyr Paprotski wrote: >> Improves vector rounding on ECore about 10x >> >> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms >> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 
0.727 ops/ms > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > move option to x86-specific section As per JVM specification section 2.8 "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact.", thus JVM does not check these exceptions. Your patch is always setting lower 6 bits of MXCSR on hybrid CPUs which has both E and P core, do you see any concerns if these bits are default ON for other server targets. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1797673697 From jbhateja at openjdk.org Tue Nov 7 03:40:33 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 03:40:33 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: <77IDvc-Y0a2nzd_TXxunntrFiCfLqllESAmPxWATAlM=.479f53ce-e06e-4b50-9e04-20de996ba3eb@github.com> Message-ID: <1CzbbPYkOhuhSkgcCzePwKzRB0ynNK1kT-dV8pmMG7w=.e3581de3-7d50-4831-beeb-b555efddaf75@github.com> On Tue, 7 Nov 2023 03:05:48 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 862: >> >>> 860: // Check if processor has Intel Ecore >>> 861: if (FLAG_IS_DEFAULT(DoEcoreOpt) && is_intel() && cpu_family() == 6 && >>> 862: (_model == 0x97 || _model == 0xAC || _model == 0xAF)) { >> >> On hybrid platforms, it will be enabled on both Pcore and Ecore. Performance is essentially unchanged on Pcore > > Correct, CPUID is CPU specific not core specific, so settings should apply to both. As per JVM specification section 2.8 "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact.", thus JVM does not check these exceptions. 
Your patch is always setting lower 6 bits of MXCSR on hybrid CPUs which has both E and P core, do you see any concerns if these bits are default ON for other server targets. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1384333465 From kbarrett at openjdk.org Tue Nov 7 05:17:30 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 7 Nov 2023 05:17:30 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v12] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Mon, 6 Nov 2023 16:45:39 GMT, Johan Sjölen wrote: > [...] The `at_put_grow_with` lambda takes two arguments, the latter being a boolean flag indicating whether or not this is the last element. It's perhaps not the most elegant, but I don't think having two templates would do us any good here. I don't see the point of the "last element" flag for the _with function argument. It is used to change what was in the old code a copy-assign into a copy-construct, but so what? A class should be either non-copyable (neither by constructor nor by assignment), or it should be copyable (both by constructor and by assignment). Anything else is weird and probably broken. So avoiding copy-assignment isn't really buying anything except API complexity. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1797837188 From stefank at openjdk.org Tue Nov 7 07:28:10 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 7 Nov 2023 07:28:10 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v4] In-Reply-To: References: Message-ID: <5FaJO-AqgTFEZzoVYeVDtx_zDBVig7G4nWVqdbPmVp4=.3bbab805-5221-4eaa-90d6-1afaca23ed85@github.com> > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. 
The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Punctuation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/1185e9b6..305b0567 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From tschatzl at openjdk.org Tue Nov 7 08:31:54 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 7 Nov 2023 08:31:54 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v13] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. 
These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Fix tests after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/2ad39680..d9ccccff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=11-12 Stats: 36 lines in 3 files changed: 0 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From mli at openjdk.org Tue Nov 7 09:03:44 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 Nov 2023 09:03:44 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations [v2] In-Reply-To: References: Message-ID: <9EUIJPkyMahN9o3nfeahScRhKkkV2ACFCBZPMNNlHoA=.7c98cf5e-6c01-4ce6-9d82-1557803e9283@github.com> > Hi, > Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? > Thanks! > > Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. > a) if FLAG_IS_DEFAULT(MaxVectorSize) == true > b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) > > And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. 
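The MaxVectorSize description quoted above lists two paths that skip the `< 16` check and an inconsistency in how non-power-of-2 values are treated. As a rough illustration only (this is not the patch under review), the sketch below shows one ordering of the checks that would treat all inputs uniformly. The `Action` and `Resolved` types and the `resolve_max_vector_size` function are hypothetical, and the choice to exit on every non-power-of-2 value and to disable RVV below 16 is an assumption drawn from the behaviors named in the description.

```cpp
#include <cassert>

// Hypothetical outcome of validating the flag; none of these names are HotSpot's.
enum class Action { Keep, DisableRVV, Exit };

struct Resolved {
  Action action;
  int size;
};

constexpr bool is_pow2(int v) { return v > 0 && (v & (v - 1)) == 0; }

// is_default : the user did not set MaxVectorSize on the command line
// requested  : the flag value (ignored when is_default is true)
// hw_length  : stands in for _initial_vector_length
constexpr Resolved resolve_max_vector_size(bool is_default, int requested, int hw_length) {
  int size = requested;
  if (is_default) {
    size = hw_length;                 // case (a): take the hardware length
  } else if (!is_pow2(size)) {
    return {Action::Exit, size};      // same answer whether size < 16 or >= 16
  } else if (size > hw_length) {
    size = hw_length;                 // case (b): clamp to the hardware length
  }
  // The minimum check runs after any assignment from hw_length, so cases
  // (a) and (b) can no longer bypass it.
  if (size < 16) {
    return {Action::DisableRVV, size};
  }
  return {Action::Keep, size};
}
```

With this ordering, a default flag on hardware reporting a length of 8, or an explicit 64 clamped down to 8, both end in `DisableRVV` rather than silently keeping a too-small value.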
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Refine log ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16498/files - new: https://git.openjdk.org/jdk/pull/16498/files/06945065..19dc28fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16498&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16498&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16498/head:pull/16498 PR: https://git.openjdk.org/jdk/pull/16498 From mli at openjdk.org Tue Nov 7 09:03:45 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 Nov 2023 09:03:45 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations [v2] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 01:21:48 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Refine log > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 306: > >> 304: MaxVectorSize = _initial_vector_length; >> 305: } else if (!is_power_of_2(MaxVectorSize)) { >> 306: vm_exit_during_initialization(err_msg("Unsupported MaxVectorSize: %d", (int)MaxVectorSize)); > > you can add an explanation, like the following: > Suggestion: > > vm_exit_during_initialization(err_msg("Unsupported MaxVectorSize: %d, must be a power of 2", (int)MaxVectorSize)); Thanks, I've added the log message. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16498#discussion_r1384582114 From rehn at openjdk.org Tue Nov 7 09:09:35 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Nov 2023 09:09:35 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 21:00:16 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. 
>> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
>> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: > > - Touchups > - More comments work > - Tight up the comments > - Rework to a single atomic counter per cell Thanks for addressing this! I had some comment. src/hotspot/share/utilities/waitBarrier_generic.cpp line 120: > 118: SpinYield sp; > 119: while (Atomic::load_acquire(&_state) < -1) { > 120: sp.wait(); A warning once would be helpful that cells might be too few. src/hotspot/share/utilities/waitBarrier_generic.cpp line 151: > 149: _sem.signal(); > 150: > 151: if (wakeups++ > max) { I would assume max = 2, would call signal() max 2 times ? Here we end when wakeups are larger than max with post inc, so isn't that 4 times? (0->3) src/hotspot/share/utilities/waitBarrier_generic.cpp line 163: > 161: int s = Atomic::load_acquire(&_state); > 162: assert(s > 0, "Mid disarm: Should be armed. State: %d", s); > 163: if (Atomic::cmpxchg(&_state, s, -s) == s) { When we hit this branch we have effectively left the outer loop. I think it will read easier if this scope actually was outside the scope of the outer loop, no? src/hotspot/share/utilities/waitBarrier_generic.cpp line 192: > 190: return; > 191: } > 192: assert(s > 0, "Before wait: Should be armed. 
State: %d", s); If we are context switched here until this cell is re-used ? Hence we have the wrong barrier tag ? For safepoint this can't happen since the safepoint id for this safepoint safe thread will be wrong. Thus we can't re-use cells until this thread returns and change his safepoint id. But it seems like this is what saves us, so if there was another use-case it could happen, no? ------------- Changes requested by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16404#pullrequestreview-1717132378 PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384571746 PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384577576 PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384580535 PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384588931 From rehn at openjdk.org Tue Nov 7 09:16:32 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Nov 2023 09:16:32 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 09:05:30 GMT, Robbin Ehn wrote: >> Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: >> >> - Touchups >> - More comments work >> - Tight up the comments >> - Rework to a single atomic counter per cell > > src/hotspot/share/utilities/waitBarrier_generic.cpp line 192: > >> 190: return; >> 191: } >> 192: assert(s > 0, "Before wait: Should be armed. State: %d", s); > > If we are context switched here until this cell is re-used ? > Hence we have the wrong barrier tag ? > > For safepoint this can't happen since the safepoint id for this safepoint safe thread will be wrong. > Thus we can't re-use cells until this thread returns and change his safepoint id. > > But it seems like this is what saves us, so if there was another use-case it could happen, no? 
I'm thinking of adding the barrier tag into the _state (64-bit), so it's a 32-bit tag and, as now, a sign bit for armed/disarmed plus the 31-bit counter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384599034 From shade at openjdk.org Tue Nov 7 09:24:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 09:24:31 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 08:58:27 GMT, Robbin Ehn wrote: >> Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: >> >> - Touchups >> - More comments work >> - Tight up the comments >> - Rework to a single atomic counter per cell > > src/hotspot/share/utilities/waitBarrier_generic.cpp line 163: > >> 161: int s = Atomic::load_acquire(&_state); >> 162: assert(s > 0, "Mid disarm: Should be armed. State: %d", s); >> 163: if (Atomic::cmpxchg(&_state, s, -s) == s) { > > When we hit this branch we have effectively left the outer loop. > I think it will read easier if this scope actually was outside the scope of the outer loop, no? Oh, you mean break out here, and just do the rest outside the loop. Yes, we can do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384606323 From rehn at openjdk.org Tue Nov 7 09:24:33 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Nov 2023 09:24:33 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 09:19:37 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/utilities/waitBarrier_generic.cpp line 163: >> >>> 161: int s = Atomic::load_acquire(&_state); >>> 162: assert(s > 0, "Mid disarm: Should be armed. State: %d", s); >>> 163: if (Atomic::cmpxchg(&_state, s, -s) == s) { >> >> When we hit this branch we have effectively left the outer loop. 
>> I think it will read easier if this scope actually was outside the scope of the outer loop, no? > > Oh, you mean break out here, and just do the rest outside the loop. Yes, we can do that. Yes, I think it will be easier to read. But do as you will, just a suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384608906 From shade at openjdk.org Tue Nov 7 09:24:36 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 09:24:36 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 09:13:52 GMT, Robbin Ehn wrote: >> src/hotspot/share/utilities/waitBarrier_generic.cpp line 192: >> >>> 190: return; >>> 191: } >>> 192: assert(s > 0, "Before wait: Should be armed. State: %d", s); >> >> If we are context switched here until this cell is re-used ? >> Hence we have the wrong barrier tag ? >> >> For safepoint this can't happen since the safepoint id for this safepoint safe thread will be wrong. >> Thus we can't re-use cells until this thread returns and change his safepoint id. >> >> But it seems like this is what saves us, so if there was another use-case it could happen, no? > > I thinking adding the barrier tag into the _state(64), so it's a 32-bit tag, and as now sign bit for armed/disarmed plus the 31-bit counter. Aw. Yes, there is a race condition here. It is plausible that waiter here can be stuck at previous barrier tag. I thought waiting for all threads to leave the cell on `arm()` saves us here, but it does not, because we might not know at `arm()` that we have a waiter waiting to enter. Initially I thought to include `barrier_tag` into `_state`, so we can check for it before entering in this code, but somehow convinced myself it was not required. This counter-example directly shows that we want to encode `barrier_tag` into `_state` as well. Let me do that. 
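The idea discussed above, carrying the barrier tag inside `_state` so a late waiter can detect that its cell has been reused, can be sketched roughly as follows. This is only an illustration of the layout Robbin describes (a 32-bit tag plus a signed 32-bit counter whose sign means armed or disarmed); the function names are hypothetical and the eventual patch may encode things differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packing: high 32 bits hold the barrier tag, low 32 bits hold
// a signed counter (positive = armed, negative = disarmed).
constexpr int64_t encode_state(int32_t tag, int32_t counter) {
  return static_cast<int64_t>(
      (static_cast<uint64_t>(static_cast<uint32_t>(tag)) << 32) |
      static_cast<uint32_t>(counter));
}

constexpr int32_t decode_tag(int64_t state) {
  return static_cast<int32_t>(static_cast<uint64_t>(state) >> 32);
}

constexpr int32_t decode_counter(int64_t state) {
  return static_cast<int32_t>(static_cast<uint64_t>(state) & 0xFFFFFFFFu);
}

// A waiter that was descheduled before entering can compare the tag it was
// told to wait on with the tag currently stored in the cell; a mismatch
// means the cell has since been re-armed for a different barrier.
constexpr bool may_wait(int64_t state, int32_t expected_tag) {
  return decode_tag(state) == expected_tag && decode_counter(state) > 0;
}
```

Because tag and counter live in one 64-bit word, a single compare-and-exchange on that word can check the tag and bump the waiter count together, which is the property the race described above needs.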
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384604709 From azafari at openjdk.org Tue Nov 7 09:31:41 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 7 Nov 2023 09:31:41 GMT Subject: Withdrawn: 8198918: jio_snprintf and friends are not checked by -Wformat In-Reply-To: References: Message-ID: On Tue, 26 Sep 2023 11:46:19 GMT, Afshin Zafari wrote: > - The `ATTRIBUTE_PRINTF` usage in cpp files is useless. They are removed. > - There are cases where `jio_xxprintf` functions use `char *` arguments for format string, rather than a literal like `"%s..."`. These cases are _not compiled_ when `ATTRIBUTE_PRINTF` is used for them. So, first I used the attribute and got the corresponding compile errors. Then I fixed the issues and removed the attribute when all issues were fixed. > > ### Test > The changes are tested on all platforms tiers 1-4. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/15918 From shade at openjdk.org Tue Nov 7 09:48:32 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 09:48:32 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 08:56:09 GMT, Robbin Ehn wrote: >> Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: >> >> - Touchups >> - More comments work >> - Tight up the comments >> - Rework to a single atomic counter per cell > > src/hotspot/share/utilities/waitBarrier_generic.cpp line 151: > >> 149: _sem.signal(); >> 150: >> 151: if (wakeups++ > max) { > > I would assume max = 2, would call signal() max 2 times ? > Here we end when wakeups are larger than max with post inc, so isn't that 4 times? (0->3) Right. Should be `++wakeups >= max` to match the limit exactly. Going to fix it in next commits. 
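The off-by-two in the `wakeups++ > max` condition discussed above is easy to confirm in isolation. The sketch below uses a hypothetical loop shape (signal first, then test the counter, as in the quoted two lines) and simply counts how many signals each condition allows; it is not the HotSpot code.

```cpp
#include <cassert>

// Counts signals issued when the loop exits on `wakeups++ > max`,
// the condition quoted in the review.
constexpr int signals_with_post_increment(int max) {
  int wakeups = 0;
  int signals = 0;
  for (;;) {
    ++signals;                // stands in for _sem.signal()
    if (wakeups++ > max) {
      break;
    }
  }
  return signals;
}

// Counts signals issued when the loop exits on `++wakeups >= max`,
// the replacement proposed in the reply.
constexpr int signals_with_pre_increment(int max) {
  int wakeups = 0;
  int signals = 0;
  for (;;) {
    ++signals;                // stands in for _sem.signal()
    if (++wakeups >= max) {
      break;
    }
  }
  return signals;
}

static_assert(signals_with_post_increment(2) == 4, "max + 2 signals");
static_assert(signals_with_pre_increment(2) == 2, "exactly max signals");
```

For `max = 2` the original condition lets four signals through (the comparison sees 0, 1, 2, 3), while the proposed form stops after exactly two.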
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384640849 From shade at openjdk.org Tue Nov 7 09:52:32 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 09:52:32 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 08:51:20 GMT, Robbin Ehn wrote: >> Aleksey Shipilev has updated the pull request incrementally with four additional commits since the last revision: >> >> - Touchups >> - More comments work >> - Tight up the comments >> - Rework to a single atomic counter per cell > > src/hotspot/share/utilities/waitBarrier_generic.cpp line 120: > >> 118: SpinYield sp; >> 119: while (Atomic::load_acquire(&_state) < -1) { >> 120: sp.wait(); > > A warning once would be helpful that cells might be too few. Yes, but I am a bit uncomfortable with printing performance warnings from a generic synchronization primitive. At very least because it might intermittently break some tests that check the outputs. So I'd rather avoid adding stuff here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384649730 From mli at openjdk.org Tue Nov 7 09:53:45 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 Nov 2023 09:53:45 GMT Subject: RFR: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations [v2] In-Reply-To: <9EUIJPkyMahN9o3nfeahScRhKkkV2ACFCBZPMNNlHoA=.7c98cf5e-6c01-4ce6-9d82-1557803e9283@github.com> References: <9EUIJPkyMahN9o3nfeahScRhKkkV2ACFCBZPMNNlHoA=.7c98cf5e-6c01-4ce6-9d82-1557803e9283@github.com> Message-ID: On Tue, 7 Nov 2023 09:03:44 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? >> Thanks! >> >> Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. 
>> a) if FLAG_IS_DEFAULT(MaxVectorSize) == true >> b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) >> >> And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Refine log Thanks @luhenry @RealFYang for your reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16498#issuecomment-1798155894 From mli at openjdk.org Tue Nov 7 09:53:47 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 Nov 2023 09:53:47 GMT Subject: Integrated: 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations In-Reply-To: References: Message-ID: <_B7ZkPuUZmMyyCUzZbstAHiMLNO7G6rjvxPJ0VVYQRQ=.55016b49-64aa-4868-bc8a-dee0264cf8a4@github.com> On Fri, 3 Nov 2023 16:11:15 GMT, Hamlin Li wrote: > Hi, > Can you review the change to fix the MaxVectorSize checking in vm_version_riscv.cpp? > Thanks! > > Current logic will not check whether (MaxVectorSize < 16), after the assignment `MaxVectorSize = _initial_vector_length;`, in following situation. > a) if FLAG_IS_DEFAULT(MaxVectorSize) == true > b) if FLAG_IS_DEFAULT(MaxVectorSize) == false and (MaxVectorSize >= 16) and is_power_of_2(MaxVectorSize) and (MaxVectorSize > _initial_vector_length) > > And in original code, the logic is not consistent for the situations between MaxVectorSize < 16 and MaxVectorSize >= 16, when is_power_of_2(MaxVectorSize) == false; for the former (<16) it's to disable RVV, for the latter (>=16) it's vm_exit. This pull request has now been integrated. 
Changeset: 1c0e7b71 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/1c0e7b71b86cf735a251d5b6fe25b9c573fbec80 Stats: 16 lines in 1 file changed: 7 ins; 7 del; 2 mod 8319408: RISC-V: MaxVectorSize is not consistently checked in several situations Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/16498 From sjohanss at openjdk.org Tue Nov 7 09:57:40 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Tue, 7 Nov 2023 09:57:40 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v35] In-Reply-To: References: <6tngC-Jwyx8e25LGT8dAwKbaPb9qb_w5ONctnFieH3o=.61b013b2-beb9-4053-8c06-86a700208d77@github.com> <3iJqYOiXeO6bwtUmjzlO3tyFR9Uc28YAJ8aQbKbqKJM=.fda95001-86f3-4f09-8a74-e15be1987c4f@github.com> Message-ID: <1CHY1oIjteVS6FU__jUkfO6nTijg9j3tRbv_QJl3QLI=.18e3b2ba-3ab2-4ee8-ab5f-1dd30323f671@github.com> On Tue, 7 Nov 2023 01:06:12 GMT, Man Cao wrote: > I think it looks great. It is mainly refactoring that consolidates the declarations/definitions of the hsperf counters in to a single file. Would it be better to name that class `CPUTimeCounters`, so we could move `sun.threads.cpu_time.vm` and `sun.threads.cpu_time.conc_dedup`, and future JIT thread CPU counters to that class? > Yes, mainly refactoring and I was thinking along the same lines, but since this patch was just for GC and we had `CollectorCounters` already I went with this. I think calling the class `CPUTimeCounters` would be good and place it outside GC makes sense if we plan to include even more CPU time counters. Another name that we could improve is `CPUTimeGroups` and maybe also the enum name `Name`, they are ok, but we might come up with something better. > Then we could also change the constructor of `ThreadTotalCPUTimeClosure` to `ThreadTotalCPUTimeClosure(CPUTimeCounters* counters, CPUTimeGroups::Name name)`, then it could set `_update_gc_counters` based on `name`. 
I was looking at this too, but had to restructure the code more to avoid circular deps. If we create the general `CPUTimeCounters` we could move this closure to that file and then things would fit better I believe. So I like your proposals.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1798170871

From mli at openjdk.org Tue Nov 7 10:00:45 2023
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 7 Nov 2023 10:00:45 GMT
Subject: RFR: 8318218: RISC-V: C2 CompressBits [v3]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you review the change to add intrinsic for CompressBits for Long & Integer?
> Thanks!
>
> ## Test
> pass jtreg test:
> test/jdk/java/lang/CompressExpand*.java

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

  Match Op_CompressBits based on UseRVV only

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16481/files
  - new: https://git.openjdk.org/jdk/pull/16481/files/2380d6ec..b5eed0ec

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=01-02

  Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16481.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481

PR: https://git.openjdk.org/jdk/pull/16481

From mli at openjdk.org Tue Nov 7 10:00:46 2023
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 7 Nov 2023 10:00:46 GMT
Subject: RFR: 8318218: RISC-V: C2 CompressBits [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 6 Nov 2023 08:49:33 GMT, Hamlin Li wrote:

>> src/hotspot/cpu/riscv/riscv.ad line 1897:
>>
>>> 1895:
>>> 1896:     case Op_CompressBits:
>>> 1897:       return UseRVV && (MaxVectorSize >= 16);
>>
>> Isn't it guaranteed that `MaxVectorSize >= 16` if `UseRVV` is true?
>
> After https://github.com/openjdk/jdk/pull/16498, it should be guaranteed `MaxVectorSize >= 16`.
> Let me remove this condition after pr #16498 is pushed.
I've updated the patch to match Op_CompressBits based on UseRVV only

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1384665252

From rehn at openjdk.org Tue Nov 7 10:39:32 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 7 Nov 2023 10:39:32 GMT
Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 7 Nov 2023 09:49:36 GMT, Aleksey Shipilev wrote:

>> src/hotspot/share/utilities/waitBarrier_generic.cpp line 120:
>>
>>> 118:     SpinYield sp;
>>> 119:     while (Atomic::load_acquire(&_state) < -1) {
>>> 120:       sp.wait();
>>
>> A warning once would be helpful that cells might be too few.
>
> Yes, but I am a bit uncomfortable with printing performance warnings from a generic synchronization primitive. At very least because it might intermittently break some tests that check the outputs. So I'd rather avoid adding stuff here.

Ok

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384714582

From azafari at openjdk.org Tue Nov 7 11:40:02 2023
From: azafari at openjdk.org (Afshin Zafari)
Date: Tue, 7 Nov 2023 11:40:02 GMT
Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v8]
In-Reply-To:
References:
Message-ID: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com>

> The `find` method now is
> ```C++
> template <typename T>
> int find(T* token, bool f(T*, E)) const {
> ...
> ```
>
> Any other functions which use this are also changed.
> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed.

Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:

  function pointer is replaced with template Functor.
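
What replacing the function pointer with a template functor might look like is sketched below. This is an assumption-laden illustration, not the `GrowableArray` code: `ArraySketch` is a hypothetical stand-in backed by `std::vector`, and the exact signature chosen in the PR is not shown in this message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a growable array of E.
template <typename E>
class ArraySketch {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }

  // Returns the index of the first element for which f(token, element)
  // is true, or -1 if there is none. F can be a function pointer, a lambda,
  // or any other callable, so the comparator is no longer tied to one
  // function-pointer type.
  template <typename T, typename F>
  int find(T* token, F f) const {
    for (std::size_t i = 0; i < _data.size(); i++) {
      if (f(token, _data[i])) {
        return static_cast<int>(i);
      }
    }
    return -1;
  }
};

// A plain function still works as the comparator.
inline bool int_equals(int* token, int element) { return *token == element; }
```

A call site can then pass either `int_equals` or a lambda such as `[](int* k, int e) { return *k == e; }` without changing `find`.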
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15418/files - new: https://git.openjdk.org/jdk/pull/15418/files/6d9288e7..7665b878 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From shade at openjdk.org Tue Nov 7 11:55:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 11:55:53 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v4] In-Reply-To: References: Message-ID: > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. 
Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. > > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: - Encode barrier tag into state, resolving another race condition - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Touchups - More comments work - Tight up the comments - Rework to a single atomic counter per cell - Tigthen up memory ordering even more conservatively - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/dfafbf3a..3cd53c1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=02-03 Stats: 10385 lines in 307 files changed: 4383 ins; 2910 del; 3092 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From shade at openjdk.org Tue Nov 7 11:55:55 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 11:55:55 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v3] In-Reply-To: References: Message-ID: <4DkPofPnSIUXin6a8S9ZagcYLz-7EbUYA9BhLX_sEvU=.c25a6298-70b6-4b87-b185-0b5e77e11f54@github.com> On Tue, 7 Nov 2023 09:18:21 GMT, Aleksey Shipilev wrote: >> I thinking adding the barrier tag into the _state(64), so it's a 32-bit tag, and as now sign bit for armed/disarmed plus the 31-bit counter. > > Aw. Yes, there is a race condition here. It is plausible that waiter here can be stuck at previous barrier tag. I thought waiting for all threads to leave the cell on `arm()` saves us here, but it does not, because we might not know at `arm()` that we have a waiter waiting to enter. > > Initially I thought to include `barrier_tag` into `_state`, so we can check for it before entering in this code, but somehow convinced myself it was not required. 
This counter-example directly shows that we want to encode `barrier_tag` into `_state` as well. Let me do that. I pushed the new revision that incorporates review feedback and encodes `barrier_tag` into `_state`. It passes light testing. I am scheduling more thorough testing runs today. Feel free to take a look meanwhile! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384800227 From jsjolen at openjdk.org Tue Nov 7 12:09:35 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 Nov 2023 12:09:35 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v12] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Tue, 7 Nov 2023 05:14:55 GMT, Kim Barrett wrote: > > [...] The `at_put_grow_with` lambda takes two arguments, the latter being a boolean flag indicating whether or not this is the last element. It's perhaps not the most elegant, but I don't think having two templates would do us any good here. > > I don't see the point of the "last element" flag for the _with function argument. It is used to change what was in the old code a copy-assign into a copy-construct, but so what? A class should be either non-copyable (neither by constructor nor by assignment), or it should be copyable (both by constructor and by assignment). Anything else is weird and probably broken. So avoiding copy-assignment isn't really buying anything except API complexity. Would you prefer that we instantiate the element with global placement new? 
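
For readers following the GrowableArray thread, initializing elements through a function that receives a pointer to raw memory could look roughly like this. It is a sketch under assumptions: `allocate_with`, `destroy_all` and `NonCopyable` are hypothetical names, the storage comes from `std::malloc` rather than HotSpot's allocators, and the "last element" flag discussed above is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Example element type that can be neither copy-constructed nor copy-assigned.
struct NonCopyable {
  int value;
  explicit NonCopyable(int v) : value(v) {}
  NonCopyable(const NonCopyable&) = delete;
  NonCopyable& operator=(const NonCopyable&) = delete;
};

// Allocates raw storage for n elements and calls init(slot) once per slot.
// init is expected to construct an E in place, e.g. with global placement new.
template <typename E, typename Init>
E* allocate_with(std::size_t n, Init init) {
  E* raw = static_cast<E*>(std::malloc(n * sizeof(E)));
  if (raw == nullptr) {
    return nullptr;
  }
  for (std::size_t i = 0; i < n; i++) {
    init(raw + i);  // the callback decides how the element is constructed
  }
  return raw;
}

// Runs the destructors and releases the storage obtained from allocate_with.
template <typename E>
void destroy_all(E* data, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) {
    data[i].~E();
  }
  std::free(data);
}
```

A caller would pass something like `[&](NonCopyable* slot) { ::new (slot) NonCopyable(next++); }`, so no copy constructor or copy assignment is ever required of the element type.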
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16409#issuecomment-1798375378

From shade at openjdk.org Tue Nov 7 12:38:35 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 7 Nov 2023 12:38:35 GMT
Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates
In-Reply-To:
References:
Message-ID:

On Fri, 3 Nov 2023 19:46:26 GMT, Quan Anh Mai wrote:

>> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate.
>>
>> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically.
>>
>> For example, sample branch profiling hunk from C1 tier3 on x86_64:
>>
>>
>> Before:
>>   0x00007f269065ed02: test %edx,%edx
>>   0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ...
>>   0x00007f269065ed0e: movabs $0x138,%rsi
>>   0x00007f269065ed18: je 0x00007f269065ed24
>>   0x00007f269065ed1a: movabs $0x148,%rsi
>>   0x00007f269065ed24: mov (%rax,%rsi,1),%rdi
>>   0x00007f269065ed28: lea 0x1(%rdi),%rdi
>>   0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1)
>>   0x00007f269065ed30: je 0x00007f269065ed4e
>>
>> After:
>>   0x00007f1370dcd782: test %edx,%edx
>>   0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ...
>>   0x00007f1370dcd78e: mov $0x138,%esi
>>   0x00007f1370dcd793: je 0x00007f1370dcd79a
>>   0x00007f1370dcd795: mov $0x148,%esi
>>   0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi
>>   0x00007f1370dcd79e: lea 0x1(%rdi),%rdi
>>   0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1)
>>   0x00007f1370dcd7a6: je 0x00007f1370dcd7c4
>>
>>
>> We can use a shorter 32-bit immediate moves.
In the hunk above, this saves about 8 bytes.
>>
>> This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.
>>
>>
>> # Before
>> nmethod code size : 430328 bytes
>> nmethod code size : 467032 bytes
>> nmethod code size : 908936 bytes
>> nmethod code size : 1267816 bytes
>>
>> # After
>> nmethod code size : 429616 bytes (-0.1%)
>> nmethod code size : 466344 bytes (-0.1%)
>> nmethod code size : 897144 bytes (-1.3%)
>> nmethod code size : 1256216 bytes (-0.9%)
>>
>>
>> There are two wrinkles:
>> 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this...
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 2576:
>
>> 2574: #ifdef _LP64
>> 2575:   if (is_simm32(src)) {
>> 2576:     movslq(dst, checked_cast<int32_t>(src));
>
> Why not just `movq`? there is no `movslq r, i` so this is kind of confusing.

Right. There is no point in trying to fix `movslq`. Replaced with `movq`.

> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 1818:
>
>> 1816:   void mov_metadata(Address dst, Metadata* obj, Register rscratch);
>> 1817:
>> 1818:   void mov_ptrslot(Register dst, intptr_t val);
>
> I believe the convention here would be `movptr_imm64`

Good name, renamed.
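
The choice between a short and a full-width immediate move can be sketched as a pure decision function. This is not the `MacroAssembler` code from the PR: `MovForm`, `select_mov_form` and `encoded_size` are hypothetical names, and the byte counts assume a destination register such as `%rsi` that needs no REX prefix in the 32-bit form.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical names for the three x86_64 ways to load an immediate.
enum class MovForm {
  Mov32ZeroExtend,    // mov r32, imm32: upper 32 bits are zeroed
  Mov64SignExtend32,  // mov r64, imm32: immediate is sign-extended
  Movabs64            // movabs r64, imm64: full 64-bit immediate
};

// Picks the shortest form that still loads exactly imm into a 64-bit register.
inline MovForm select_mov_form(std::int64_t imm) {
  const std::int64_t u32_max = std::numeric_limits<std::uint32_t>::max();
  const std::int64_t s32_min = std::numeric_limits<std::int32_t>::min();
  const std::int64_t s32_max = std::numeric_limits<std::int32_t>::max();
  if (imm >= 0 && imm <= u32_max) {
    return MovForm::Mov32ZeroExtend;
  }
  if (imm >= s32_min && imm <= s32_max) {
    return MovForm::Mov64SignExtend32;
  }
  return MovForm::Movabs64;
}

// Encoded length in bytes, assuming no REX prefix for the 32-bit form.
inline int encoded_size(MovForm form) {
  switch (form) {
    case MovForm::Mov32ZeroExtend:   return 5;   // opcode + imm32
    case MovForm::Mov64SignExtend32: return 7;   // REX.W + opcode + ModRM + imm32
    case MovForm::Movabs64:          return 10;  // REX.W + opcode + imm64
  }
  return 10;  // unreachable; keeps compilers quiet
}
```

This matches the addresses in the quoted hunk: the `movabs $0x138,%rsi` at `...ed0e` is followed by an instruction at `...ed18` (10 bytes later), while the `mov $0x138,%esi` at `...d78e` is followed by one at `...d793` (5 bytes later). The metadata pointer `0x7f260a4ddd68` does not fit in 32 bits and keeps the `movabs` form.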
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1384844360 PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1384845944 From jsjolen at openjdk.org Tue Nov 7 12:40:50 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 Nov 2023 12:40:50 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v13] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can intialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. 
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Alternative solution

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16409/files
  - new: https://git.openjdk.org/jdk/pull/16409/files/5ab96639..5786481c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=12
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=11-12

  Stats: 9 lines in 1 file changed: 0 ins; 4 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/16409.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409

PR: https://git.openjdk.org/jdk/pull/16409

From shade at openjdk.org Tue Nov 7 12:47:49 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 7 Nov 2023 12:47:49 GMT
Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v2]
In-Reply-To:
References:
Message-ID:

> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate.
>
> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically.
>
> For example, sample branch profiling hunk from C1 tier3 on x86_64:
>
>
> Before:
>   0x00007f269065ed02: test %edx,%edx
>   0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ...
>   0x00007f269065ed0e: movabs $0x138,%rsi
>   0x00007f269065ed18: je 0x00007f269065ed24
>   0x00007f269065ed1a: movabs $0x148,%rsi
>   0x00007f269065ed24: mov (%rax,%rsi,1),%rdi
>   0x00007f269065ed28: lea 0x1(%rdi),%rdi
>   0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1)
>   0x00007f269065ed30: je 0x00007f269065ed4e
>
> After:
>   0x00007f1370dcd782: test %edx,%edx
>   0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ...
>   0x00007f1370dcd78e: mov $0x138,%esi
>   0x00007f1370dcd793: je 0x00007f1370dcd79a
>   0x00007f1370dcd795: mov $0x148,%esi
>   0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi
>   0x00007f1370dcd79e: lea 0x1(%rdi),%rdi
>   0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1)
>   0x00007f1370dcd7a6: je 0x00007f1370dcd7c4
>
>
> We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes.
>
> This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.
>
>
> # Before
> nmethod code size : 430328 bytes
> nmethod code size : 467032 bytes
> nmethod code size : 908936 bytes
> nmethod code size : 1267816 bytes
>
> # After
> nmethod code size : 429616 bytes (-0.1%)
> nmethod code size : 466344 bytes (-0.1%)
> nmethod code size : 897144 bytes (-1.3%)
> nmethod code size : 1256216 bytes (-0.9%)
>
>
> There are two wrinkles:
> 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this patch work. Note that x86_64 does not actually define `movslq reg64, imm32`, this is a regular `mov...

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Remove new imm64 method completely, inline at use - Easy review feedback - Merge branch 'master' into JDK-8319406-shorter-movptr-32 - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16497/files - new: https://git.openjdk.org/jdk/pull/16497/files/ca37d38d..18112483 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=00-01 Stats: 10277 lines in 308 files changed: 4324 ins; 2921 del; 3032 mod Patch: https://git.openjdk.org/jdk/pull/16497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16497/head:pull/16497 PR: https://git.openjdk.org/jdk/pull/16497 From shade at openjdk.org Tue Nov 7 12:47:51 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 12:47:51 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v2] In-Reply-To: References: Message-ID: <7r1Z7uslcCF3kqVLf9I_kTI4z1htxxPHCfgC-M60e5w=.2b451b32-474f-4932-a36f-183906afb0ca@github.com> On Fri, 3 Nov 2023 19:49:55 GMT, Quan Anh Mai wrote: > Can we create `MacroAssembler::mov64` that does the branching instead, I think it is more natural there. And things that need 8-byte immediates will call into `Assembler::mov64`. Right, that would capture more cases. I am a bit concerned that we would not catch all the places where `mov64` is actually needed to be full-blown immediate. Maybe it would be safer to keep `mov64` intact, and introduce `mov_immediate` that can be arbitrarily shortened? AArch64 does this, for example. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16497#issuecomment-1798430468

From shade at openjdk.org Tue Nov 7 12:53:45 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 7 Nov 2023 12:53:45 GMT
Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v3]
In-Reply-To:
References:
Message-ID:

> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate.
>
> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically.
>
> For example, sample branch profiling hunk from C1 tier3 on x86_64:
>
>
> Before:
>   0x00007f269065ed02: test %edx,%edx
>   0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ...
>   0x00007f269065ed0e: movabs $0x138,%rsi
>   0x00007f269065ed18: je 0x00007f269065ed24
>   0x00007f269065ed1a: movabs $0x148,%rsi
>   0x00007f269065ed24: mov (%rax,%rsi,1),%rdi
>   0x00007f269065ed28: lea 0x1(%rdi),%rdi
>   0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1)
>   0x00007f269065ed30: je 0x00007f269065ed4e
>
> After:
>   0x00007f1370dcd782: test %edx,%edx
>   0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ...
>   0x00007f1370dcd78e: mov $0x138,%esi
>   0x00007f1370dcd793: je 0x00007f1370dcd79a
>   0x00007f1370dcd795: mov $0x148,%esi
>   0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi
>   0x00007f1370dcd79e: lea 0x1(%rdi),%rdi
>   0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1)
>   0x00007f1370dcd7a6: je 0x00007f1370dcd7c4
>
>
> We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes.
>
> This is not limited to the profiling code.
There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.
>
>
> # Before
> nmethod code size : 430328 bytes
> nmethod code size : 467032 bytes
> nmethod code size : 908936 bytes
> nmethod code size : 1267816 bytes
>
> # After
> nmethod code size : 429616 bytes (-0.1%)
> nmethod code size : 466344 bytes (-0.1%)
> nmethod code size : 897144 bytes (-1.3%)
> nmethod code size : 1256216 bytes (-0.9%)
>
>
> There are two wrinkles:
> 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I fixed it to make this patch work. Note that x86_64 does not actually define `movslq reg64, imm32`, this is a regular `mov...

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Enlighs

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16497/files
  - new: https://git.openjdk.org/jdk/pull/16497/files/18112483..6dcaf425

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16497.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16497/head:pull/16497

PR: https://git.openjdk.org/jdk/pull/16497

From dnsimon at openjdk.org Tue Nov 7 13:00:31 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 7 Nov 2023 13:00:31 GMT
Subject: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch
In-Reply-To:
References:
Message-ID:

On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany wrote:

> This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures.

@PaulSandoz do you see any problem with this change? Adding `-Xbatch` does not significantly increase the test run time.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16023#issuecomment-1798453055 From rehn at openjdk.org Tue Nov 7 13:03:34 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Nov 2023 13:03:34 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v4] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 11:55:53 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
>> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: > > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Touchups > - More comments work > - Tight up the comments > - Rework to a single atomic counter per cell > - Tigthen up memory ordering even more conservatively > - Fix Thank you! I like the new protocol with tag == 0 as disarmed. Looks good! (a bit more tired than this morning, so I'll have new look tomorrow, just in case a missed something. silence == all good) src/hotspot/share/utilities/waitBarrier_generic.hpp line 38: > 36: private: > 37: DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, 0); > 38: Just reading the padding, it's unclear why the two pads are where they are. Can you add a comment about why choose these two locations? ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16404#pullrequestreview-1717616929 PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1384871385 From jsjolen at openjdk.org Tue Nov 7 13:40:53 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 7 Nov 2023 13:40:53 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v14] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. > > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4.
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: More descriptive name and uniform argument order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/5786481c..1e8ce296 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=12-13 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From tschatzl at openjdk.org Tue Nov 7 14:09:42 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 7 Nov 2023 14:09:42 GMT Subject: RFR: 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes Message-ID: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> Hi all, please review these fixes to the `jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java` tests that fail if the GC cause is "GCLocker Initiated GC". This is a valid GC cause, just extremely rare (interestingly the corresponding G1 tests added it).
Thanks, Thomas ------------- Commit messages: - 8319456 Changes: https://git.openjdk.org/jdk/pull/16542/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16542&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319456 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16542.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16542/head:pull/16542 PR: https://git.openjdk.org/jdk/pull/16542 From tschatzl at openjdk.org Tue Nov 7 14:11:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 7 Nov 2023 14:11:20 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v14] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. 
The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... 
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: "GCLocker Initiated GC" is not a valid GC cause for G1 any more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/d9ccccff..c272a736 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=12-13 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From lkorinth at openjdk.org Tue Nov 7 14:32:39 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 7 Nov 2023 14:32:39 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v14] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Tue, 7 Nov 2023 13:40:53 GMT, Johan Sjölen wrote: >> Hi, >> >> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor. >> >> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. >> >> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. >> >> Currently running tier1-tier4.
> > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > More descriptive name and uniform argument order src/hotspot/share/utilities/growableArray.hpp line 437: > 435: this->_len = i+1; > 436: } > 437: ::new (&this->_data[i]) E(elem); I think this needs to be guarded (there might already be an element at `i`) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1385008942 From pchilanomate at openjdk.org Tue Nov 7 14:56:36 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 7 Nov 2023 14:56:36 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: On Mon, 6 Nov 2023 23:22:03 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`.
>> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: get rid of the VM_HandshakeUnmountedVirtualThread Hi Serguei, Looks good to me, nice code consolidation. src/hotspot/share/prims/jvmtiEnvBase.cpp line 1974: > 1972: > 1973: if (java_lang_VirtualThread::is_instance(target_h())) { // virtual thread > 1974: if (!JvmtiEnvBase::is_vthread_alive(target_h())) { There is only one issue I see, in how this check is implemented together with the removal of the VM_op for unmounted vthreads. The change of state to TERMINATED happens after notifyJvmtiUnmount(), i.e. we can see that this vthread is alive here but a later check can report that it is not. This might hit the assert in JvmtiEnvBase::get_vthread_jvf() (maybe this is the issue you saw on your first prototype). We can either change that order at the Java level or, maybe better, change this function to read the state and add a case where, if the state is RUNNING, it checks whether the continuation is done (jdk_internal_vm_Continuation::done(cont)).
src/hotspot/share/prims/jvmtiEnvBase.cpp line 1978: > 1976: } > 1977: if (target_jt == nullptr) { // unmounted virtual thread > 1978: hs_cl->do_vthread(target_h); // execute handshake closure callback on current thread directly I think the comment should be: s/current thread/unmounted vthread src/hotspot/share/prims/jvmtiEnvBase.cpp line 2416: > 2414: if (!JvmtiEnvBase::is_vthread_alive(_target_h())) { > 2415: return; // JVMTI_ERROR_THREAD_NOT_ALIVE (default) > 2416: } Don't we have this check already in JvmtiHandshake::execute()? Same with the other converted functions. src/hotspot/share/prims/jvmtiEnvBase.hpp line 490: > 488: class JvmtiHandshake : public Handshake { > 489: protected: > 490: static bool is_vthread_handshake_safe(JavaThread* thread, oop vt); Not defined, leftover? ------------- PR Review: https://git.openjdk.org/jdk/pull/16460#pullrequestreview-1717815943 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1385033726 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1384994419 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1384999063 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1385002852 From jsjolen at openjdk.org Tue Nov 7 15:24:49 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 7 Nov 2023 15:24:49 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v15] In-Reply-To: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: > Hi, > > When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor.
> > I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example. > > This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload. > > Currently running tier1-tier4. Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Fix bug and style issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16409/files - new: https://git.openjdk.org/jdk/pull/16409/files/1e8ce296..1c39fb41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16409&range=13-14 Stats: 19 lines in 1 file changed: 11 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16409/head:pull/16409 PR: https://git.openjdk.org/jdk/pull/16409 From jsjolen at openjdk.org Tue Nov 7 15:24:52 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 7 Nov 2023 15:24:52 GMT Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v14] In-Reply-To: References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com> Message-ID: On Tue, 7 Nov 2023 14:29:34 GMT, Leo Korinth wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> More descriptive name and uniform argument order > > src/hotspot/share/utilities/growableArray.hpp line 437: > >> 435: this->_len = i+1; >> 436: } >> 437: ::new (&this->_data[i]) E(elem); > > I think this needs to be guarded (there might already be an element at `i`) Thanks Leo, I fixed the bug and some style issues along with it.
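The concern in the review comment quoted above (placement new on a slot that may already hold a live element) can be sketched roughly as follows. This is only an illustration under assumptions: `TinyArray` and `Counted` are hypothetical stand-ins, not HotSpot's `GrowableArray`, the sketch never reallocates, and the thread does not show what the actual fix in the PR looks like. Here an occupied slot is assigned to, and placement new is used only for storage that holds no object yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical element type that counts live instances, so that a
// double construction would show up as a leaked instance.
struct Counted {
  static int live;
  int value;
  explicit Counted(int v) : value(v) { ++live; }
  Counted(const Counted& o) : value(o.value) { ++live; }
  Counted& operator=(const Counted&) = default;
  ~Counted() { --live; }
};
int Counted::live = 0;

// Illustrative stand-in for a growable array with raw storage.
// Fixed capacity for brevity; a real container would reallocate.
template <typename E>
class TinyArray {
  E*  _data;
  int _len;
  int _capacity;

 public:
  explicit TinyArray(int capacity)
      : _data(static_cast<E*>(std::malloc(sizeof(E) * static_cast<std::size_t>(capacity)))),
        _len(0),
        _capacity(capacity) {
    assert(_data != nullptr);
  }
  TinyArray(const TinyArray&) = delete;
  TinyArray& operator=(const TinyArray&) = delete;
  ~TinyArray() {
    for (int i = 0; i < _len; i++) _data[i].~E();
    std::free(_data);
  }

  int length() const { return _len; }
  const E& at(int i) const {
    assert(0 <= i && i < _len);
    return _data[i];
  }

  void at_put_grow(int i, const E& elem, const E& fill) {
    assert(0 <= i && i < _capacity);
    if (i >= _len) {
      // Slots [_len, i] hold no objects yet: construct them in place.
      for (int j = _len; j < i; j++) ::new (&_data[j]) E(fill);
      ::new (&_data[i]) E(elem);
      _len = i + 1;
    } else {
      // Slot i already holds a live element: assign, do not re-construct.
      _data[i] = elem;
    }
  }
};
```

With an unguarded placement new, writing to an index below the current length would construct a second object over the first without running its destructor; the live-instance counter in `Counted` makes that visible.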
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1385096353 From duke at openjdk.org Tue Nov 7 15:44:30 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 7 Nov 2023 15:44:30 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: <9oh5727exwDNjqdB2bUs6dauA0m5myiTBYMPw-_wb2I=.1b8d7500-39b9-4b68-a8af-d8d4fc85a479@github.com> References: <9oh5727exwDNjqdB2bUs6dauA0m5myiTBYMPw-_wb2I=.1b8d7500-39b9-4b68-a8af-d8d4fc85a479@github.com> Message-ID: <-31hCc-QbzceR6z9HM9JQ3VuIC2Cvv8n_v2oG076MME=.9a869fdc-bc73-4e8d-979c-85b223cb613c@github.com> On Tue, 7 Nov 2023 03:37:20 GMT, Jatin Bhateja wrote: > As per JVM specification section 2.8 "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact.", thus JVM does not check these exceptions. > > Your patch is always setting lower 6 bits of MXCSR on hybrid CPUs which has both E and P core, do you see any concerns if these bits are default ON for other server targets with just P-cores. I considered it. There doesn't appear to be any functional correctness issues, since Java does not support the signaling part of BFP IEEE anyway, those flags are essentially noop. I also measured on a some PCore systems, the performance is unaffected. I mostly went with this fix to be conservative, since there probably should be more performance testing otherwise. Might be cleaner to have it set to just one value. 
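For readers unfamiliar with the register layout, the "lower 6 bits of MXCSR" discussed above are the sticky exception status flags, and the six exception mask bits sit at bits 7 to 12. The sketch below only does the bit arithmetic; the constant and function names are made up for illustration and are not taken from the patch, and the bit positions follow the Intel SDM description of MXCSR as I understand it.

```cpp
#include <cassert>
#include <cstdint>

// MXCSR layout (per the Intel SDM):
//   bits 0..5   exception status flags (IE, DE, ZE, OE, UE, PE), sticky
//   bit  6      DAZ
//   bits 7..12  exception masks (IM, DM, ZM, OM, UM, PM)
//   bits 13..14 rounding control
//   bit  15     FTZ
constexpr uint32_t kExceptionFlags = 0x3Fu;        // bits 0..5
constexpr uint32_t kExceptionMasks = 0x3Fu << 7;   // bits 7..12
constexpr uint32_t kPowerOnDefault = 0x1F80u;      // all masked, no flags set

// Value with every status flag pre-set, control bits untouched.
constexpr uint32_t with_flags_preset(uint32_t mxcsr) {
  return mxcsr | kExceptionFlags;
}

// Only the bits that influence computation (masks, rounding, DAZ/FTZ).
constexpr uint32_t control_bits(uint32_t mxcsr) {
  return mxcsr & ~kExceptionFlags;
}

constexpr bool all_exceptions_masked(uint32_t mxcsr) {
  return (mxcsr & kExceptionMasks) == kExceptionMasks;
}

static_assert(kExceptionMasks == kPowerOnDefault,
              "default MXCSR is exactly the six mask bits");
static_assert(with_flags_preset(kPowerOnDefault) == 0x1FBFu,
              "pre-setting the flags changes only bits 0..5");
static_assert(control_bits(with_flags_preset(kPowerOnDefault)) == kPowerOnDefault,
              "control bits are unaffected by the status flags");
```

Since all exceptions stay masked in both values, pre-setting the flags does not change arithmetic results, which is consistent with the observation in the message above that the flags are effectively a no-op for Java code.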
------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1798936849 From jvernee at openjdk.org Tue Nov 7 15:49:08 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 7 Nov 2023 15:49:08 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v5] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. 
After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - track catch block enters in deoptimization code too - Add @requires vm.debug to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/261cdb0e..007664ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=03-04 Stats: 64 lines in 5 files changed: 60 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From jvernee at openjdk.org Tue Nov 7 15:49:09 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 7 Nov 2023 15:49:09 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v4] In-Reply-To: References: Message-ID: <1IM9PmKdUQkyiICiDfMtWt7vwoc8tIsqvWye1GlFdjc=.6bd90821-eab4-48c3-8cc5-91688c7ff7f4@github.com> On Mon, 6 Nov 2023 14:39:54 GMT, Jorn Vernee 
wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). 
I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with three additional commits since the last revision: > > - remove leftover comment > - Add smoke tests for -XX:+StressPrunedExceptionHandlers and -XX:-ProfileExceptionHandlers > - Add missing spaces to IRNode I've added tracking of exception handler enters in the deoptimization code as well now, and added another test case to validate. Please have another look. Thanks. I'll redo CI testing as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1798951631 From dfenacci at openjdk.org Tue Nov 7 16:42:33 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 7 Nov 2023 16:42:33 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. 
In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. This PR includes changes to the `C2_MacroAssembler::char_array_compress_v` intrinsic for **RISCV** that we couldn't test. @RealFYang could you please test and review it? Thanks a lot! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1799144689 From jbhateja at openjdk.org Tue Nov 7 17:47:35 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 17:47:35 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 02:58:50 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> move option to x86-specific section > > src/hotspot/cpu/x86/globals_x86.hpp line 218: > >> 216: \ >> 217: /* Autodetected, see vm_version_x86.cpp */ \ >> 218: product(bool, DoEcoreOpt, false, DIAGNOSTIC, \ > > Change to name to DoEcoreOpt -> DoEcoreOpts or EnableX86ECoreOpts Comment addressal is pending. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1385312473 From jbhateja at openjdk.org Tue Nov 7 17:47:31 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Nov 2023 17:47:31 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: <-31hCc-QbzceR6z9HM9JQ3VuIC2Cvv8n_v2oG076MME=.9a869fdc-bc73-4e8d-979c-85b223cb613c@github.com> References: <9oh5727exwDNjqdB2bUs6dauA0m5myiTBYMPw-_wb2I=.1b8d7500-39b9-4b68-a8af-d8d4fc85a479@github.com> <-31hCc-QbzceR6z9HM9JQ3VuIC2Cvv8n_v2oG076MME=.9a869fdc-bc73-4e8d-979c-85b223cb613c@github.com> Message-ID: <4AjEiOabfGigF3gxTYc0Y_KplZT6V6raaCay_mobQ0M=.7ff252f2-6643-46dd-a71f-e0e6a0db8f40@github.com> On Tue, 7 Nov 2023 15:41:40 GMT, Volodymyr Paprotski wrote: > > As per JVM specification section 2.8 "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact.", thus JVM does not check these exceptions. 
> > Your patch is always setting lower 6 bits of MXCSR on hybrid CPUs which has both E and P core, do you see any concerns if these bits are default ON for other server targets with just P-cores. > > I considered it. There doesn't appear to be any functional correctness issues, since Java does not support the signaling part of BFP IEEE anyway, those flags are essentially noop. I also measured on a some PCore systems, the performance is unaffected. I mostly went with this fix to be conservative, since there probably should be more performance testing otherwise. Might be cleaner to have it set to just one value. Do you have any idea why these settings give a performance bump over E-core, are these suggested settings in x86 manuals? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1799329720 From psandoz at openjdk.org Tue Nov 7 17:48:31 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 7 Nov 2023 17:48:31 GMT Subject: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 07:47:30 GMT, Gerg? Barany wrote: > This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures. Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16023#pullrequestreview-1718335476 From psandoz at openjdk.org Tue Nov 7 17:48:32 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 7 Nov 2023 17:48:32 GMT Subject: RFR: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 17:45:36 GMT, Paul Sandoz wrote: >> This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures. > > Marked as reviewed by psandoz (Reviewer). > @PaulSandoz do you see any problem with this change? 
Adding `-Xbatch` does not significantly increase the test run time. Seems ok to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16023#issuecomment-1799331748 From rriggs at openjdk.org Tue Nov 7 17:55:31 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 7 Nov 2023 17:55:31 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <77-VU5TxUIOF1m0CJkL9BOoInaC-y_gFIKSl6bwqmq8=.c3fe0e09-19df-410c-89af-079bc5472ae6@github.com> On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. 
> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Damon Fenacci authored the updates to the hotspot intrinsics. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1799341933 From rriggs at openjdk.org Tue Nov 7 18:10:35 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 7 Nov 2023 18:10:35 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. 
In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Remove the override. 
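The scan, verify and re-scan sequence in the quoted description can be sketched roughly as below. This is a simplified model under stated assumptions, not the code in the PR: the names `RobustStringSketch`, `Encoded`, `compress`, `toUtf16` and the `afterScan` hook are all hypothetical, and the hook exists only so a "concurrent" write can be simulated deterministically.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from java.lang.String.
final class RobustStringSketch {
    /** Which representation was chosen, plus its bytes. */
    record Encoded(boolean latin1, byte[] value) {}

    /**
     * Copies chars into dst until a char above 0xFF is seen.
     * Each element is read exactly once. Returns the number of chars copied,
     * which is also the index of the first non-latin1 char (or len if none).
     */
    static int compress(char[] src, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[i];
            if (c > 0xFF) {
                return i;
            }
            dst[i] = (byte) c;
        }
        return len;
    }

    /** Encodes a private char[] as two bytes per char (big-endian, for illustration only). */
    static byte[] toUtf16(char[] src) {
        byte[] out = new byte[src.length * 2];
        for (int i = 0; i < src.length; i++) {
            out[2 * i] = (byte) (src[i] >> 8);
            out[2 * i + 1] = (byte) src[i];
        }
        return out;
    }

    /** afterScan is an assumed, test-only hook that stands in for a concurrent writer. */
    static Encoded encode(char[] input, Runnable afterScan) {
        int len = input.length;
        byte[] latin1 = new byte[len];
        int idx = compress(input, latin1, len);      // first scan, on the caller's array
        if (idx == len) {
            return new Encoded(true, latin1);
        }
        afterScan.run();                             // the caller's array may change here
        char[] copy = Arrays.copyOf(input, len);     // private snapshot, no longer mutable by others
        if (copy[idx] > 0xFF) {
            return new Encoded(false, toUtf16(copy)); // the non-latin1 char is still at idx
        }
        // The input changed between scan and copy: re-scan the snapshot instead of throwing.
        idx = compress(copy, latin1, len);
        return idx == len ? new Encoded(true, latin1) : new Encoded(false, toUtf16(copy));
    }

    public static void main(String[] args) {
        char[] data = {'a', '\u20AC', 'c'};
        Encoded stable = encode(data, () -> {});
        Encoded raced = encode(data, () -> data[1] = 'b');
        System.out.println(stable.latin1() + " " + raced.latin1());
    }
}
```

The point of the final re-scan is that the decision between the two representations is always made on data that can no longer change, so a string marked non-latin1 is guaranteed to contain at least one non-latin1 character.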
------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1799374280 From tschatzl at openjdk.org Tue Nov 7 18:11:52 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 7 Nov 2023 18:11:52 GMT Subject: RFR: 8318706: Implementation of JDK-8276094: JEP 423: Region Pinning for G1 [v15] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. 
However, assuming that the pinning is short-lived, we put them into the candidates when we can. > > * There is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and simply waste processing time (when the candidate stays in the retained candidates). > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Merge branch 'master' into 8318706-implementation-of-region-pinning-in-g1 - "GCLocker Initiated GC" is not a valid GC cause for G1 any more - Fix tests after merge - Merge tag 'jdk-22+22' into 8318706-implementation-of-region-pinning-in-g1 Added tag jdk-22+22 for changeset d354141a - Merge tag 'jdk-22+21' into 8318706-implementation-of-region-pinning-in-g1 Added tag jdk-22+21 for changeset d96f38b8 - iwalulya review - typos - ayang review - renamings + documentation - Add documentation about why and how we handle pinned regions in the young/old generation. - Renamings to (almost) consistently use the following nomenclature for evacuation failure and types of it: * evacuation failure is the general concept. It includes * pinned regions * allocation failure One region can both be pinned and experience an allocation failure.
G1 GC messages use tags "(Pinned)" and "(Allocation Failure)" now instead of "(Evacuation Failure)" Did not rename the G1EvacFailureInjector since this adds a lot of noise. - ... and 9 more: https://git.openjdk.org/jdk/compare/45e68ae2...83eff9fe ------------- Changes: https://git.openjdk.org/jdk/pull/16342/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=14 Stats: 1820 lines in 59 files changed: 1147 ins; 430 del; 243 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From gbarany at openjdk.org Tue Nov 7 19:28:20 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Tue, 7 Nov 2023 19:28:20 GMT Subject: Integrated: 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch In-Reply-To: References: Message-ID: On Tue, 3 Oct 2023 07:47:30 GMT, Gergö Barany wrote: > This test requires certain methods to be compiled, but without `-Xbatch` the compiler races against the test code, which can lead to intermittent failures. This pull request has now been integrated. Changeset: a290256b Author: Gergö Barany Committer: Doug Simon URL: https://git.openjdk.org/jdk/commit/a290256bbf85a52fbeab24dab5fbe195cf58750f Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod 8315680: java/lang/ref/ReachabilityFenceTest.java should run with -Xbatch Reviewed-by: dnsimon, never, psandoz ------------- PR: https://git.openjdk.org/jdk/pull/16023 From duke at openjdk.org Tue Nov 7 19:51:34 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 7 Nov 2023 19:51:34 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3] In-Reply-To: References: Message-ID: > Improves vector rounding on ECore about 10x > > (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms > (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ±
0.727 ops/ms Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: change option name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16504/files - new: https://git.openjdk.org/jdk/pull/16504/files/f4c8c36e..fd956635 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16504&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16504&range=01-02 Stats: 9 lines in 5 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16504/head:pull/16504 PR: https://git.openjdk.org/jdk/pull/16504 From duke at openjdk.org Tue Nov 7 20:02:38 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 7 Nov 2023 20:02:38 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v2] In-Reply-To: References: Message-ID: <9HdBB8ripMhPy5LoM-ANlFDk3WQWdh-qvDUKzdHJXrs=.8fc3ade1-e155-4a4b-a4d5-7db4ca3fe42c@github.com> On Tue, 7 Nov 2023 17:44:41 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/globals_x86.hpp line 218: >> >>> 216: \ >>> 217: /* Autodetected, see vm_version_x86.cpp */ \ >>> 218: product(bool, DoEcoreOpt, false, DIAGNOSTIC, \ >> >> Change the name DoEcoreOpt -> DoEcoreOpts or EnableX86ECoreOpts > > This comment is still to be addressed. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16504#discussion_r1385484763 From coleenp at openjdk.org Tue Nov 7 20:03:26 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 7 Nov 2023 20:03:26 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: <6lajvS2wTUMLb-JbqH-30AQB509F6jRG0FuMmrGY3gs=.9b83d6eb-babe-42ca-b628-aaea06323f4d@github.com> References: <6lajvS2wTUMLb-JbqH-30AQB509F6jRG0FuMmrGY3gs=.9b83d6eb-babe-42ca-b628-aaea06323f4d@github.com> Message-ID: On Wed, 1 Nov 2023 16:34:23 GMT, Oli Gillespie wrote: >> Coleen - >> >> I think NBQ is a reasonable choice for use here.
But it's not a complete >> solution on its own. It imposes documented requirements on clients. I don't >> think we have a different data structure for this purpose (thread-safe FIFO >> without locks), so any alternative would need to be invented, and would be >> solving the same problem as NBQ and the surrounding client-provided code. >> >> Oliver - >> >> The current usage is not safe. The reuse can occur through the allocator. For >> example, one thread starts a pop. Another thread steals that pop, then deletes >> the object. Later, an allocation gets a new node at the same address as the >> deleted node. That newly allocated node makes its way through the queue to >> eventually become visible to that first thread's still in-progress pop. (So >> this is an SMR bug. You generally can't delete an object while some other >> thread might be looking at it.) >> >> GlobalCounter is not a locking mechanism. It is an RCU-style synchronization >> mechanism, so related to but different from RWLocks. In particular, readers >> (threads in a critical section) never block due to this mechanism - only >> write_synchronize blocks. >> >> A problem with using GlobalCounter in that simplistic way is that once the >> queue is "full", the one-in-one-out policy is going to have every allocation >> hit GlobalCounter::write_synchronize (a potentially somewhat expensive >> operation, since it needs to iterate over all threads), at least until the >> queue is bulk drained. Switching over to a one-in-N-out policy could ameliorate >> that by batching the synchronizes over several nodes, and also remove the need >> for complete bulk draining. Having a min/max queue size and switching between >> insert-only and one-in-N-out policies depending on the current size seems like >> a possible solution. > > Thanks for all the details, I hadn't considered the SMR angle. I'll think about alternatives.
In a brief conversation with Kim, where I bemoaned that there should be a simpler solution, he suggested maybe a fixed ring buffer with an xchg to replace the n'th element, like:

    const uint N = ...;
    Symbol* volatile _delay_queue[N];  // initialize to nullptr
    volatile uint _index = 0;

    void add(Symbol* s) {
      ... increment refcount for s
      uint i = Atomic::add(&_index, 1u) % N;
      Symbol* old = Atomic::xchg(&_delay_queue[i], s);
      if (old != nullptr) {
        ... decrement refcount for old, possibly deleting it
      }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1385484881 From cslucas at openjdk.org Tue Nov 7 20:07:57 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 7 Nov 2023 20:07:57 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges In-Reply-To: References: Message-ID: <_qMAFS4KFzO9Wcta53RhpYNHtG3LgRk0Wux_Tjbu9nM=.e92525b9-2719-4d89-a499-9962315e38bc@github.com> On Mon, 16 Oct 2023 09:32:40 GMT, Tobias Hartmann wrote: >> ### Description >> >> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. >> >> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. >> >> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. >> >> ### Benchmarking >> >> **Note:** In some of these tests no reduction happens.
I left them in to validate that no perf. regression happens in that case.
>> **Note 2:** Margin of error was negligible.
>>
>> | Benchmark                      | No RAM (ms/op) | Yes RAM (ms/op) |
>> |--------------------------------|----------------|-----------------|
>> | TestTrapAfterMerge             | 19.515         | 13.386          |
>> | TestArgEscape                  | 33.165         | 33.254          |
>> | TestCallTwoSide                | 70.547         | 69.427          |
>> | TestCmpAfterMerge              | 16.400         | 2.984           |
>> | TestCmpMergeWithNull_Second    | 27.204         | 27.293          |
>> | TestCmpMergeWithNull           | 8.248          | 4.920           |
>> | TestCondAfterMergeWithAllocate | 12.890         | 5.252           |
>> | TestCondAfterMergeWithNull     | 6.265          | 5.078           |
>> | TestCondLoadAfterMerge         | 12.713         | 5.163           |
>> | TestConsecutiveSimpleMerge     | 30.863         | 4.068           |
>> | TestDoubleIfElseMerge          | 16.069         | 2.444           |
>> | TestEscapeInCallAfterMerge     | 23.111         | 22.924          |
>> | TestGlobalEscape               | 14.459         | 14.425          |
>> | TestIfElseInLoop               | 246.061        | 42.786          |
>> | TestLoadAfterLoopAlias         | 45.808         | 45.812          |
>> ...
>
> I'm still seeing the following failures: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/escape.cpp:1299), pid=1574160, tid=1574500 > # assert(false) failed: SafePointScalarMerge nodes can't be nested. > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) > # Problematic frame: > # V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 > > Current CompileTask: > C2:39141 8262 !
4 akka.actor.ActorCell::invokeAll$1 (577 bytes) > > Stack: [0x0000fffea024c000,0x0000fffea044a000], sp=0x0000fffea0444d50, free space=2019k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0xab151c] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x6e8 (escape.cpp:1299) > V [libjvm.so+0x90d1e4] Compile::Optimize()+0x744 (compile.cpp:2336) > V [libjvm.so+0x90f098] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1504 (compile.cpp:854) > V [libjvm.so+0x75b12c] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x10c (c2compiler.cpp:130) > V [libjvm.so+0x91b124] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x8e4 (compileBroker.cpp:2282) > V [libjvm.so+0x91bc3c] CompileBroker::compiler_thread_loop()+0x5bc (compileBroker.cpp:1943) > V [libjvm.so+0xdb4bc0] JavaThread::thread_main_inner()+0xec (javaThread.cpp:720) > V [libjvm.so+0x1600764] Thread::call_run()+0xb0 (thread.cpp:220) > V [libjvm.so+0x1368ff8] thread_native_entry(Thread*)+0x138 (os_linux.cpp:785) > C [libc.so.6+0x82a28] start_thread+0x2d4 > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/workspace/open/src/hotspot/share/opto/narrowptrnode.cpp:84), pid=3481386, tid=3481478 > # assert(t != TypeNarrowKlass::NULL_PTR) failed: null klass? > # > # JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-10-12-0503164.tobias.hartmann.jdk2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x140fcf4] DecodeNKlass... @TobiHartmann - I believe I fixed the issues that you reported. Would you mind submitting the tests again? Thank you! 
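For readers less familiar with the term, a "nullable allocation merge" of the kind this PR discusses can arise from Java source shaped like the sketch below: a local that is either `null` or a fresh allocation, followed by a null check on the merged value. The class and method names are hypothetical and are not taken from the PR's benchmarks; whether C2 actually removes the allocation depends on the build and on the flags in use.

```java
// Hypothetical illustration of a source-level shape that produces a
// Phi joining null with an allocation, followed by a pointer compare.
final class NullableMergeSketch {
    record Point(int x, int y) {}

    static int sumOrZero(boolean cond, int x, int y) {
        Point p = null;
        if (cond) {
            p = new Point(x, y);   // one input of the merge is an allocation
        }
        // Control flow joins here: p is either null or the new Point.
        if (p == null) {           // null check on the merged value
            return 0;
        }
        return p.x() + p.y();      // field loads through the merged value
    }

    public static void main(String[] args) {
        System.out.println(sumOrZero(true, 2, 3));
        System.out.println(sumOrZero(false, 2, 3));
    }
}
```

If the merge can be reduced, the `Point` never needs to exist on the heap in the `cond` branch, which is the effect the benchmark table in the quoted description is measuring.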
------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1799818609 From dcubed at openjdk.org Tue Nov 7 20:22:11 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 7 Nov 2023 20:22:11 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 08:00:46 GMT, Doug Simon wrote: > With GraalVM, we're doing a lot more testing with product builds than fastdebug > builds as the majority of checks are done at the Java level and fastdebug just > slows everything down. In that testing context, guarantees are much more useful. > Given the importance of the "can_call_java" invariant, would you agree that > converting these 3 specific assertions to guarantees is justified? Hmmm... I hope this doesn't mean that the GraalVM project has changed the Tier[1-3] task definitions to focus on 'release' bits testing instead of 'fastdebug' bits testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1799922110 From shade at openjdk.org Tue Nov 7 20:22:13 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 20:22:13 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. 
The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. > > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Rework paddings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/3cd53c1a..bca446d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=03-04 Stats: 16 lines in 1 file changed: 13 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 
PR: https://git.openjdk.org/jdk/pull/16404 From shade at openjdk.org Tue Nov 7 20:22:18 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Nov 2023 20:22:18 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v4] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 12:57:43 GMT, Robbin Ehn wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: >> >> - Encode barrier tag into state, resolving another race condition >> - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Touchups >> - More comments work >> - Tight up the comments >> - Rework to a single atomic counter per cell >> - Tigthen up memory ordering even more conservatively >> - Fix > > src/hotspot/share/utilities/waitBarrier_generic.hpp line 38: > >> 36: private: >> 37: DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, 0); >> 38: > > Just reading the padding, it's unclear why the two pads are where they are. > Can you add a comment about why choose these two locations? Wanted to make sure nothing interferes with cells. But now I realize we actually overpad between the cells (due to both pre-cell and post-cell padding), and underpad for the barrier itself! Fixed in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1385515426 From dnsimon at openjdk.org Tue Nov 7 21:16:09 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 7 Nov 2023 21:16:09 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 20:18:49 GMT, Daniel D. Daugherty wrote: > Hmmm... 
I hope this doesn't mean that the GraalVM project has changed the > Tier[1-3] task definitions to focus on 'release' bits testing instead of 'fastdebug' > bits testing. No, we did not change this in the tier testing. I was referring to the Graal CI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1800137205 From sviswanathan at openjdk.org Tue Nov 7 21:33:59 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 7 Nov 2023 21:33:59 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 19:51:34 GMT, Volodymyr Paprotski wrote: >> Improves vector rounding on ECore about 10x >> >> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ? 0.044 ops/ms >> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ? 0.727 ops/ms > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > change option name The PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16504#pullrequestreview-1718815034 From dcubed at openjdk.org Tue Nov 7 21:46:06 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 7 Nov 2023 21:46:06 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v6] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 21:13:34 GMT, Doug Simon wrote: > No, we did not change this in the tier testing. I was referring to the Graal CI. Thanks for the info. Does this mean that the Graal CI is running its own Tier[1-8] definitions and not the same Tier[1-8] as the main JDK repo? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1800221785 From dcubed at openjdk.org Tue Nov 7 21:51:09 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Tue, 7 Nov 2023 21:51:09 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v7] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 13:33:32 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompilerThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way). When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert assertions about can_call_java to guarantees I meant: does that mean that Graal CI is running its own Graal-Tier[1-N] and is not running the same Tier[1-8] definitions as the main JDK repo at all? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1800232912 From dnsimon at openjdk.org Tue Nov 7 21:54:08 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 7 Nov 2023 21:54:08 GMT Subject: RFR: 8318694: [JVMCI] disable can_call_java in most contexts for libjvmci compiler threads [v7] In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 13:33:32 GMT, Doug Simon wrote: >> This PR reduces the context in which a libjvmci CompilerThread can make a Java call. By default, a CompilerThread for a JVMCI compiler can make Java calls (as jarjvmci only works that way).
When libjvmci calls into the VM via a CompilerToVM native method, it enters a scope where Java calls are disabled until either the call returns or a nested scope is entered that re-enables Java calls. The latter is required for the Truffle use case described in [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) as well as for a few other VM entry points where libgraal currently still requires the ability to make Java calls (e.g. to load the Java classes used for boxing primitives). We may be able to eventually eliminate all need for libgraal to make Java calls but this PR is a first step in the right direction. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > convert assertions about can_call_java to guarantees The Graal CI system is completely disjoint from mach5-based JDK testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16383#issuecomment-1800238885 From duke at openjdk.org Tue Nov 7 22:31:58 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 7 Nov 2023 22:31:58 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3] In-Reply-To: References: Message-ID: <3tZC9BQnoJjYWWT6JeJ326v4-p6FgngwyTSb405dT44=.ee2ea95b-528e-45e1-890f-dfabb50495ad@github.com> On Mon, 6 Nov 2023 05:01:51 GMT, David Holmes wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> change option name > > Changes requested by dholmes (Reviewer). @dholmes-ora did you have any further concerns? Would you mind running any extra tests you have? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1800292635 From dholmes at openjdk.org Wed Nov 8 00:30:03 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 8 Nov 2023 00:30:03 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3] In-Reply-To: References: Message-ID: On Mon, 6 Nov 2023 05:01:51 GMT, David Holmes wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> change option name > > Changes requested by dholmes (Reviewer). > @dholmes-ora did you have any further concerns? Would you mind running any extra tests you have? I have no further concerns - thanks for moving the flag. Someone from our compiler team should review this and run it through our CI. I don't know if we have any machines that will be affected by this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1800726898 From sviswanathan at openjdk.org Wed Nov 8 00:57:00 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 8 Nov 2023 00:57:00 GMT Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 19:51:34 GMT, Volodymyr Paprotski wrote: >> Improves vector rounding on ECore about 10x >> >> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ? 0.044 ops/ms >> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ? 0.727 ops/ms > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > change option name @TobiHartmann @vnkozlov Could we please get one more review on this small PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1800820717 From lmesnik at openjdk.org Wed Nov 8 02:08:26 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 8 Nov 2023 02:08:26 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v2] In-Reply-To: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: > Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. > > Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). > > Testing tier1-5. Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8319200 - param removed - Moved ttf from limited builder. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16442/files - new: https://git.openjdk.org/jdk/pull/16442/files/0dcf3673..db069ace Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=00-01 Stats: 31452 lines in 728 files changed: 16977 ins; 6081 del; 8394 mod Patch: https://git.openjdk.org/jdk/pull/16442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16442/head:pull/16442 PR: https://git.openjdk.org/jdk/pull/16442 From lmesnik at openjdk.org Wed Nov 8 02:33:29 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 8 Nov 2023 02:33:29 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v3] In-Reply-To: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: > Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. > > Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). > > Testing tier1-5. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: converted list to array. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16442/files - new: https://git.openjdk.org/jdk/pull/16442/files/db069ace..4ba1e85e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16442/head:pull/16442 PR: https://git.openjdk.org/jdk/pull/16442 From amitkumar at openjdk.org Wed Nov 8 04:36:53 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 8 Nov 2023 04:36:53 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 22:17:43 GMT, Cesar Soares Lucas wrote: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. > > Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. > > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. 
`s390x` also run into assert failure: `assert(masm->inst_mark() == nullptr) failed: should be.` V [libjvm.so+0xfb0938] PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*)+0x2370 (output.cpp:1812) V [libjvm.so+0xfb21ce] PhaseOutput::Output()+0xcae (output.cpp:362) V [libjvm.so+0x6a90a8] Compile::Code_Gen()+0x460 (compile.cpp:2989) V [libjvm.so+0x6ad848] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1738 (compile.cpp:887) V [libjvm.so+0x4fb932] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x14a (c2compiler.cpp:119) V [libjvm.so+0x6b81a2] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xd9a (compileBroker.cpp:2282) V [libjvm.so+0x6b8eaa] CompileBroker::compiler_thread_loop()+0x5a2 (compileBroker.cpp:1943) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1801070622 From lmesnik at openjdk.org Wed Nov 8 05:35:59 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 8 Nov 2023 05:35:59 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v3] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 8 Nov 2023 02:33:29 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > converted list to array. Moved the threadFactory injection to createTestJavaProcessBuilder. 
I think it is more logical not to set it for the limited version of the process builder.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1801121241

From dholmes at openjdk.org Wed Nov 8 06:08:57 2023
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 8 Nov 2023 06:08:57 GMT
Subject: RFR: JDK-8319437: NMT should show library names in call stacks
In-Reply-To:
References:
Message-ID:

On Sun, 5 Nov 2023 06:28:11 GMT, Thomas Stuefe wrote:

> With this tiny enhancement, NMT shows library names in callstacks.

Seems reasonable.

Thanks

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16508#pullrequestreview-1719434407

From rehn at openjdk.org Wed Nov 8 07:12:59 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 8 Nov 2023 07:12:59 GMT
Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v4]
In-Reply-To:
References:
Message-ID: <_lLR74_9OIeDPh9sPa4iuMZo71EEExXRFSo6pP-jT9A=.eaeea4b4-75fb-44cf-ba09-6b976da9cbdc@github.com>

On Tue, 7 Nov 2023 20:18:11 GMT, Aleksey Shipilev wrote:

>> src/hotspot/share/utilities/waitBarrier_generic.hpp line 38:
>>
>>> 36: private:
>>> 37:   DEFINE_PAD_MINUS_SIZE(0, DEFAULT_CACHE_LINE_SIZE, 0);
>>> 38:
>>
>> Just reading the padding, it's unclear why the two pads are where they are.
>> Can you add a comment about why choose these two locations?
>
> Wanted to make sure nothing interferes with cells. But now I realize we actually overpad between the cells (due to both pre-cell and post-cell padding), and underpad for the barrier itself! Fixed in new commit.

Ok!
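
[Editorial sketch, not part of the original message.] The padding exchange above is about giving each cell and the barrier's own state a separate cache line without padding twice. A minimal illustration follows, assuming a 64-byte cache line and standard `alignas` in place of HotSpot's `DEFINE_PAD_MINUS_SIZE`. The names `Cell`, `Barrier`, `kCacheLine` and `on_distinct_lines` are hypothetical and do not come from waitBarrier_generic.hpp.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; HotSpot would use DEFAULT_CACHE_LINE_SIZE.
constexpr std::size_t kCacheLine = 64;

// alignas rounds the size of each cell up to exactly one cache line, so
// neighbouring cells never share a line and no explicit pre/post pads
// (the "overpadding" mentioned above) are needed between them.
struct alignas(kCacheLine) Cell {
  std::atomic<std::int32_t> state{0};
  std::atomic<std::int32_t> waiters{0};
};

// Hypothetical barrier: an array of cells plus one barrier-wide field
// that is also pushed onto its own line (avoiding "underpadding").
struct alignas(kCacheLine) Barrier {
  static constexpr int kCells = 4;
  Cell cells[kCells];
  alignas(kCacheLine) std::atomic<std::int32_t> tag{0};
};

static_assert(sizeof(Cell) == kCacheLine, "one cell per cache line");
static_assert(offsetof(Barrier, tag) % kCacheLine == 0, "tag starts a new line");

// True if the two addresses fall on different cache lines.
inline bool on_distinct_lines(const void* a, const void* b) {
  auto line = [](const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
  };
  return line(a) != line(b);
}
```

With this layout the compiler, rather than hand-placed pad macros, guarantees the separation, and the `static_assert`s fail the build if a later field change breaks it.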
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1386104404 From fyang at openjdk.org Wed Nov 8 07:37:00 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 8 Nov 2023 07:37:00 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 10:00:45 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ##?Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Match Op_CompressBits based on UseRVV only Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1684: > 1682: } > 1683: > 1684: void C2_MacroAssembler::compress_bits_v(Register dst, Register src, Register mask, Register tmp, bool is_long) { Seems that this is quite similar to the implementation of `CompressM` node in `riscv_v.ad` [1] which I think should be more efficient. The only difference is that we only need to move `src` into a vector register beforehand and change to perform `vcpop_m` under the given `mask`. Please consider. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_v.ad#L3506-L3518 src/hotspot/cpu/riscv/riscv.ad line 1901: > 1899: > 1900: case Op_EncodeISOArray: > 1901: return UseRVV && SpecialEncodeISOArray; Seems that we can remove this extra check for `SpecialEncodeISOArray` (and related code at [1]) and group those 5 cases (which return UseRVV) together. 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L290-L292

src/hotspot/cpu/riscv/riscv_v.ad line 2884:

> 2882:
> 2883: instruct compressBitsI(iRegINoSp dst, iRegIorL2I src, iRegIorL2I mask, iRegPNoSp tmp, vRegMask_V0 v0, vReg_V4 v4, vReg_V8 v8) %{
> 2884:   predicate(UseRVV && (MaxVectorSize >= 16));

Can we simply remove the `(MaxVectorSize >= 16)` condition as that has already been ensured on JVM startup?

src/hotspot/cpu/riscv/riscv_v.ad line 2909:

> 2907:
> 2908: instruct compressBitsL(iRegLNoSp dst, iRegL src, iRegL mask, iRegPNoSp tmp, vRegMask_V0 v0, vReg_V4 v4, vReg_V8 v8) %{
> 2909:   predicate(UseRVV && (MaxVectorSize >= 16));

Same as above.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16481#pullrequestreview-1719420758
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386125552
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386040152
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386028318
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386028471

From fyang at openjdk.org Wed Nov 8 08:06:58 2023
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 8 Nov 2023 08:06:58 GMT
Subject: RFR: 8318218: RISC-V: C2 CompressBits [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 8 Nov 2023 07:32:34 GMT, Fei Yang wrote:

>> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Match Op_CompressBits based on UseRVV only
>
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1684:
>
>> 1682: }
>> 1683:
>> 1684: void C2_MacroAssembler::compress_bits_v(Register dst, Register src, Register mask, Register tmp, bool is_long) {
>
> Seems that this is quite similar to the implementation of `CompressM` node in `riscv_v.ad` [1] which I think should be more efficient.
The only difference is that we only need to move `src` into a vector register beforehand and change to perform `vcpop_m` under the given `mask`. Please consider. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_v.ad#L3506-L3518 Ah, I see the difference now. The active 0 bits are still kept back here in this case [1]. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/intrinsicnode.cpp#L300-L310 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386157421 From thartmann at openjdk.org Wed Nov 8 08:57:02 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Nov 2023 08:57:02 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v3] In-Reply-To: <4kP-cNv7NVLPCLIvtNtZodu7KqWukhs_tMlG8pQugF0=.6c18c065-da2d-4c87-82e7-0dbc257ac7f1@github.com> References: <4kP-cNv7NVLPCLIvtNtZodu7KqWukhs_tMlG8pQugF0=.6c18c065-da2d-4c87-82e7-0dbc257ac7f1@github.com> Message-ID: On Mon, 6 Nov 2023 13:12:25 GMT, Jorn Vernee wrote: >> test/hotspot/jtreg/compiler/c2/TestExHandlerTrap.java line 37: >> >>> 35: * -Xbatch >>> 36: * -Xlog:deoptimization=trace >>> 37: * -XX:CompileCommand=PrintCompilation,compiler.c2.TestExHandlerTrap::payload >> >> Should the logging/printing be removed? > > I've left it since the output is pretty minimal, and it seems like it might be useful in case the test ever fails in CI? Okay, makes sense. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16416#discussion_r1386221047

From thartmann at openjdk.org Wed Nov 8 09:05:02 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 8 Nov 2023 09:05:02 GMT
Subject: RFR: 8267532: Try/catch block not optimized as expected [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 6 Nov 2023 14:36:21 GMT, Jorn Vernee wrote:

> Do note that a handful of compiler tests fail with -XX:+StressPrunedExceptionHandlers since they test for a very particular sequence of compilation and deoptimization, and the stress option introduces more deoptimizations.

Right, we had similar issues with other stress flags and would either exclude the tests or make them more robust.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1801360315

From thartmann at openjdk.org Wed Nov 8 09:17:00 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 8 Nov 2023 09:17:00 GMT
Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 7 Nov 2023 19:51:34 GMT, Volodymyr Paprotski wrote:

>> Improves vector rounding on ECore about 10x
>>
>> (BEFORE) FpRoundingBenchmark.test_round_float  2048  thrpt  3   40.912 ± 0.044  ops/ms
>> (AFTER ) FpRoundingBenchmark.test_round_float  2048  thrpt  3  431.682 ± 0.727  ops/ms
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   change option name

Looks reasonable to me. I'll run it through our testing and report back.

-------------

Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16504#pullrequestreview-1719761547 From mli at openjdk.org Wed Nov 8 09:18:16 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 Nov 2023 09:18:16 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v4] In-Reply-To: References: Message-ID: <5cdoDtF7dDZKYK7SGicIUoMg-DSgYrT09qREo8qKvhY=.db32e2ca-dfa5-4f0e-8ca1-891070f97383@github.com> > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! > > ##?Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Remove redundant check - Merge branch 'master' into compress-bits - Match Op_CompressBits based on UseRVV only - remove the new vm option, using Matcher::match_rule_supported instead; move code to riscv_v.ad and C2_MacroAssembler - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/b5eed0ec..3b256e25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=02-03 Stats: 27499 lines in 621 files changed: 14092 ins; 5448 del; 7959 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Wed Nov 8 09:18:19 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 Nov 2023 09:18:19 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v3] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 06:09:59 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Match Op_CompressBits based on UseRVV only > > 
src/hotspot/cpu/riscv/riscv.ad line 1901: > >> 1899: >> 1900: case Op_EncodeISOArray: >> 1901: return UseRVV && SpecialEncodeISOArray; > > Seems that we can remove this extra check for `SpecialEncodeISOArray` (and related code at [1]) and group those 5 cases (which return UseRVV) together. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L290-L292 I will do it in another pr later. > src/hotspot/cpu/riscv/riscv_v.ad line 2909: > >> 2907: >> 2908: instruct compressBitsL(iRegLNoSp dst, iRegL src, iRegL mask, iRegPNoSp tmp, vRegMask_V0 v0, vReg_V4 v4, vReg_V8 v8) %{ >> 2909: predicate(UseRVV && (MaxVectorSize >= 16)); > > Same as above. Yes, seems it's redundant to check it here again, as we already have a check in Matcher::match_rule_supported where it's necessary to avoid silent failure in case restrictions on MaxVectorSize is changed in the future. It's removed here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386247964 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1386249473 From ayang at openjdk.org Wed Nov 8 09:41:57 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 8 Nov 2023 09:41:57 GMT Subject: RFR: 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes In-Reply-To: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> References: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> Message-ID: On Tue, 7 Nov 2023 14:02:37 GMT, Thomas Schatzl wrote: > Hi all, > > please review these fixes to the `jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java` tests that fail if the GC cause has been "GCLocker Initiated GC". > > This is a valid gc cause, just extremely rare (interestingly the corresponding G1 tests added it). 
> > Thanks, > Thomas Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16542#pullrequestreview-1719819307 From iwalulya at openjdk.org Wed Nov 8 09:50:57 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 8 Nov 2023 09:50:57 GMT Subject: RFR: 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes In-Reply-To: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> References: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> Message-ID: On Tue, 7 Nov 2023 14:02:37 GMT, Thomas Schatzl wrote: > Hi all, > > please review these fixes to the `jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java` tests that fail if the GC cause has been "GCLocker Initiated GC". > > This is a valid gc cause, just extremely rare (interestingly the corresponding G1 tests added it). > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16542#pullrequestreview-1719836277 From shade at openjdk.org Wed Nov 8 10:43:03 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 8 Nov 2023 10:43:03 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9] In-Reply-To: References: Message-ID: On Thu, 26 Oct 2023 16:11:04 GMT, Thomas Stuefe wrote: >> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. >> >> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. 
>>
>>
>>   8b7b62: 48 8d 05 7f 1b c3 00   lea    0xc31b7f(%rip),%rax   # 14e96e8
>>   8b7b69: 0f b6 00               movzbl (%rax),%eax
>>   8b7b6c: 84 c0                  test   %al,%al
>>   8b7b6e: 0f 84 9c 00 00 00      je     8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>>   8b7b74: 48 8d 15 05 62 c6 00   lea    0xc66205(%rip),%rdx   # 151dd80 <_ZN23CompressedKlassPointers6_shiftE>
>>   8b7b7b: 8b 7b 08               mov    0x8(%rbx),%edi
>>   8b7b7e: 8b 0a                  mov    (%rdx),%ecx
>>   8b7b80: 48 8d 15 01 62 c6 00   lea    0xc66201(%rip),%rdx   # 151dd88 <_ZN23CompressedKlassPointers5_baseE>
>>   8b7b87: 48 d3 e7               shl    %cl,%rdi
>>   8b7b8a: 48 03 3a               add    (%rdx),%rdi
>>
>>
>> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag.
>>
>>
>>   8ba302: 48 8d 05 97 9c c2 00   lea    0xc29c97(%rip),%rax   # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE>
>>   8ba309: 48 8b 08               mov    (%rax),%rcx
>>   8ba30c: f6 c5 01               test   $0x1,%ch              # use compressed klass pointers?
>>   8ba30f: 0f 84 9b 00 00 00      je     8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260>
>>   8ba315: 8b 7b 08               mov    0x8(%rbx),%edi
>>   8ba318: 48 d3 e7               shl    %cl,%rdi              # shift
>>   8ba31b: 66 31 c9               xor    %cx,%cx               # zero out lower 16 bits of base
>>   8ba31e: 48 01 cf               add    %rcx,%rdi             # add base
>>   8ba321: 8b 4f 08               mov    0x8(%rdi),%ecx
>>
>> ---
>>
>> Performance measurements:
>>
>> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances.
>>
>> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up.
>>
>> ---
>>
>> Future extensions:
>>
>> This patch uses the fact that the encoding base is aligned to metaspace reser...
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits:
>
>  - Renamed _combo
>  - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++
>  - simplify assert
>  - add comment
>  - Update src/hotspot/share/oops/compressedKlass.hpp
>
>    Co-authored-by: Aleksey Shipilëv
>  - Update src/hotspot/share/oops/compressedKlass.cpp
>
>    Co-authored-by: Aleksey Shipilëv
>  - Update src/hotspot/share/oops/compressedKlass.cpp
>
>    Co-authored-by: Aleksey Shipilëv
>  - Update src/hotspot/share/oops/compressedKlass.cpp
>
>    Co-authored-by: Aleksey Shipilëv
>  - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
>  - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++
>  - ... and 6 more: https://git.openjdk.org/jdk/compare/9864951d...56cde2a9

Looks okay, but I still have a few questions.

src/hotspot/share/oops/compressedKlass.cpp line 36:

> 34: size_t CompressedKlassPointers::_range = 0;
> 35: // Note: initialization value is unchanged for -UseCompressedClassPointers, so
> 36: // the bit mirroring UseCompressedClassPointers is off and maches the switch.

Suggestion:

// the bit mirroring UseCompressedClassPointers is off and matches the switch.

src/hotspot/share/oops/compressedKlass.cpp line 45:

> 43:   assert(theshift == 0 || theshift == LogKlassAlignmentInBytes, "invalid shift for klass ptrs");
> 44:   _base = thebase;
> 45:   _shift = theshift;

Do we even need `_base` and `_shift` as separate fields after this change then?

src/hotspot/share/oops/compressedKlass.hpp line 67:

> 65:   // - Bit  [0-7]   shift
> 66:   // - Bit  8       UseCompressedClassPointers
> 67:   // - Bits [16-64] the base.

Suggestion:

  // - Bit  [0-7]   shift
  // - Bit  8       UseCompressedClassPointers
  // - Bits [16-64] heap base

src/hotspot/share/oops/compressedKlass.hpp line 68:

> 66:   // - Bit  8       UseCompressedClassPointers
> 67:   // - Bits [16-64] the base.
> 68:   static uint64_t _compressionInfo;

C++ style:

Suggestion:

  static uint64_t _compression_info;

-------------

PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1719981601
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386397717
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386414395
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386416204
PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1386415450

From ogillespie at openjdk.org Wed Nov 8 10:57:05 2023
From: ogillespie at openjdk.org (Oli Gillespie)
Date: Wed, 8 Nov 2023 10:57:05 GMT
Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4]
In-Reply-To:
References: <6lajvS2wTUMLb-JbqH-30AQB509F6jRG0FuMmrGY3gs=.9b83d6eb-babe-42ca-b628-aaea06323f4d@github.com>
Message-ID:

On Tue, 7 Nov 2023 20:00:17 GMT, Coleen Phillimore wrote:

>> Thanks for all the details, I hadn't considered the SMR angle. I'll think about alternatives.
>
> In a brief conversation with Kim, where I bemoaned that there should be a simpler solution, he suggested maybe a fixed ring buffer with a xchg to replace the n'th element, like:
>
>
> const uint N = ...;
> Symbol* volatile _delay_queue[N]; // initialize to nullptr
> volatile uint _index = 0;
>
> void add(Symbol* s) {
>   ... increment refcount for s
>   uint i = Atomic::add(&_index, 1u) % N;
>   Symbol* old = Atomic::xchg(&_delay_queue[i], s);
>   if (old != nullptr) {
>     ... decrement refcount for old, possibly deleting it
>   }
> }
>
> ... no more NonblockingQueue!

I think this is great, thanks, I will try it out today.
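
[Editorial sketch, not part of the original message.] The quoted ring-buffer idea can be tried outside the VM roughly as follows. This is only a sketch under stated assumptions: it uses `std::atomic` instead of HotSpot's `Atomic::add` and `Atomic::xchg`, and `Sym`, `DelayRing`, `retain`, `release` and `drain` are hypothetical stand-ins, not the code that was eventually committed to the PR.

```cpp
#include <atomic>
#include <cstddef>

// Stand-in for Symbol: only the reference count matters for this sketch.
struct Sym {
  std::atomic<int> refcount{1};  // one reference held by the creator
  void retain() { refcount.fetch_add(1, std::memory_order_relaxed); }
  // Returns true if this call dropped the last reference. A real
  // implementation would free the symbol at that point.
  bool release() { return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Fixed-size delay queue: each add() claims the next slot and swaps the
// new entry in, releasing whatever entry it displaced.
template <std::size_t N>
class DelayRing {
  std::atomic<Sym*> _slots[N];
  std::atomic<unsigned> _index{0};

 public:
  DelayRing() {
    for (auto& slot : _slots) slot.store(nullptr, std::memory_order_relaxed);
  }

  void add(Sym* s) {
    s->retain();  // the ring now holds its own reference to s
    unsigned i = _index.fetch_add(1, std::memory_order_relaxed) % N;
    Sym* old = _slots[i].exchange(s, std::memory_order_acq_rel);
    if (old != nullptr) {
      old->release();  // drop the ring's reference to the displaced entry
    }
  }

  // Empty every slot, e.g. when a cleanup pass wants the symbols gone.
  void drain() {
    for (auto& slot : _slots) {
      Sym* old = slot.exchange(nullptr, std::memory_order_acq_rel);
      if (old != nullptr) old->release();
    }
  }
};
```

Because each slot is updated with a single atomic exchange, concurrent callers never see a half-updated slot and no lock or linked queue is needed. The cost is that a newly added symbol is kept alive only until N further additions have passed.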
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1386435934

From jsjolen at openjdk.org Wed Nov 8 13:34:12 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 8 Nov 2023 13:34:12 GMT
Subject: RFR: 8319709: Make GrowableArrayCHeap copyable
Message-ID: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com>

Hi,

Please consider this code change which makes `GrowableArrayCHeap` copyable. The resulting copy does not share its data array with the original.

-------------

Commit messages:
 - Rename testing struct
 - Copy assignment and copy constructor for GrowableArrayCHeap

Changes: https://git.openjdk.org/jdk/pull/16559/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16559&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319709
  Stats: 88 lines in 2 files changed: 85 ins; 3 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16559.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16559/head:pull/16559

PR: https://git.openjdk.org/jdk/pull/16559

From jsjolen at openjdk.org Wed Nov 8 13:34:15 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Wed, 8 Nov 2023 13:34:15 GMT
Subject: RFR: 8319709: Make GrowableArrayCHeap copyable
In-Reply-To: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com>
References: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com>
Message-ID: <2mmxSB9NCTeBw-M1_O7yuovZTmS0Qf_n_HMOILA0YqA=.9a8a6060-7b39-4043-8eab-6235dfe8dd19@github.com>

On Wed, 8 Nov 2023 13:25:00 GMT, Johan Sjölen wrote:

> Hi,
>
> Please consider this code change which makes `GrowableArrayCHeap` copyable. The resulting copy does not share its data array with the original.

src/hotspot/share/utilities/growableArray.hpp line 370:

> 368:  protected:
> 369:   void grow(int j);
> 370:

Needed to make the method visible to subclasses.
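
[Editorial sketch, not part of the original message.] The copy semantics described above, where the copy gets its own data array, might look like the following in a standalone container. `CHeapArray` is a hypothetical simplification that uses `new[]`/`delete[]`; it does not reproduce the actual `GrowableArrayCHeap` code, its allocator, or its memory tag handling.

```cpp
#include <algorithm>
#include <utility>

// Simplified growable array that owns a heap-allocated data array.
template <typename E>
class CHeapArray {
  E*  _data;
  int _len;
  int _capacity;

  // Reallocate to at least min_capacity elements, keeping the contents.
  void grow(int min_capacity) {
    int new_capacity = std::max(min_capacity, _capacity == 0 ? 4 : _capacity * 2);
    E* fresh = new E[new_capacity];
    std::copy(_data, _data + _len, fresh);
    delete[] _data;
    _data = fresh;
    _capacity = new_capacity;
  }

 public:
  CHeapArray() : _data(nullptr), _len(0), _capacity(0) {}

  // Copy constructor: allocate a separate array and copy the elements,
  // so the copy never aliases the original's storage.
  CHeapArray(const CHeapArray& other)
      : _data(other._capacity > 0 ? new E[other._capacity] : nullptr),
        _len(other._len),
        _capacity(other._capacity) {
    std::copy(other._data, other._data + other._len, _data);
  }

  // Copy assignment takes a const reference and reuses the copy
  // constructor (copy-and-swap); the old array dies with tmp.
  CHeapArray& operator=(const CHeapArray& other) {
    if (this != &other) {
      CHeapArray tmp(other);
      std::swap(_data, tmp._data);
      std::swap(_len, tmp._len);
      std::swap(_capacity, tmp._capacity);
    }
    return *this;
  }

  ~CHeapArray() { delete[] _data; }

  void append(const E& e) {
    if (_len == _capacity) grow(_len + 1);
    _data[_len++] = e;
  }

  int length() const { return _len; }
  E& at(int i) { return _data[i]; }
  const E& at(int i) const { return _data[i]; }
};
```

Copy-and-swap is used here only because it keeps the assignment operator short and safe against self-assignment; whether the PR itself uses that idiom is not stated in the message.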
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16559#discussion_r1386617947

From stefank at openjdk.org Wed Nov 8 13:34:24 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 8 Nov 2023 13:34:24 GMT
Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5]
In-Reply-To:
References:
Message-ID:

> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ...

Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:

  Rename monitors_iterate

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16519/files
  - new: https://git.openjdk.org/jdk/pull/16519/files/305b0567..2180a0c9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=03-04

  Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/16519.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519

PR: https://git.openjdk.org/jdk/pull/16519

From tschatzl at openjdk.org Wed Nov 8 13:54:11 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Wed, 8 Nov 2023 13:54:11 GMT
Subject: RFR: 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes
In-Reply-To:
References: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com>
Message-ID:

On Wed, 8 Nov 2023 09:39:41 GMT, Albert Mingkun Yang wrote:

>> Hi all,
>>
>> please review these fixes to the `jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java` tests that fail if the GC cause has been "GCLocker Initiated GC".
>> >> This is a valid gc cause, just extremely rare (interestingly the corresponding G1 tests added it). >> >> Thanks, >> Thomas > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/16542#issuecomment-1801928063 From tschatzl at openjdk.org Wed Nov 8 13:54:13 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 8 Nov 2023 13:54:13 GMT Subject: Integrated: 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes In-Reply-To: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> References: <3vf4CilrHIDJRN6KbtezqVAz3YBMkyEiy8syWXesfvI=.99460479-ef0e-4d30-b918-2c2542ff2a2b@github.com> Message-ID: On Tue, 7 Nov 2023 14:02:37 GMT, Thomas Schatzl wrote: > Hi all, > > please review these fixes to the `jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java` tests that fail if the GC cause has been "GCLocker Initiated GC". > > This is a valid gc cause, just extremely rare (interestingly the corresponding G1 tests added it). > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 7c7f8ea3 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/7c7f8ea30da7fe552bcd4f2b593fa9aad27dcdb4 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8319456: jdk/jfr/event/gc/collection/TestGCCauseWith[Serial|Parallel].java : GC cause 'GCLocker Initiated GC' not in the valid causes Reviewed-by: ayang, iwalulya ------------- PR: https://git.openjdk.org/jdk/pull/16542 From jsjolen at openjdk.org Wed Nov 8 14:08:15 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 8 Nov 2023 14:08:15 GMT Subject: RFR: 8319709: Make GrowableArrayCHeap copyable [v2] In-Reply-To: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com> References: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com> Message-ID: <0UAh881Jw6L5YNbClDQmuE_Q6fzv0ayeqkrblIoigZ8=.5d81b8a8-d04c-48cb-8987-f3fba98ac403@github.com> > Hi, > > Please consider this code change which makes `GrowableArrayCHeap` copyable. The resulting copy does not share its data array with the original. 
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  Assignment operator takes const reference

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16559/files
  - new: https://git.openjdk.org/jdk/pull/16559/files/d6166077..c0de10a5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16559&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16559&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16559.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16559/head:pull/16559

PR: https://git.openjdk.org/jdk/pull/16559

From ogillespie at openjdk.org Wed Nov 8 14:25:17 2023
From: ogillespie at openjdk.org (Oli Gillespie)
Date: Wed, 8 Nov 2023 14:25:17 GMT
Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v5]
In-Reply-To:
References:
Message-ID:

> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678).
>
> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse.
>
> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work.
>
> The queue is bounded, and when full new entries displace the oldest entry.
This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Switch to a ringbuffer instead of NonblockingQueue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/1cc810df..fb366040 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=03-04 Stats: 76 lines in 4 files changed: 21 ins; 28 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From sjohanss at openjdk.org Wed Nov 8 14:49:00 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 8 Nov 2023 14:49:00 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v9] In-Reply-To: References: Message-ID: <6_MNWrkq_LnvB4EqrBFNo6e6kVHLgffcRbkxhRPzTpg=.1a05d8af-b181-4e80-9922-6aea9a517564@github.com> On Fri, 3 Nov 2023 14:14:36 GMT, Albert Mingkun Yang wrote: >> Parsing the separate components is easier :) Not sure if these tags in any way ever indicated some level of abstraction. >> >> I do not have a strong opinion here. The combinations >> >> (Pinned) >> (Allocation Failure) >> (Pinned + Allocation Failure) // or the other way around, or some other symbol for "+" or no symbol at all? >> >> are fine with me (and I thought about doing something more elaborate here), but my concern has been that any complicated string makes it less unique (e.g. 
`(Allocation Failure)` vs. "Allocation Failure") and adds code both to implement and parse the result.
>>
>> Much more disrupting is likely that there is no "Evacuation Failure" string any more. But log messages are not part of the external interface, and we should not want to change them just because.
>
> The example looks good to me.

Having the final output look something like this was agreed on during internal discussion:

GC(6) Pause Young (Normal) (Evacuation Failure: Pinned) 1M->1M(22M) 36.16ms
GC(6) Pause Young (Normal) (Evacuation Failure: Allocation) 1M->1M(22M) 36.16ms
GC(6) Pause Young (Normal) (Evacuation Failure: Allocation / Pinned) 1M->1M(22M) 36.16ms

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1386736860

From rehn at openjdk.org Wed Nov 8 14:57:30 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Wed, 8 Nov 2023 14:57:30 GMT
Subject: RFR: 8319716: RISC-V: Add SHA-2
Message-ID:

Hi, please consider.

Main author is @luhenry, I only fixed some minor things and tested it.

Such as:
test/hotspot/jtreg/compiler/intrinsics/sha/
test/jdk/java/security/MessageDigest/
test/jdk/jdk/security/
tier1

And still running some test.
------------- Commit messages: - SHA-2 Changes: https://git.openjdk.org/jdk/pull/16562/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16562&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319716 Stats: 1015 lines in 5 files changed: 1009 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16562/head:pull/16562 PR: https://git.openjdk.org/jdk/pull/16562 From sspitsyn at openjdk.org Wed Nov 8 16:02:03 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Nov 2023 16:02:03 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: On Tue, 7 Nov 2023 14:54:05 GMT, Patricio Chilano Mateo wrote: > Hi Serguei, > Looks good to me, nice code consolidation. Hi Patricio, thank you a lot for reviewing this! > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1978: > >> 1976: } >> 1977: if (target_jt == nullptr) { // unmounted virtual thread >> 1978: hs_cl->do_vthread(target_h); // execute handshake closure callback on current thread directly > > I think comment should be: s/current thread/unmounted vthread Thank you for the comment but I'm not sure what do you mean. If target virtual thread is unmounted we execute the hs_cl callback on current thread. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16460#issuecomment-1802188368 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1386857320 From sspitsyn at openjdk.org Wed Nov 8 16:05:03 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Nov 2023 16:05:03 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: On Tue, 7 Nov 2023 14:23:34 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: get rid of the VM_HandshakeUnmountedVirtualThread > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 2416: > >> 2414: if (!JvmtiEnvBase::is_vthread_alive(_target_h())) { >> 2415: return; // JVMTI_ERROR_THREAD_NOT_ALIVE (default) >> 2416: } > > Don't we have this check already in JvmtiHandshake::execute()? Same with the other converted functions. Good suggestion, thanks. I'm a little bit paranoid about terminated vthreads. :) Will try to get rid and retest all tiers. > src/hotspot/share/prims/jvmtiEnvBase.hpp line 490: > >> 488: class JvmtiHandshake : public Handshake { >> 489: protected: >> 490: static bool is_vthread_handshake_safe(JavaThread* thread, oop vt); > > Not defined, leftover? Good catch, thanks! Will remove it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1386861657 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1386862308 From sspitsyn at openjdk.org Wed Nov 8 16:14:01 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 8 Nov 2023 16:14:01 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> On Tue, 7 Nov 2023 14:44:52 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: get rid of the VM_HandshakeUnmountedVirtualThread > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1974: > >> 1972: >> 1973: if (java_lang_VirtualThread::is_instance(target_h())) { // virtual thread >> 1974: if (!JvmtiEnvBase::is_vthread_alive(target_h())) { > > There is only one issue I see in how this check is implemented and the removal of the VM_op for unmounted vthreads. The change of state to TERMINATED happens after notifyJvmtiUnmount(), i.e we can see that this vthread is alive here but a check later can return is not. This might hit the assert in JvmtiEnvBase::get_vthread_jvf() (maybe this the issue you saw on your first prototype). We can either change that order at the Java level, or maybe better change this function to read the state and add a case where if the state is RUNNING check whether the continuation is done or not (jdk_internal_vm_Continuation::done(cont)). Thank you for the suggestion. Will check it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1386875290 From dfenacci at openjdk.org Wed Nov 8 16:36:01 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 8 Nov 2023 16:36:01 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. 
> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect.
> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided.
> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned.
>
> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character.
>
> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction.

It would also be good to have the **aarch64** and **x64** intrinsics properly reviewed.
@theRealAph could I ask you to have a look at **aarch64** and @TobiHartmann at **x64** please? (you seem to be the last ones that made major changes in the intrinsics)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1802251889

From omikhaltcova at openjdk.org Wed Nov 8 16:36:59 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Wed, 8 Nov 2023 16:36:59 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics
In-Reply-To:
References:
Message-ID:

On Thu, 26 Oct 2023 17:20:49 GMT, Olga Mikhaltsova wrote:

> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
> As shown below the output for RISC-V instructions and Java methods differs only for NaN argument.
>
>                                         RISC-V                  Java
>                                         (FCVT.W.S)  (FCVT.L.D)  (long round(double a))  (int round(float a))
> Minimum valid input (after rounding)    −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
> Maximum valid input (after rounding)    2^31 − 1    2^63 − 1    Long.MAX_VALUE          Integer.MAX_VALUE
> Output for out-of-range negative input  −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
> Output for −∞                           −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
> Output for out-of-range positive input  2^31 − 1    2^63 - 1    Long.MAX_VALUE          Integer.MAX_VALUE
> Output for +∞                           2^31 − 1    2^63 - 1    Long.MAX_VALUE          Integer.MAX_VALUE
> Output for NaN                          2^31 − 1    2^63 - 1    0                       0
>
> The benchmark shows the following performance improvement:
>
> **Before**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15   4.675 ± 0.259  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15   4.549 ± 0.210  ops/ms
>
>
> **After**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15  10.483 ± 0.681  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15  10.475 ± 0.480  ops/ms
>
>
> Testing: tier1 tests successfully passed on a RISC-V HiFive board with Linux.

Yes, you are both right, this is an incorrect implementation. I compared the output of the assembler instructions fcvt.w.s/fcvt.l.d and Java Math.round(), paying attention to the range mentioned above. The results are different. Thank you for pointing out this mistake!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16382#issuecomment-1802252694

From omikhaltcova at openjdk.org Wed Nov 8 16:46:15 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Wed, 8 Nov 2023 16:46:15 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2]
In-Reply-To:
References:
Message-ID:

> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
> As shown below the output for RISC-V instructions and Java methods differs only for NaN argument.
>
>                                         RISC-V                  Java
>                                         (FCVT.W.S)  (FCVT.L.D)  (long round(double a))  (int round(float a))
> Minimum valid input (after rounding)    −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
> Maximum valid input (after rounding)    2^31 − 1    2^63 − 1    Long.MAX_VALUE          Integer.MAX_VALUE
> Output for out-of-range negative input  −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
> Output for −∞                           −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
> Output for out-of-range positive input  2^31 − 1    2^63 - 1    Long.MAX_VALUE          Integer.MAX_VALUE
> Output for +∞                           2^31 − 1    2^63 - 1    Long.MAX_VALUE          Integer.MAX_VALUE
> Output for NaN                          2^31 − 1    2^63 - 1    0                       0
>
> The benchmark shows the following performance improvement:
>
> **Before**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15   4.675 ± 0.259  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15   4.549 ± 0.210  ops/ms
>
>
> **After**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt   Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15  10.483 ± 0.681  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15  10.475 ± 0.480  ops/ms
>
>
> Testing: tier1 tests successfully passed on a RISC-V HiFive board with Linux.

Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:

  Fixed intrinsics implementation. Reverted changes of FCVT_SAFE.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16382/files
  - new: https://git.openjdk.org/jdk/pull/16382/files/d8524ff9..13a65e48

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=00-01

  Stats: 66 lines in 3 files changed: 53 ins; 1 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/16382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382

PR: https://git.openjdk.org/jdk/pull/16382

From cslucas at openjdk.org Wed Nov 8 16:51:00 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 8 Nov 2023 16:51:00 GMT
Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission
In-Reply-To:
References:
Message-ID:

On Thu, 2 Nov 2023 23:21:23 GMT, Martin Doerr wrote:

>> # Description
>>
>> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit.
>>
>> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large.
>>
>> # Help Needed for Testing
>>
>> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`.
>
> PPC64 runs into assert(masm->inst_mark() == nullptr) failed: should be.
> V [libjvm.so+0x1648528] PhaseOutput::fill_buffer(C2_MacroAssembler*, unsigned int*)+0x10c8 (output.cpp:1812)
> V [libjvm.so+0x164b35c] PhaseOutput::Output()+0xd5c (output.cpp:362)
> V [libjvm.so+0x958f9c] Compile::Code_Gen()+0x4ec (compile.cpp:2989)
> V [libjvm.so+0x95e484] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1a84 (compile.cpp:887)
> V [libjvm.so+0x718f58] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x198 (c2compiler.cpp:119)

Thank you for helping with testing, @TheRealMDoerr @offamitkumar @RealFYang. I'm working on an update and I'll push it today or soon after.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16484#issuecomment-1802277112

From rgiulietti at openjdk.org Wed Nov 8 18:40:07 2023
From: rgiulietti at openjdk.org (Raffaello Giulietti)
Date: Wed, 8 Nov 2023 18:40:07 GMT
Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs
In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com>
References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com>
Message-ID:

On Mon, 30 Oct 2023 18:34:44 GMT, Roger Riggs wrote:

> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers.
> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. First take ;-) More will follow. 
src/java.base/share/classes/java/lang/String.java line 602: > 600: } > 601: this.value = utf16; > 602: this.coder = (utf16.length == dp) ? LATIN1 : UTF16; Is it possible to have `utf16.length == dp` here? I think the `coder` can only be `UTF16`. src/java.base/share/classes/java/lang/StringLatin1.java line 37: > 35: import jdk.internal.util.ArraysSupport; > 36: import jdk.internal.util.DecimalDigits; > 37: import jdk.internal.vm.annotation.ForceInline; This annotation does not seem to be used. src/java.base/share/classes/java/lang/StringUTF16.java line 158: > 156: * {@return an encoded byte[] for the UTF16 characters in char[]} > 157: * **Only** use this if it is known that at least one character is UTF16. > 158: * Otherwise, an untrusted char array may have racy contents and really be latin1. While this is a good advice, it turns out that the `compress()` method below invokes this method _without_ knowing for sure if the provided `value` contains at least one non-latin1 `char` when this method is invoked or while it runs: it only _assumes_ so, and indeed prudentially checks afterwards. It should be suggested to check the result after invoking this method if `value` is untrusted. src/java.base/share/classes/java/lang/StringUTF16.java line 198: > 196: * @param val a char array > 197: * @param off starting offset > 198: * @param count length of chars to be compressed, length > 0 Suggestion: * @param count count of chars to be compressed, {@code count} > 0 src/java.base/share/classes/java/lang/StringUTF16.java line 214: > 212: } > 213: } > 214: return latin1; // latin1 success The original version of this `public` method can return `null` to signal failure to compress. Does this change impact callers that might expect `null`? 
src/java.base/share/classes/java/lang/StringUTF16.java line 226: > 224: * @param val a byte array with UTF16 coding > 225: * @param off starting offset > 226: * @param count length of chars to be compressed, length > 0 Suggestion: * @param count count of chars to be compressed, {@code count} > 0 src/java.base/share/classes/java/lang/StringUTF16.java line 232: > 230: int ndx = compress(val, off, latin1, 0, count); > 231: if (ndx != count) {// Switch to UTF16 > 232: byte[] utf16 = Arrays.copyOfRange(val, off << 1, (off + count) << 1); Not sure if the left shifts do not overflow on this `public` method. If that happens, the outcomes could be non-negative, so the copy would succeed but be kind of corrupted. src/java.base/share/classes/java/lang/StringUTF16.java line 240: > 238: } > 239: } > 240: return latin1; // latin1 success See the `compress()` above for a remark on `null` as a return value. src/java.base/share/classes/java/lang/StringUTF16.java line 319: > 317: int codePoint = val[off]; // read each codepoint from val only once > 318: int dstLimit = dstOff > 319: + (Character.isBmpCodePoint(codePoint) ? 1 : 2) Suggestion: + Character.charCount(codePoint) This method was introduced in 1.5, so should be safe to use even for backports. src/java.base/share/classes/java/lang/StringUTF16.java line 411: > 409: return 2; > 410: } else > 411: throw new IllegalArgumentException(Integer.toString(codePoint)); Maybe `Character.charCount()` can be used here, although it returns 2 even for invalid codepoints. 
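
A minimal sketch of two of the remarks above, added for illustration and not taken from the patch: `Character.charCount()` does not validate its argument, and an expression of the form `(off + count) << 1` can wrap around to a small non-negative value. The class name `CharCountDemo` and the helper `shiftedEnd` are hypothetical names used only in this example.

```java
public class CharCountDemo {

    // Hypothetical helper: computes an end index the same way the quoted
    // line does, with plain int arithmetic and no overflow check.
    static int shiftedEnd(int off, int count) {
        return (off + count) << 1;
    }

    public static void main(String[] args) {
        int bmp = 'A';                // U+0041, one UTF-16 code unit
        int supplementary = 0x1F600;  // valid supplementary code point
        int invalid = 0x110000;       // one past Character.MAX_CODE_POINT

        System.out.println(Character.charCount(bmp));            // 1
        System.out.println(Character.charCount(supplementary));  // 2
        // charCount only compares against MIN_SUPPLEMENTARY_CODE_POINT,
        // so an out-of-range value is still reported as two chars.
        System.out.println(Character.charCount(invalid));        // 2
        System.out.println(Character.isValidCodePoint(invalid)); // false

        // off + count wraps to a negative int; the left shift then drops
        // the sign bit, leaving a small non-negative result.
        System.out.println(shiftedEnd(Integer.MAX_VALUE, 2));    // 2
    }
}
```

A caller that needs the `IllegalArgumentException` behaviour would still have to call something like `Character.isValidCodePoint()` separately, and a range check before shifting would be needed to rule out the wrapped index.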
------------- PR Review: https://git.openjdk.org/jdk/pull/16425#pullrequestreview-1720680695 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386992886 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386829351 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386885841 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386905655 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386923763 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386905964 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386915873 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386927514 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386951908 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1386997067 From msheppar at openjdk.org Wed Nov 8 19:05:04 2023 From: msheppar at openjdk.org (Mark Sheppard) Date: Wed, 8 Nov 2023 19:05:04 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v3] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 8 Nov 2023 02:33:29 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > converted list to array. 
test/lib/jdk/test/lib/process/ProcessTools.java line 387: > 385: */ > 386: > 387: private static String[] addTestThreadFactoryArgs(String testThreadFactoryName, String[] command) { would it be appropriate, at this juncture, to rename the method parameter "command" here, and throughout the associated code, to commandArgs, as the actual command i.e. java is added in createJavaProcessBuilder, and the parameter references the java command's args? or is that too much hassle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16442#discussion_r1387076880 From rkennke at openjdk.org Wed Nov 8 19:06:17 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 8 Nov 2023 19:06:17 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack Message-ID: See JBS issue for details. I basically: - took the test-modification and turned it into its own test-case - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default - added Axels fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. 
Testing: - [x] EATests.java - [ ] tier1 - [ ] tier2 ------------- Commit messages: - 8318895: Deoptimization results in incorrect lightweight locking stack Changes: https://git.openjdk.org/jdk/pull/16568/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16568&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318895 Stats: 99 lines in 2 files changed: 96 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16568/head:pull/16568 PR: https://git.openjdk.org/jdk/pull/16568 From lmesnik at openjdk.org Wed Nov 8 19:13:25 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 8 Nov 2023 19:13:25 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v4] In-Reply-To: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: > Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. > > Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). > > Testing tier1-5. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: renamed arguments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16442/files - new: https://git.openjdk.org/jdk/pull/16442/files/4ba1e85e..aa93f71a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16442/head:pull/16442 PR: https://git.openjdk.org/jdk/pull/16442 From lmesnik at openjdk.org Wed Nov 8 19:13:28 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 8 Nov 2023 19:13:28 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v3] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 8 Nov 2023 19:02:36 GMT, Mark Sheppard wrote: >> Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: >> >> converted list to array. > > test/lib/jdk/test/lib/process/ProcessTools.java line 387: > >> 385: */ >> 386: >> 387: private static String[] addTestThreadFactoryArgs(String testThreadFactoryName, String[] command) { > > would it be appropriate, at this juncture, to rename the method parameter "command" here, and throughout the associated code, to commandArgs, as the actual command i.e. java is added in createJavaProcessBuilder, and the parameter references the java command's args? or is that too much hassle. 
done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16442#discussion_r1387085932

From vkempik at openjdk.org Wed Nov 8 19:20:58 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 8 Nov 2023 19:20:58 GMT
Subject: RFR: 8319716: RISC-V: Add SHA-2
In-Reply-To:
References:
Message-ID:

On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote:

> Hi, please consider.
>
> Main author is @luhenry, I only fixed some minor things and tested it.
>
> Such as:
> test/hotspot/jtreg/compiler/intrinsics/sha/
> test/jdk/java/security/MessageDigest/
> test/jdk/jdk/security/
> tier1
>
> And still running some test.

src/hotspot/cpu/riscv/vm_version_riscv.cpp line 169:

> 167: }
> 168:
> 169: if (UseZvknhb && UseZvkb) {

This looks weird: two JDK options are needed to enable the SHA intrinsics.
Can we simplify it somehow for now, like UseRVVCryptoExt?

Splitting this into UseZvknhb && UseZvkb can be done in the future, if it is really needed one day.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1387093990

From vkempik at openjdk.org Wed Nov 8 19:23:58 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Wed, 8 Nov 2023 19:23:58 GMT
Subject: RFR: 8319716: RISC-V: Add SHA-2
In-Reply-To:
References:
Message-ID:

On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote:

> Hi, please consider.
>
> Main author is @luhenry, I only fixed some minor things and tested it.
>
> Such as:
> test/hotspot/jtreg/compiler/intrinsics/sha/
> test/jdk/java/security/MessageDigest/
> test/jdk/jdk/security/
> tier1
>
> And still running some test.
We still have https://bugs.openjdk.org/browse/JDK-8295382 and https://bugs.openjdk.org/browse/JDK-8295383, some of them should be marked as duplicates, or this PR need to go under 8295382&8295383 issues ------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1802507212 From rriggs at openjdk.org Wed Nov 8 20:33:00 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 8 Nov 2023 20:33:00 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Wed, 8 Nov 2023 16:47:17 GMT, Raffaello Giulietti wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. 
>> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > src/java.base/share/classes/java/lang/StringUTF16.java line 214: > >> 212: } >> 213: } >> 214: return latin1; // latin1 success > > The original version of this `public` method can return `null` to signal failure to compress. Does this change impact callers that might expect `null`? It is public but in a package private class; all of the uses have been updated. > src/java.base/share/classes/java/lang/StringUTF16.java line 232: > >> 230: int ndx = compress(val, off, latin1, 0, count); >> 231: if (ndx != count) {// Switch to UTF16 >> 232: byte[] utf16 = Arrays.copyOfRange(val, off << 1, (off + count) << 1); > > Not sure if the left shifts do not overflow on this `public` method. If that happens, the outcomes could be non-negative, so the copy would succeed but be kind of corrupted. These deserve the same kind of check as used in StringUTF16.newBytesFor(len). > src/java.base/share/classes/java/lang/StringUTF16.java line 411: > >> 409: return 2; >> 410: } else >> 411: throw new IllegalArgumentException(Integer.toString(codePoint)); > > Maybe `Character.charCount()` can be used here, although it returns 2 even for invalid codepoints. 
The check and exception are specified in the constructor `public String(int[] codePoints, int offset, int count)` so it's needed in at least one pass over the input. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1387164695 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1387163869 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1387167119 From dcubed at openjdk.org Wed Nov 8 22:27:01 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 8 Nov 2023 22:27:01 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 13:34:24 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Rename monitors_iterate Thumbs up. I have some questions, but I don't think anything is blocking. src/hotspot/share/runtime/synchronizer.cpp line 1094: > 1092: // Iterate owned ObjectMonitors. > 1093: void ObjectSynchronizer::owned_monitors_iterate(MonitorClosure* closure) { > 1094: auto all_filter = [&](void* owner) { return true; }; I don't grok how this filter only iterates owned ObjectMonitors. It always returns `true` and does not check for a non-null owner. I'm probably missing some sort of lambda magic here... src/hotspot/share/runtime/vmOperations.cpp line 344: > 342: void do_monitor(ObjectMonitor* monitor) override { > 343: // The caller is interested in the owned ObjectMonitors. This does > 344: // not include when owner is set to a stack-lock address in thread. 
The stack-lock part of this comment doesn't agree with the header comment for this relocated version of the code. src/hotspot/share/runtime/vmOperations.cpp line 392: > 390: > 391: // If there are many object monitors in the system then the above iteration > 392: // can start to to take time. Be friendly to following thread dumps by nit typo: s/start to to take/start to take/ test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 69: > 67: > 68: static private void createMonitors() { > 69: Object[] monitors = new Object[1000]; Since `monitors` is local to this static function, if the test runs for long enough, then C2 might optimize away all those monitors... I usually ask @vnkozlov about the best way to keep that from happening. ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1721324825 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387230578 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387238228 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387239156 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387248957 From dcubed at openjdk.org Wed Nov 8 22:27:03 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 8 Nov 2023 22:27:03 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v2] In-Reply-To: References: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> Message-ID: On Mon, 6 Nov 2023 20:07:15 GMT, Stefan Karlsson wrote: >> src/hotspot/share/runtime/vmOperations.cpp line 400: >> >>> 398: const int DeflateRequestLimit = 100000; >>> 399: if (monitors_count > DeflateRequestLimit) { >>> 400: ObjectSynchronizer::request_deflate_idle_monitors(); >> >> Not sure about this. Arguably, the async deflation policy should re-evaluate the conditions for deflation and then decide to act. 
Otherwise, this effectively backdoors the heuristics, and does so with the hardcoded threshold. On the other hand, the old code effectively did the same with threshold of `0`. >> >> So, I would rather keep old behavior and just request deflation without a threshold here. > > Thanks for the feedback. It is unclear to me if the old behavior of deflating monitors for every single thread dump is beneficial or not, but I also wouldn't mind changing this to use your suggestion if others agree that it is the preferred way forward. I'm going to at least wait for @dcubed-ojdk to get some time to give his input on this. When I changed the thread dump code to deflate monitors, I was just trying to clean things up to reduce memory usage. In other words, I was taking advantage of the safepoint to clean up as many ObjectMonitors as I could while the threads that were generating them were stopped at the safepoint. However, my focus on cleanliness cost us pause time for this VM operation. I would be fine with only requesting deflation from a thread dump when we pass a limit of some sort. It would be good to make that limit value an experimental flag/option so it could be tuned for diagnostic purposes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387244953 From dcubed at openjdk.org Wed Nov 8 22:27:06 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 8 Nov 2023 22:27:06 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: <_riE4qmrYNKLgzlHUBpRUlmrSobLlB1JEBySQX84S9Y=.44881eb0-c090-4548-b439-b60cf0a65de4@github.com> Message-ID: On Mon, 6 Nov 2023 13:09:21 GMT, Stefan Karlsson wrote: >> test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 41: >> >>> 39: >>> 40: public class ConcurrentDeflation { >>> 41: public static final int TOTAL_RUN_TIME = 10 * 1000; >> >> Given that this test always runs for at least 10 seconds, maybe it should be excluded from tier1. See `test/hotspot/jtreg/TEST.groups`. >> >> Unsure what the praxis is here. > > Done Hmmm... some of the ThreadSMR tests that I added run for 30 seconds by default... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387246580 From dcubed at openjdk.org Wed Nov 8 22:27:06 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 8 Nov 2023 22:27:06 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: <_riE4qmrYNKLgzlHUBpRUlmrSobLlB1JEBySQX84S9Y=.44881eb0-c090-4548-b439-b60cf0a65de4@github.com> Message-ID: On Wed, 8 Nov 2023 22:00:34 GMT, Daniel D. Daugherty wrote: >> Done > > Hmmm... some of the ThreadSMR tests that I added run for 30 seconds by default... Also, accepting an optional parameter for duration is a good way to make this test adaptable to having a stress wrapper placed around it. If no parameters are specified, then go with the default... See test/hotspot/jtreg/runtime/Thread/InterruptAtExit.java for an example. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387252876 From jjoo at openjdk.org Thu Nov 9 00:42:33 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 9 Nov 2023 00:42:33 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v39] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Refactor changes to counters, successful build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/ac780c5e..22ccb909 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=37-38 Stats: 149 lines in 16 files changed: 26 ins; 112 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From vlivanov at openjdk.org Thu Nov 9 00:35:09 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 9 Nov 2023 00:35:09 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v11] In-Reply-To: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> References: <2mpVRiA1idUeB0AN8eAtghk_sGFu90tZyTvpYOPaBq4=.72ab5bb9-3095-4352-9aa6-9e59151e482e@github.com> Message-ID: On Sat, 21 Oct 2023 12:04:10 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. 
>> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. 
>> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: > > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - RISC-V impl > - remove leftover debug log line > - add s390 support > - add PPC impl > - add missing file > - Add xor benchmark > - ... and 36 more: https://git.openjdk.org/jdk/compare/a876beb6...2e00beff src/java.base/share/classes/java/lang/foreign/Linker.java line 783: > 781: * memory segments as addresses, where normally only off-heap memory segments would be allowed. The memory region > 782: * inside the Java heap is exposed through a temporary native address that is valid for the duration of the > 783: * function call. Use of this mechanism is therefore only recommend when a function needs to do s/recommend/recommended/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1368969423 From dlong at openjdk.org Thu Nov 9 01:30:55 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 9 Nov 2023 01:30:55 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 19:00:53 GMT, Roman Kennke wrote: > See JBS issue for details. 
> > I basically: > - took the test-modification and turned it into its own test-case > - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what the default is > - added Axel's fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. > > Testing: > - [x] EATests.java > - [x] tier1 > - [ ] tier2 Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16568#pullrequestreview-1721539402 From dholmes at openjdk.org Thu Nov 9 01:36:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Nov 2023 01:36:04 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 21:41:44 GMT, Daniel D. Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename monitors_iterate > > src/hotspot/share/runtime/synchronizer.cpp line 1094: > >> 1092: // Iterate owned ObjectMonitors. >> 1093: void ObjectSynchronizer::owned_monitors_iterate(MonitorClosure* closure) { >> 1094: auto all_filter = [&](void* owner) { return true; }; > > I don't grok how this filter only iterates owned ObjectMonitors. > It always returns `true` and does not check for a non-null owner. > I'm probably missing some sort of lambda magic here... I think in this case we already know that all the monitors in the closure are owned by the expected owner. But I wonder if we should/can assert that? > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 69: > >> 67: >> 68: static private void createMonitors() { >> 69: Object[] monitors = new Object[1000]; > > Since `monitors` is local to this static function, if the test runs for long > enough, then C2 might optimize away all those monitors... > > I usually ask @vnkozlov about the best way to keep that from happening. 
Yeah, make the array a static field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387367477 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387373821 From dholmes at openjdk.org Thu Nov 9 01:36:01 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Nov 2023 01:36:01 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 13:34:24 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Rename monitors_iterate Just a couple of drive-by comments. For something that was summarized so succinctly ("fix for this is to stop deflating monitors from the safepointed VM_ThreadDump operation") the actual changes are quite extensive and I found them difficult to follow. But thanks for the fix! src/hotspot/share/runtime/synchronizer.hpp line 135: > 133: > 134: // Iterate owned ObjectMonitors. > 135: static void owned_monitors_iterate(MonitorClosure* closure); Owned by whom? Current thread? Does that include stack-locked or not? Just trying to understand how the two variants of `owned_monitors_iterate` relate. src/hotspot/share/runtime/vmOperations.cpp line 395: > 393: // telling the MonitorDeflationThread to deflate monitors. 
> 394: // > 395: // The limit has been arbitrarily chosen to be were the iteration started s/were/where/ test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 78: > 76: monitors[index] = new Object(); > 77: synchronized (monitors[index]) { > 78: } I would expect C2 to eliminate this as well. The monitors are provably thread-local so synchronization is a no-op. ------------- PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1721529428 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387370859 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387372489 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387377594 From vlivanov at openjdk.org Thu Nov 9 00:35:06 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 9 Nov 2023 00:35:06 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 15:09:57 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - a -> an > - add note to downcallHandle about passing heap segments by-reference hotspot changes look fine. 
src/hotspot/cpu/aarch64/downcallLinker_aarch64.cpp line 182: > 180: ArgumentShuffle arg_shuffle(filtered_java_regs, out_regs, shuffle_reg); > 181: > 182: #ifndef PRODUCT Any particular reason to exclude the logging in product builds? `ArgumentShuffle::print_on()` is unconditionally available there. src/java.base/share/classes/java/lang/foreign/Linker.java line 792: > 790: * @param allowHeapAccess whether the linked function should allow access to the Java heap. > 791: */ > 792: static Option critical(boolean allowHeapAccess) { Speaking of public API, I'm surprised to see critical function property conflated with ability to perform on-heap accesses. These aspects look orthogonal to me. Any particular reason not to represent them as 2 separate `Option`s? Even though it's straightforward to support on-heap accesses during critical function calls, object pinning would support that for non-critical function calls as well, but proposed API doesn't cover it and new API will be required. What's the plan here? ------------- PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1693030613 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1387130520 PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1387341549 From rriggs at openjdk.org Thu Nov 9 04:16:25 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 9 Nov 2023 04:16:25 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. 
> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: - Refactored extractCodePoints to avoid multiple resizes if the array was modified - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` It performs better than the single byte array access by avoiding the bounds check. - Misc updates for review comments, javadoc cleanup Extra checking on maximum string lengths when calling toBytes(). ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/4662dec7..ad73a2a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=00-01 Stats: 54 lines in 3 files changed: 20 ins; 20 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From dholmes at openjdk.org Thu Nov 9 04:59:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Nov 2023 04:59:57 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v4] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 8 Nov 2023 19:13:25 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. 
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > renamed arguments test/lib/jdk/test/lib/process/ProcessTools.java line 401: > 399: boolean expectSecondArg = false; > 400: boolean isTestThreadFactoryAdded = false; > 401: for (String cmd : commandArgs) { So `cmd` should really be `arg` right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16442#discussion_r1387478034 From dholmes at openjdk.org Thu Nov 9 05:03:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Nov 2023 05:03:57 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v4] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Wed, 8 Nov 2023 19:13:25 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > renamed arguments I remain concerned that this means that a whole swag of tests will never be run with virtual threads, which reduces our virtual thread test coverage. Hard to quantify. Do you have any stats on how many tests this will affect and which ones? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1803166547 From jjoo at openjdk.org Thu Nov 9 05:27:40 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 9 Nov 2023 05:27:40 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v40] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add missing cpuTimeCounters files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/22ccb909..41771db6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=38-39 Stats: 220 lines in 2 files changed: 220 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From dholmes at openjdk.org Thu Nov 9 06:20:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Nov 2023 06:20:57 GMT Subject: RFR: 8319709: Make GrowableArrayCHeap copyable [v2] In-Reply-To: <0UAh881Jw6L5YNbClDQmuE_Q6fzv0ayeqkrblIoigZ8=.5d81b8a8-d04c-48cb-8987-f3fba98ac403@github.com> References: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com> <0UAh881Jw6L5YNbClDQmuE_Q6fzv0ayeqkrblIoigZ8=.5d81b8a8-d04c-48cb-8987-f3fba98ac403@github.com> Message-ID: On Wed, 8 Nov 2023 14:08:15 GMT, Johan Sjölen wrote: >> Hi, >> >> Please consider this code change which makes `GrowableArrayCHeap` copyable. The resulting copy does not share its data array with the original. > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Assignment operator takes const reference What is the motivation for this? 
Please add something to JBS. Also see query below. Thanks src/hotspot/share/utilities/growableArray.hpp line 835: > 833: this->grow(other._len); > 834: for (int i = 0; i < other._len; i++) { > 835: this->_data[i] = other._data[i]; Does this imply that anything you put in a `GrowableArrayCHeap` must itself also be copyable now? ------------- PR Review: https://git.openjdk.org/jdk/pull/16559#pullrequestreview-1721781877 PR Review Comment: https://git.openjdk.org/jdk/pull/16559#discussion_r1387533329 From stefank at openjdk.org Thu Nov 9 06:46:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 06:46:58 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 01:14:12 GMT, David Holmes wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 1094: >> >>> 1092: // Iterate owned ObjectMonitors. >>> 1093: void ObjectSynchronizer::owned_monitors_iterate(MonitorClosure* closure) { >>> 1094: auto all_filter = [&](void* owner) { return true; }; >> >> I don't grok how this filter only iterates owned ObjectMonitors. >> It always returns `true` and does not check for a non-null owner. >> I'm probably missing some sort of lambda magic here... > > I think in this case we already know that all the monitors in the closure are owned by the expected owner. But I wonder if we should/can assert that? The filter accepts all monitors. The filtering that only returns owned monitors is done inside the called `owned_monitors_iterate_filtered`: void ObjectSynchronizer::owned_monitors_iterate_filtered(MonitorClosure* closure, OwnerFilter filter) { ... if (mid->has_owner() && filter(mid->owner_raw())) { ... closure->do_monitor(mid); } The closure is only applied to monitors that "have an owner". Maybe "owned monitors" sounds too much as if the function only visits monitors owned by the current thread? 
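To illustrate the split described above, here is a minimal standalone sketch. It is not the HotSpot code: `Monitor`, `iterate_filtered`, `iterate_owned`, `iterate_owned_by`, and the `std::function` aliases are hypothetical stand-ins invented for this example. The has-owner check sits in the filtered helper, so an accept-all filter still only reaches monitors that have an owner.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for ObjectMonitor: a null owner means "not owned".
struct Monitor {
  void* owner = nullptr;
  bool has_owner() const { return owner != nullptr; }
};

using OwnerFilter = std::function<bool(void*)>;
using MonitorVisitor = std::function<void(const Monitor&)>;

// Visits only monitors that have an owner AND whose owner passes the filter.
void iterate_filtered(const std::vector<Monitor>& monitors,
                      const MonitorVisitor& visit,
                      const OwnerFilter& filter) {
  for (const Monitor& m : monitors) {
    if (m.has_owner() && filter(m.owner)) {
      visit(m);
    }
  }
}

// Accept-all filter: visits every monitor owned by anyone.
void iterate_owned(const std::vector<Monitor>& monitors,
                   const MonitorVisitor& visit) {
  iterate_filtered(monitors, visit, [](void*) { return true; });
}

// Per-owner filter: visits only monitors owned by the given owner.
void iterate_owned_by(const std::vector<Monitor>& monitors, void* owner,
                      const MonitorVisitor& visit) {
  iterate_filtered(monitors, visit, [owner](void* o) { return o == owner; });
}
```

In this sketch an unowned monitor is skipped by both variants, because the accept-all lambda is only consulted after the `has_owner()` check has passed.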
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387552877 From stefank at openjdk.org Thu Nov 9 06:54:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 06:54:58 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 01:20:23 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename monitors_iterate > > src/hotspot/share/runtime/synchronizer.hpp line 135: > >> 133: >> 134: // Iterate owned ObjectMonitors. >> 135: static void owned_monitors_iterate(MonitorClosure* closure); > > owned by whom? current thread? Does that include stack-locked or not? > > Just trying to understand how the two variants of `owned_monitors_iterate` relate. Owned by *any* thread in any way. * `owned_monitors_iterate(MonitorClosure* closure)` - Visits all monitors with the owner set to anything that indicates that the monitor has an owner (`ObjectMonitor::has_owner()`). * `owned_monitors_iterate(MonitorClosure* m, JavaThread* thread)` - Visits all monitors with the owner field set to the specified `thread`. Maybe we could figure out more descriptive names for these. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387558163 From stefank at openjdk.org Thu Nov 9 07:06:00 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 07:06:00 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 21:50:43 GMT, Daniel D. 
Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename monitors_iterate > > src/hotspot/share/runtime/vmOperations.cpp line 344: > >> 342: void do_monitor(ObjectMonitor* monitor) override { >> 343: // The caller is interested in the owned ObjectMonitors. This does >> 344: // not include when owner is set to a stack-lock address in thread. > > The stack-lock part of this comment doesn't agree with > the header comment for this relocated version of the code. Good point. I'll remove the comment and add an `assert(monitor->has_owner(), ...)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387566964 From rehn at openjdk.org Thu Nov 9 07:16:57 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Nov 2023 07:16:57 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 19:21:04 GMT, Vladimir Kempik wrote: > We still have https://bugs.openjdk.org/browse/JDK-8295382 and https://bugs.openjdk.org/browse/JDK-8295383, some of them should be marked as duplicates, or this PR need to go under 8295382&8295383 issues Ah, I missed them, thanks, closed as dups. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1803277936 From jsjolen at openjdk.org Thu Nov 9 07:19:57 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 9 Nov 2023 07:19:57 GMT Subject: RFR: 8319709: Make GrowableArrayCHeap copyable [v2] In-Reply-To: References: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com> <0UAh881Jw6L5YNbClDQmuE_Q6fzv0ayeqkrblIoigZ8=.5d81b8a8-d04c-48cb-8987-f3fba98ac403@github.com> Message-ID: <0CTdcVjWDVOuV23lJ0EGFGgqa4x_P_UxSo5N4WzTJTE=.45f3ba14-bd81-4c58-afe7-6b4849e568aa@github.com> On Thu, 9 Nov 2023 06:18:11 GMT, David Holmes wrote: > What is the motivation for this? Please add something to JBS. Also see query below.
> > Thanks I need this feature because I want to store `GrowableArray`s within `GrowableArray`s without an unnecessary pointer indirection. I'll add this justification to JBS. > src/hotspot/share/utilities/growableArray.hpp line 835: > >> 833: this->grow(other._len); >> 834: for (int i = 0; i < other._len; i++) { >> 835: this->_data[i] = other._data[i]; > > Does this imply that anything you put in a `GrowableArrayCHeap` must itself also be copyable now? Hi, No, if you never call the assignment operator then it won't be generated and so does not imply that. If you have something `NONCOPYABLE` and you do call the assignment operator then compilation will fail. However, just to be clear: It is already the case that everything we put into a `GrowableArray`, CHeap or not, must be copyable. This is a direct consequence of the fact that we do not allow for move semantics in HotSpot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16559#issuecomment-1803280833 PR Review Comment: https://git.openjdk.org/jdk/pull/16559#discussion_r1387576498 From stefank at openjdk.org Thu Nov 9 07:29:15 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 07:29:15 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v6] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ...
Stefan Karlsson has updated the pull request incrementally with five additional commits since the last revision: - Tweak the flag comment a bit - Add AsyncMonitorDeflationForThreadDumpLimit flag - Typos - Remove comment in do_monitors - Make monitors array public static ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/2180a0c9..103d917a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=04-05 Stats: 19 lines in 3 files changed: 7 ins; 7 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Thu Nov 9 07:29:17 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 07:29:17 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: <6hbpgRUjc1y79xDEl6QGKBhdVKApbecyzN0Vu4KKicg=.8856006f-417c-476e-8f7b-4dcf8bf659a3@github.com> On Thu, 9 Nov 2023 01:26:21 GMT, David Holmes wrote: >> test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 69: >> >>> 67: >>> 68: static private void createMonitors() { >>> 69: Object[] monitors = new Object[1000]; >> >> Since `monitors` is local to this static function, if the test runs for long >> enough, then C2 might optimize away all those monitors... >> >> I usually ask @vnkozlov about the best way to keep that from happening. > > Yeah make the array a static field. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387581814 From stefank at openjdk.org Thu Nov 9 07:29:20 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 07:29:20 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: <3vGepwlbdiYF4-oFugUEzQv9PY2kUcEOmJq8pXlQuWM=.9dae984f-37e0-406f-a225-50f231d580ca@github.com> On Thu, 9 Nov 2023 01:33:38 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename monitors_iterate > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 78: > >> 76: monitors[index] = new Object(); >> 77: synchronized (monitors[index]) { >> 78: } > > I would expect C2 to eliminate this as well. The monitors are provably thread-local so synchronization is a no-op. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387581859 From dholmes at openjdk.org Thu Nov 9 07:38:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 9 Nov 2023 07:38:59 GMT Subject: RFR: 8319709: Make GrowableArrayCHeap copyable [v2] In-Reply-To: <0CTdcVjWDVOuV23lJ0EGFGgqa4x_P_UxSo5N4WzTJTE=.45f3ba14-bd81-4c58-afe7-6b4849e568aa@github.com> References: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com> <0UAh881Jw6L5YNbClDQmuE_Q6fzv0ayeqkrblIoigZ8=.5d81b8a8-d04c-48cb-8987-f3fba98ac403@github.com> <0CTdcVjWDVOuV23lJ0EGFGgqa4x_P_UxSo5N4WzTJTE=.45f3ba14-bd81-4c58-afe7-6b4849e568aa@github.com> Message-ID: <8xE1byllZBEanHQ7r_3PnKJgr6PScdggn0Acb0FrMDo=.3bdc7661-f132-4803-8f6b-c8384ae83298@github.com> On Thu, 9 Nov 2023 07:16:04 GMT, Johan Sjölen wrote: >> src/hotspot/share/utilities/growableArray.hpp line 835: >> >>> 833: this->grow(other._len); >>> 834: for (int i = 0; i < other._len; i++) { >>> 835: this->_data[i] = other._data[i]; >> >> Does
this imply that anything you put in a `GrowableArrayCHeap` must itself also be copyable now? > > Hi, > > No, if you never call the assignment operator then it won't be generated and so does not imply that. If you have something `NONCOPYABLE` and you do call the assignment operator then compilation will fail. > > However, just to be clear: It is already the case that everything we put into a `GrowableArray`, CHeap or not, must be copyable. This is a direct consequence of the fact that we do not allow for move semantics in HotSpot. Hmm, I guess I'd normally store pointers then, because I generally may not want to either move or copy the object that is accessible through the container. But thanks for clarifying. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16559#discussion_r1387593971 From stefank at openjdk.org Thu Nov 9 07:44:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 07:44:58 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v2] In-Reply-To: References: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> Message-ID: On Wed, 8 Nov 2023 21:58:42 GMT, Daniel D. Daugherty wrote: >> Thanks for the feedback. It is unclear to me if the old behavior of deflating monitors for every single thread dump is beneficial or not, but I also wouldn't mind changing this to use your suggestion if others agree that it is the preferred way forward. I'm going to at least wait for @dcubed-ojdk to get some time to give his input on this. > > When I changed the thread dump code to deflate monitors, I was just trying to > clean things up to reduce memory usage. In other words, I was taking advantage > of the safepoint to clean up as many ObjectMonitors as I could while the threads > that were generating them were stopped at the safepoint. However, my focus on > cleanliness cost us pause time for this VM operation.
> > I would be fine with only requesting deflation from a thread dump when we pass > a limit of some sort. It would be good to make that limit value an experimental > flag/option so it could be tuned for diagnostic purposes. There are a few options for how to move forward with this: 1. Stop triggering deflation from the thread dumping code. 2. Only trigger if we pass a given limit, say 100000. 3. Always trigger monitor deflation. I believe that long-term it would be best for the JVM if we went with (1). I added (2) just to counter some potential arguments that having too many monitors in the system will make the thread dumping take a long time. It is not clear to me at all that people will notice this, and if they do then maybe we need to tweak the monitor deflation heuristics instead. (3) seems excessive to me. I've added the flag AsyncMonitorDeflationForThreadDumpLimit. The long name hints that this flag is in support of something overly specific. I set the default to SIZE_MAX in my support for (1), and hope that we can release-note that we have stopped performing monitor deflation from thread dumping and that this flag is going to be a good enough safeguard if there are applications that rely on the monitor deflation for thread dumping. But if reviewers disagree with this, I'm OK with changing the default value to support either (2) or (3). Could I get all the reviewers here to (re)state their preference on this, including their suggested limit? I'm also OK with changing the name of the flag if you have a better name, and changing it to an experimental flag if that makes more sense.
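As a rough illustration of how a single limit value could express all three options, here is a minimal sketch. It is an assumption, not the code in the PR: `should_request_deflation` is a hypothetical helper, and the strict `>` comparison is a guess, since the thread does not show how the flag is actually compared against the monitor count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decides whether a thread dump should request
// monitor deflation, given the current monitor count and a limit such as
// the value of AsyncMonitorDeflationForThreadDumpLimit.
// Assumption: deflation is requested only when the count exceeds the limit.
bool should_request_deflation(size_t monitor_count, size_t limit) {
  return monitor_count > limit;
}
```

Under that assumption, a limit of `SIZE_MAX` never triggers (option 1), a limit of 100000 triggers only above that count (option 2), and a limit of 0 triggers whenever at least one monitor exists (effectively option 3).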
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387598881 From stefank at openjdk.org Thu Nov 9 07:47:57 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 9 Nov 2023 07:47:57 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 01:30:53 GMT, David Holmes wrote: > Just a couple of drive by comments. For something that was summarized so succinctly( "fix for this is to stop deflating monitors from the safepoting VM_ThreadDump operation") the actual changes are quite extensive and I found it difficult to follow. Yeah. I want to keep the summary text short because Skara posts mails with comments *after* the entire Summary text. I added the full summary as the first comment instead: https://github.com/openjdk/jdk/pull/16519#issuecomment-1794438582 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1803309901 From duke at openjdk.org Thu Nov 9 08:00:19 2023 From: duke at openjdk.org (duke) Date: Thu, 9 Nov 2023 08:00:19 GMT Subject: Withdrawn: 8310160: Make GC APIs for handling archive heap objects agnostic of GC policy In-Reply-To: References: Message-ID: On Fri, 16 Jun 2023 14:26:09 GMT, Ashutosh Mehra wrote: > This PR adds GC APIs to be implemented by all collectors for proper handling of archive heap space. Currently only G1 is updated to use these APIs which just involves renaming the existing G1 APIs. > In addition to that filemap.cpp is updated to replace calls to `G1CollectedHeap::heap()` with `Universe::heap()` to avoid G1 specific code as much as possible. > > At many places in filemap.cpp heap range is requested from GC. All collectors except ZGC have contiguous heap and set `CollectedHeap::_reserved` to the heap range, so it can be easily exposed to the CDS code. This is done in this patch through `CollectedHeap::reserved` API. 
But for ZGC the heap can be discontiguous which makes it tricky to expose the heap range. > Another point to note is that most of the usage for heap range is for logging purpose, but there is one place where it is used for setting the `mapping_offset` in `FileMapInfo::write_region()` based on the heap start. So purely based on the functional requirement, we only need the heap start address, not the range. > > To keep things simple and considering ZGC does not currently support archive heap, i refrained from tackling the issue of discontiguous heap range in this PR. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/14520 From rehn at openjdk.org Thu Nov 9 08:05:59 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Nov 2023 08:05:59 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: <8rC40UxJC4IF9vdv6xIyaJl6l-fhAlRC0VezoUAuKYE=.bdc94ce5-d9d5-429c-bb38-701ffcbe0bcf@github.com> On Wed, 8 Nov 2023 19:17:32 GMT, Vladimir Kempik wrote: >> Hi, please consider. >> >> Main author is @luhenry, I only fixed some minor things and tested it. >> >> Such as: >> test/hotspot/jtreg/compiler/intrinsics/sha/ >> test/jdk/java/security/MessageDigest/ >> test/jdk/jdk/security/ >> tier1 >> >> And still running some test. > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 169: > >> 167: } >> 168: >> 169: if (UseZvknhb && UseZvkb) { > > this looks weird, two jdk options needed to enable sha intrinsincs. > Can we simplify it somehow for now , like UseRVVCryptoExt ? > Splitting this into UseZvknhb && UseZvkb can be done in future, if it really would be needed one day Yes, this is a total mess. 
For bystanders this is a 'simple' march to gcc-13: `rv64im0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zacas1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0` A simple RVA23U64 CPU may have like 40 extensions, a high performance server class CPU may have well over a hundred. Just the scalar crypto ones: `Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zksed, Zksh, Zkr, Zkt, Zkn, Zks, Zk` It is not reasonable to add all these as flags. So flags for the collections seem like a much better idea. But we probably need to be able to turn off a sub-extension such as UseZvknhb. "-XX:+UseVectorCryptoExt:zvknhb=false" Suggestions welcome. Just off the top of my head, at the moment I need to supply this crazy arch string to qemu, compiler, obj dump and there doesn't seem to be a solution near, so maybe we should be able to supply that arch string to the VM also. `-XX:UseArch=rv64im0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zacas1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1387619392 From shade at openjdk.org Thu Nov 9 08:07:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 9 Nov 2023 08:07:59 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: References: <2eXNHrpyHgdJQSGKW0fMQMCi0cVzd6hzaOTo5lLmFpg=.ee20bd05-3b39-4719-9d7e-4f7a54c78e81@github.com> Message-ID: <1NQjqlxQxjUVFhPGRClCNC7QHvwwL3ZLZP_ZxZdVXsM=.90fc67b9-d833-4e59-b675-8b8c51fd9818@github.com> On Mon, 6 Nov 2023 21:20:03 GMT, Daniel D. Daugherty wrote: >>> I think this is good for review. The reproducer that used to hang/fail on assert is now passing. `tier1 tier2 tier3` are all passing. I am running more tests overnight. >> >> Testing seems all good. I'll leave the `Linux` -> `Generic` switch in this PR, until the very last moment before integration to keep testing more easily.
> > @shipilev - I'm glad that: > > vmTestbase/nsk/monitoring/ThreadInfo/isSuspended/issuspended002.java > > has proven to useful. I had been thinking about removing it from my weekly stress > kit runs since it has been a long time since I've seen a failure flushed out by that > test running the stress config. I think I'll keep it around for longer... All right! Any other reviewers for this? @dcubed-ojdk, @dholmes-ora, maybe? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1803333889 From shade at openjdk.org Thu Nov 9 08:28:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 9 Nov 2023 08:28:58 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v2] In-Reply-To: References: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> Message-ID: On Thu, 9 Nov 2023 07:42:31 GMT, Stefan Karlsson wrote: >> When I changed the thread dump code to deflate monitors, I was just trying to >> clean things up to reduce memory usage. In other words, I was taking advantage >> of the safepoint to clean up as many ObjectMonitors as I could while the threads >> that were generating them were stopped at the safepoint. However, my focus on >> cleanliness cost us pause time for this VM operation. >> >> I would be fine with only requesting deflation from a thread dump when we pass >> a limit of some sort. It would be good to make that limit value an experimental >> flag/option so it could be tuned for diagnostic purposes. > > There are a few options how to move forward with this: > > 1. Stop triggering deflation from the thread dumping code > 2. Only trigger if we pass a given limit, say 100000. > 3. Always trigger monitor deflation > > I believe that long-term it would be best for the JVM if we went with (1). I added (2) just to counter some potential arguments that having too many monitors in the system will make the thread dumping take a long time. 
It is not clear to me at all that people will notice this, and if they do then maybe we need to tweak the monitor deflation heuristics instead. (3) seems excessive to me. > > I've added the flag AsyncMonitorDeflationForThreadDumpLimit. The long name hints that this flag is in support of something overly specific. I set the default to SIZE_MAX in my support for (1), and hope that we can release-note that we have stopped performing monitor deflation from thread dumping and that this flag is going to be good enough safeguard if there are applications that rely on the monitor deflation for thread dumping. But if reviewers disagree with this, I'm OK with changing the default value to support either (2) or (3). > > Could I get a all the reviewer's here to (re)state the preference on this, included their suggested limit > > I'm also OK with changing the name of the flag if you have a better name, and changing it to an experimental flag if that makes more sense. I would prefer (3), and then consider changing to (1) in a separate PR. This would match current behavior well, and thus would not make more things beyond fixing the interleaving trouble; would eliminate the need to have another flag that would be only temporary until (1) is here; would trigger (pun intended) more discussion about deflation policy once we do (1) -> (3). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1387642105 From fyang at openjdk.org Thu Nov 9 08:44:57 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 Nov 2023 08:44:57 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it. > > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some test. 
Hi, I am told today that a riscv64 version of sha256/512 for openssl has been added two weeks ago [1][2] Seems their version are more efficient after a brief look. Might be a good reference for us I guess. [1] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl [2] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha512-riscv64-zvkb-zvknhb.pl ------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1803383959 From thartmann at openjdk.org Thu Nov 9 09:03:59 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Nov 2023 09:03:59 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. 
>> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). The VM changes look good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16425#pullrequestreview-1722030583 From liach at openjdk.org Thu Nov 9 09:10:00 2023 From: liach at openjdk.org (Chen Liang) Date: Thu, 9 Nov 2023 09:10:00 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <6Z8gWGyohCB0dp1gfe7-K-HDFpYHtn7jjwTsNt0XujY=.d756475b-6a44-44fc-854c-a5bd5290eb1c@github.com> Message-ID: <38KXJbU5VUW3iZk8YUdffRCQM2i7IyzHfk0q8hdYY5E=.b11199b9-7da1-4dde-bf81-ecc5cb4cd42b@github.com> On Mon, 6 Nov 2023 15:30:46 GMT, Roger Riggs wrote: >> src/java.base/share/classes/java/lang/StringUTF16.java line 202: >> >>> 200: @ForceInline >>> 201: public static byte[] compress(final char[] val, final int off, final int count) { >>> 202: byte[] latin1 = new byte[count]; >> >> Will this redundant array allocation be costly if we are working with mostly-utf16 strings, such as CJK strings with no latin characters? >> >> I suggest we can use a heuristic to read the initial char; if it's utf16 then we skip the latin-1 process altogether (and we can assign the utf16 value to the initial index to ensure it's non-latin-1 compressible. > > We can reconsider this design as a separate PR. > Every additional check has a performance impact and in this bug the goal is to avoid any regression. > > We'll need to gain some insight into the distribution of strings when used in a non-latin1 application. > How many of the strings are latin1 vs non-latin1, what is the distribution of string lengths and which APIs are in use in the applications. The implementation is already pretty good about working with strings of different coders > but there may be some different choices when converting between char arrays and int arrays and strings. Just curious, how does benchmark StringConstructor.newStringFromCharsMixedBegin change before and after this patch? 
If we can see how much of an impact this has on CJK strings it would be appreciated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1387693255 From ogillespie at openjdk.org Thu Nov 9 09:51:28 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 9 Nov 2023 09:51:28 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: References: Message-ID: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. 
> > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Add missing atomic.hpp include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/fb366040..5c9744bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From mli at openjdk.org Thu Nov 9 10:35:12 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Nov 2023 10:35:12 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v5] In-Reply-To: References: Message-ID: > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! 
> > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: remove extra new line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/3b256e25..dc6dedbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Thu Nov 9 10:35:15 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Nov 2023 10:35:15 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v3] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 09:14:04 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 1901: >> >>> 1899: >>> 1900: case Op_EncodeISOArray: >>> 1901: return UseRVV && SpecialEncodeISOArray; >> >> Seems that we can remove this extra check for `SpecialEncodeISOArray` (and related code at [1]) and group those 5 cases (which return UseRVV) together. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L290-L292 > > I will do it in another pr later. It's in the https://github.com/openjdk/jdk/pull/16580 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1387805436 From mli at openjdk.org Thu Nov 9 10:36:21 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Nov 2023 10:36:21 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks Message-ID: Hi, Can you review the patch to refactor the code related to UseRVV checks? Thanks!
There are some code (flag setting/checking) depending on UseRVV's value, these code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. 2. refactored to be more readable. 3. also add note to make sure the future code does not make the similar mistakes. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/16580/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16580&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319781 Stats: 45 lines in 2 files changed: 18 ins; 24 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16580.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16580/head:pull/16580 PR: https://git.openjdk.org/jdk/pull/16580 From tschatzl at openjdk.org Thu Nov 9 10:44:29 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 9 Nov 2023 10:44:29 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v16] In-Reply-To: References: Message-ID: <1kvCiT9zD5ZqoLH_HFRtPsh8M78WaFwE6R5ODemjUMs=.2e5b019c-4a3b-4184-8171-49ff5a84c841@github.com> > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. 
These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously filler objects were restricted to at most half a region in size, so we could end up needing to place a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to-be-filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However, assuming that the pinning is short-lived, we put them into the candidates when we can. > > * there is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy for marking regions (only exit the mixed phase when there are no marking candidates) and to wasted processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P...
Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Modify evacuation failure log message as suggested by sjohanss: Use "Evacuation Failure" with a cause description (either "Allocation" or "Pinned") ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/83eff9fe..6395696a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=14-15 Stats: 16 lines in 3 files changed: 11 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From tschatzl at openjdk.org Thu Nov 9 10:44:30 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 9 Nov 2023 10:44:30 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v9] In-Reply-To: <6_MNWrkq_LnvB4EqrBFNo6e6kVHLgffcRbkxhRPzTpg=.1a05d8af-b181-4e80-9922-6aea9a517564@github.com> References: <6_MNWrkq_LnvB4EqrBFNo6e6kVHLgffcRbkxhRPzTpg=.1a05d8af-b181-4e80-9922-6aea9a517564@github.com> Message-ID: On Wed, 8 Nov 2023 14:46:16 GMT, Stefan Johansson wrote: >> The example looks good to me. > > Have the final output looking something like this was agreed on during internal discussion: > GC(6) Pause Young (Normal) (Evacuation Failure: Pinned) 1M->1M(22M) 36.16ms > GC(6) Pause Young (Normal) (Evacuation Failure: Allocation) 1M->1M(22M) 36.16ms > GC(6) Pause Young (Normal) (Evacuation Failure: Allocation / Pinned) 1M->1M(22M) 36.16ms Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1387817157 From sjohanss at openjdk.org Thu Nov 9 10:48:08 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 9 Nov 2023 10:48:08 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v40] In-Reply-To: References: Message-ID: <4EVNbC2fI1AQGHMkRMfI6SDJrw98KEX0xuRJR1s361o=.d1ff2f1a-ffc9-4454-9755-e6e9e14d9110@github.com> On Thu, 9 Nov 2023 05:27:40 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Add missing cpuTimeCounters files A few more comments. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 82: > 80: _vtime_accum = 0.0; > 81: } > 82: maybe_update_threads_cpu_time(); I think the lines above here: if (os::supports_vtime()) { _vtime_accum = (os::elapsedVTime() - _vtime_start); } else { _vtime_accum = 0.0; } should be extracted into the new method, and instead of calling it `maybe_update_threads_cpu_time()` just call it `track_usage()` or `track_cpu_time()`. The implementation in the primary thread can then call this and do the extra tracking. src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 189: > 187: void G1PrimaryConcurrentRefineThread::maybe_update_threads_cpu_time() { > 188: if (UsePerfData && os::is_thread_cpu_time_supported()) { > 189: cr()->update_concurrent_refine_threads_cpu_time(); I think we should pull the tracking closure in here and that way leave the concurrent refine class untouched. Suggestion: // The primary thread is responsible for updating the CPU time for all workers.
CPUTimeCounters* counters = G1CollectedHeap::heap()->cpu_time_counters(); ThreadTotalCPUTimeClosure tttc(counters, CPUTimeGroups::gc_conc_refine); cr()->threads_do(&tttc); This is more or less a copy of `G1ConcurrentRefineThreadControl::update_threads_cpu_time()`, which can be removed if we go with this solution. The above needs some new includes though. I changed the comment a bit because I could not fully understand it; the primary thread is the one always checking and starting more threads, so it is not stopped first. I'm also not sure when a terminated thread could be read. Even the stopped threads are still present, so it should be fine. If I'm missing something, feel free to add back the comment. src/hotspot/share/gc/g1/g1ServiceThread.cpp line 138: > 136: ThreadTotalCPUTimeClosure tttc(counters, CPUTimeGroups::gc_service); > 137: tttc.do_thread(task->_service_thread); > 138: } Please extract this to a function, similar to the other cases, something like `track_cpu_time()`. ------------- Changes requested by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1722194318 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1387787912 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1387803508 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1387805665 From msheppar at openjdk.org Thu Nov 9 10:51:00 2023 From: msheppar at openjdk.org (Mark Sheppard) Date: Thu, 9 Nov 2023 10:51:00 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v4] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Thu, 9 Nov 2023 05:01:35 GMT, David Holmes wrote: > I remain concerned that this means that a whole swag of tests will never be run with virtual threads, which reduces our virtual thread test coverage. Hard to quantify.
Do you have any stats on how many tests this will affect and which ones? I thought the purpose of createLimitedTestJavaProcessBuilder is for use where VM options are NOT being propagated to the ProcessBuilder, so the thread factory property won't come into consideration? The Javadoc notes that tests using createLimitedTestJavaProcessBuilder would be marked as vm.flagless, which would suggest the thread factory is not being used. createTestJavaProcessBuilder is for use when VM args are being propagated, and as such the thread factory can be "injected" via this method. So this change is streamlining these use cases. Is this a correct interpretation? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1803595745 From sjohanss at openjdk.org Thu Nov 9 11:03:11 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 9 Nov 2023 11:03:11 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v40] In-Reply-To: References: Message-ID: <9yVx0YyUxW40p2wV4ee2VB5m6WLOL9CxVcEqUE8W81Q=.a68ded01-b31f-4b09-91dc-04e893ff15eb@github.com> On Thu, 9 Nov 2023 05:27:40 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Add missing cpuTimeCounters files src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 2093: > 2091: tttc.do_thread(cm_thread()); > 2092: threads_do(&tttc); > 2093: } Any particular reason for having this in `G1ConcurrentMark` instead of keeping it where it is called in `G1ConcurrentMarkThread`? The implementation would be more or less the same apart from the two last lines:
The implementation would be more or less the same apart from the two last lines: tttc.do_thread(this); cm()->threads_do(&tttc); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1387839879 From rehn at openjdk.org Thu Nov 9 11:11:58 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Nov 2023 11:11:58 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it. > > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some test. AFIACT the are pretty much the same, except for constants handling in 256. Openssl preloads the constants into V10->V25 for 256. I think that is beneficial for multi block, but not for single pass. Compare these 512 round 2s: __ vl1re64_v(v15, consts); | @{[vle64_v $V20, ($KT)]} __ addi(consts, consts, 32); | addi $KT, $KT, 32 __ vadd_vv(v14, v15, v12); | @{[vadd_vv $V18, $V20, $V14]} __ vsha2cl_vv(v17, v16, v14); | @{[vsha2cl_vv $V24, $V22, $V18]} __ vsha2ch_vv(v16, v17, v14); | @{[vsha2ch_vv $V22, $V24, $V18]} __ vmerge_vvm(v14, v10, v13); | @{[vmerge_vvm $V18, $V10, $V16, $V0]} __ vsha2ms_vv(v12, v14, v11); | @{[vsha2ms_vv $V14, $V18, $V12]} I suggest we create a new enhancement for preload constants in 256 multi-block and ship this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1803627957 From sjohanss at openjdk.org Thu Nov 9 11:13:06 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 9 Nov 2023 11:13:06 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v40] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 05:27:40 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Add missing cpuTimeCounters files One more thing I noticed is that the parallel worker time from the Remark pause is not updated after the remark pause. So I guess we should add a call to update that. It will still be accounted for the next time we do a normal GC, but if we want the counter to be as up to date as possible we should update it after the remark pause. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1803629858 From fyang at openjdk.org Thu Nov 9 11:31:02 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 Nov 2023 11:31:02 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 11:09:22 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Main author is @luhenry, I only fixed some minor things and tested it. >> >> Such as: >> test/hotspot/jtreg/compiler/intrinsics/sha/ >> test/jdk/java/security/MessageDigest/ >> test/jdk/jdk/security/ >> tier1 >> >> And still running some tests. > > AFAICT they are pretty much the same, except for constants handling in 256. > OpenSSL preloads the constants into V10->V25 for 256. > I think that is beneficial for multi block, but not for single pass.
> > Compare these 512 round 2s: > > __ vl1re64_v(v15, consts); | @{[vle64_v $V20, ($KT)]} > __ addi(consts, consts, 32); | addi $KT, $KT, 32 > __ vadd_vv(v14, v15, v12); | @{[vadd_vv $V18, $V20, $V14]} > __ vsha2cl_vv(v17, v16, v14); | @{[vsha2cl_vv $V24, $V22, $V18]} > __ vsha2ch_vv(v16, v17, v14); | @{[vsha2ch_vv $V22, $V24, $V18]} > __ vmerge_vvm(v14, v10, v13); | @{[vmerge_vvm $V18, $V10, $V16, $V0]} > __ vsha2ms_vv(v12, v14, v11); | @{[vsha2ms_vv $V14, $V18, $V12]} > > > I suggest we create a new enhancement for preload constants in 256 multi-block and ship this. @robehn : Thanks for checking & comparing those two versions. I think I can take a more closer look tomorrow or maybe next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1803654862 From rehn at openjdk.org Thu Nov 9 11:41:57 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Nov 2023 11:41:57 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 11:09:22 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> Main author is @luhenry, I only fixed some minor things and tested it. >> >> Such as: >> test/hotspot/jtreg/compiler/intrinsics/sha/ >> test/jdk/java/security/MessageDigest/ >> test/jdk/jdk/security/ >> tier1 >> >> And still running some test. > > AFIACT the are pretty much the same, except for constants handling in 256. > Openssl preloads the constants into V10->V25 for 256. > I think that is beneficial for multi block, but not for single pass. 
> > Compare these 512 round 2s: > > __ vl1re64_v(v15, consts); | @{[vle64_v $V20, ($KT)]} > __ addi(consts, consts, 32); | addi $KT, $KT, 32 > __ vadd_vv(v14, v15, v12); | @{[vadd_vv $V18, $V20, $V14]} > __ vsha2cl_vv(v17, v16, v14); | @{[vsha2cl_vv $V24, $V22, $V18]} > __ vsha2ch_vv(v16, v17, v14); | @{[vsha2ch_vv $V22, $V24, $V18]} > __ vmerge_vvm(v14, v10, v13); | @{[vmerge_vvm $V18, $V10, $V16, $V0]} > __ vsha2ms_vv(v12, v14, v11); | @{[vsha2ms_vv $V14, $V18, $V12]} > > > I suggest we create a new enhancement for preload constants in 256 multi-block and ship this. > @robehn : Thanks for checking & comparing those two versions. I think I can take a closer look tomorrow or maybe next week. No rush at all, RDP1 is 4 weeks away :) And we also need to sort out what our future plans are for describing the arch. LLVM supports 100+ extensions now; the plan can't be to add these as UseXXX flags :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1803669608 From vkempik at openjdk.org Thu Nov 9 11:50:03 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 9 Nov 2023 11:50:03 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 16:46:15 GMT, Olga Mikhaltsova wrote: >> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform. >> >> In the table below it is shown that NaN argument should be processed as a special case. >> >> RISC-V Java >> (FCVT.W.S) (FCVT.L.D) (long round(double a)) (int round(float a)) >> Minimum valid input (after rounding) -2^31 -2^63 Long.MIN_VALUE Integer.MIN_VALUE >> Maximum valid input (after rounding) 2^31 - 1 2^63 - 1 Long.MAX_VALUE Integer.MAX_VALUE >> Output for out-of-range negative input -2^31 -2^63 Long.MIN_VALUE Integer.MIN_VALUE >> Output for -∞ -2^31 -2^63 Long.MIN_VALUE Integer.MIN_VALUE >> Output for out-of-range positive input 2^31 - 1 2^63 - 1 Long.MAX_VALUE Integer.MAX_VALUE >> Output for +∞ 2^31 - 1 2^63 - 1 Long.MAX_VALUE Integer.MAX_VALUE >> Output for NaN 2^31 - 1 2^63 - 1 0 0 >> >> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement: >> >> **Before** >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.test_round_double 2048 thrpt 15 59.555 0.179 ops/ms >> FpRoundingBenchmark.test_round_float 2048 thrpt 15 49.760 0.103 ops/ms >> >> >> **After** >> >> Benchmark (TESTSIZE) Mode Cnt Score Error Units >> FpRoundingBenchmark.test_round_double 2048 thrpt 15 110.956 0.186 ops/ms >> FpRoundingBenchmark.test_round_float 2048 thrpt 15 115.947 0.122 ops/ms > > Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision: > > Fixed intrinsics implementation. Reverted changes of FCVT_SAFE. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4254: > 4252: // dst = 0 > 4253: // if +/-0, +/-subnormal numbers, signaling/quiet NaN > 4254: andi(tmp, tmp, 0b1100111100); Please update this line (and the same in doubles) for the new scheme of working with the fclass mask (https://github.com/openjdk/jdk/pull/16362/files#diff-314214875276cd9a11ecdfd52b68403ded286710ba0820461b0b510506f61a33R1077) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1387889278 From mbaesken at openjdk.org Thu Nov 9 13:21:27 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 9 Nov 2023 13:21:27 GMT Subject: RFR: 8295159: DSO created with -ffast-math breaks Java floating-point arithmetic [v20] In-Reply-To: References: <5D-3Gy3hU2M8MhKgY2z_NwurLe6mnu9gi3Z08C3Tp-s=.a4d7662f-8ccf-4bc5-8712-4269f9f569a0@github.com> Message-ID: On Fri, 3 Nov 2023 14:46:39 GMT, Andrew Haley wrote: >> A bug in GCC causes shared libraries linked with -ffast-math to disable denormal arithmetic. This breaks Java's floating-point semantics.
>> >> The bug is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522 >> >> One solution is to save and restore the floating-point control word around System.loadLibrary(). This isn't perfect, because some shared library might load another shared library at runtime, but it's a lot better than what we do now. >> >> However, this fix is not complete. `dlopen()` is called from many places in the JDK. I guess the best thing to do is find and wrap them all. I'd like to hear people's opinions. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Delete src/hotspot/os/linux/.#os_linux.cpp This leads now to lots of assertions in the jdk tier4 javax/sound tests on our RHEL 9.3 Linux aarch64 box. Opened https://bugs.openjdk.org/browse/JDK-8319708 8319708: Assertion 'fsetenv didn't work' in jdk tier4 tests after 8295159 on Linux aarch64 RHEL9.3 Unfortunately , the shared libs loaded by the JDK native C code (and the dependencies of those libs) are not covered at all, so it is unclear to me what exactly caused the issue. Maybe some sound-related lib on RHEL ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10661#issuecomment-1803816232 From mdoerr at openjdk.org Thu Nov 9 13:43:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 9 Nov 2023 13:43:20 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v9] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 19:52:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. 
>> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > RISCV port update I have a version which works for PPC64: https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2 Note that I have implemented a few things slightly differently: - `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. - `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. 
Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. Please take a look and take these differences into consideration for other platforms. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1803841494 From ogillespie at openjdk.org Thu Nov 9 14:13:56 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 9 Nov 2023 14:13:56 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID: On Thu, 9 Nov 2023 09:51:28 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. 
This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Add missing atomic.hpp include src/hotspot/share/oops/symbolHandle.hpp line 99: > 97: if (!_cleanup_delay_enabled) return; > 98: sym->increment_refcount(); > 99: uint i = Atomic::add(&_cleanup_delay_index, 1u) % CLEANUP_DELAY_MAX_ENTRIES; When this wraps around, it will skip a few values - but I don't see a problem with that. x = 4294967293, x % 100 = 93 x = 4294967294, x % 100 = 94 x = 4294967295, x % 100 = 95 x = 0, x % 100 = 0 x = 1, x % 100 = 1 x = 2, x % 100 = 2 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388061775 From fparain at openjdk.org Thu Nov 9 14:29:59 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 9 Nov 2023 14:29:59 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v9] In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 19:52:41 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. 
These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > RISCV port update Thank you for this great rework. Looks good to me now. ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1722672670 From rrich at openjdk.org Thu Nov 9 14:58:03 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 9 Nov 2023 14:58:03 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 19:00:53 GMT, Roman Kennke wrote: > See JBS issue for details. > > I basically: > - took the test-modification and turned it into its own test-case > - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default > - added Axels fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. 
> > Testing: > - [x] EATests.java > - [x] tier1 > - [ ] tier2 Hi Roman, thanks for opening the pr. I've implemented [another test case](https://github.com/reinrich/jdk/commit/b72b3b3d7d1b5927811ae49e3ddea01d298dcb85) that demonstrates why relocking should be done before an object reference with eliminated locking is passed to a JVMTI agent. Would it be ok to include it in your pr? ([This is a version](https://github.com/reinrich/jdk/commit/f7a90c13e27e9fe38c892f069bd8d58484f59445) where relocking is delayed until the compiled frame is deoptimized. The new test fails with -XX:+UseNewCode). I will put the change through our CI testing. Cheers, Richard. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16568#issuecomment-1803985944 From rriggs at openjdk.org Thu Nov 9 15:15:59 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 9 Nov 2023 15:15:59 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: <38KXJbU5VUW3iZk8YUdffRCQM2i7IyzHfk0q8hdYY5E=.b11199b9-7da1-4dde-bf81-ecc5cb4cd42b@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <6Z8gWGyohCB0dp1gfe7-K-HDFpYHtn7jjwTsNt0XujY=.d756475b-6a44-44fc-854c-a5bd5290eb1c@github.com> <38KXJbU5VUW3iZk8YUdffRCQM2i7IyzHfk0q8hdYY5E=.b11199b9-7da1-4dde-bf81-ecc5cb4cd42b@github.com> Message-ID: <0iHPN-yDHDQHS0CS9mGVH-Vn2gQIWWEi07Hgsy1UjGE=.1f33d7bb-f0cd-4016-b337-6aeb3fec2131@github.com> On Thu, 9 Nov 2023 09:07:31 GMT, Chen Liang wrote: > Just curious, how does benchmark StringConstructor.newStringFromCharsMixedBegin change before and after this patch? If we can see how much of an impact this has on CJK strings it would be appreciated. You may have better insights from doing your own runs on your own systems. Here's a sample from a recent run. Mostly small improvements and a few small negatives. 
Benchmark Linux aarch64 | Linux x64 | MacOSX aarch64 | MacOSX x64 newStringFromCharsMixedBegin-size:64 | 1.22% | 0.91% | -0.81% | 1.08% newStringFromCharsMixedBegin-size:7 | 4.40% | 0.26% | -1.88% | 4.60% ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1388152228 From coleenp at openjdk.org Thu Nov 9 15:30:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Nov 2023 15:30:12 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID: On Thu, 9 Nov 2023 09:51:28 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 
100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Add missing atomic.hpp include This also looks good to me with a couple of minor comments. Kim might be away right now and it appears that you addressed his previous comments with this latest version. src/hotspot/share/oops/symbolHandle.hpp line 47: > 45: template > 46: class SymbolHandleBase : public StackObj { > 47: static const uint CLEANUP_DELAY_MAX_ENTRIES; Since this is const, can this be initialized here? ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1722807579 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388169490 From coleenp at openjdk.org Thu Nov 9 15:30:14 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 9 Nov 2023 15:30:14 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID: On Thu, 9 Nov 2023 14:11:08 GMT, Oli Gillespie wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing atomic.hpp include > > src/hotspot/share/oops/symbolHandle.hpp line 99: > >> 97: if (!_cleanup_delay_enabled) return; >> 98: sym->increment_refcount(); >> 99: uint i = Atomic::add(&_cleanup_delay_index, 1u) % CLEANUP_DELAY_MAX_ENTRIES; > > When this wraps around, it will skip a few values - but I don't see a problem with that. 
>
>
> x = 4294967293, x % 100 = 93
> x = 4294967294, x % 100 = 94
> x = 4294967295, x % 100 = 95
> x = 0, x % 100 = 0
> x = 1, x % 100 = 1
> x = 2, x % 100 = 2

I don't see a problem either. I wish I were better with integers. I was wondering if the max entries should be a power of two to make the % operation faster.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388166622

From aph at openjdk.org Thu Nov 9 15:34:06 2023
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 9 Nov 2023 15:34:06 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2]
In-Reply-To: References: Message-ID:

On Wed, 8 Nov 2023 16:46:15 GMT, Olga Mikhaltsova wrote:

>> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
>>
>> In the table below it is shown that NaN argument should be processed as a special case.
>>
>>                                          RISC-V                    Java
>>                                          (FCVT.W.S)   (FCVT.L.D)   (long round(double a))   (int round(float a))
>> Minimum valid input (after rounding)     -2^31        -2^63        Long.MIN_VALUE           Integer.MIN_VALUE
>> Maximum valid input (after rounding)     2^31 - 1     2^63 - 1     Long.MAX_VALUE           Integer.MAX_VALUE
>> Output for out-of-range negative input   -2^31        -2^63        Long.MIN_VALUE           Integer.MIN_VALUE
>> Output for -∞                            -2^31        -2^63        Long.MIN_VALUE           Integer.MIN_VALUE
>> Output for out-of-range positive input   2^31 - 1     2^63 - 1     Long.MAX_VALUE           Integer.MAX_VALUE
>> Output for +∞                            2^31 - 1     2^63 - 1     Long.MAX_VALUE           Integer.MAX_VALUE
>> Output for NaN                           2^31 - 1     2^63 - 1     0                        0
>>
>> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:
>>
>> **Before**
>>
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555   0.179  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760   0.103  ops/ms
>>
>>
>> **After**
>>
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956   0.186  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947   0.122  ops/ms
>
> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixed intrinsics implementation. Reverted changes of FCVT_SAFE.

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4263:

> 4261:   fadd_s(ftmp, src, ftmp);
> 4262:   fcvt_w_s(dst, ftmp, RoundingMode::rdn);
> 4263:

This still doesn't look right to me. I urge you to test it against the Java implementation over the full 32-bit range.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1388177136

From thartmann at openjdk.org Thu Nov 9 15:36:04 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Thu, 9 Nov 2023 15:36:04 GMT
Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3]
In-Reply-To: References: Message-ID:

On Tue, 7 Nov 2023 19:51:34 GMT, Volodymyr Paprotski wrote:

>> Improves vector rounding on ECore about 10x
>>
>> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms
>> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   change option name

All tests passed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1804056562 From jvernee at openjdk.org Thu Nov 9 15:38:11 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 9 Nov 2023 15:38:11 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 00:27:46 GMT, Vladimir Ivanov wrote: > Even though it's straightforward to support on-heap accesses during critical function calls, object pinning would support that for non-critical function calls as well, but proposed API doesn't cover it and new API will be required. What's the plan here? The issue is that most GCs don't support object pinning. I don't think we want an API that only works for some GCs. But, if we do, there's a better API that we can have for just pinning support, which is a `MemorySegment::pin(Arena)` method that returns a MemorySegment wrapping the pinned array. That would allow doing multiple native calls with just a single pin operation, and also allows embedding pointers to pinned segments in other data structures. For the current approach where we make the array accessible for the duration of the native call: without pinning support, other GCs would have to use GCLocker. That means that the native call also has to be relatively short-lived, at which point I figured we might as well drop the thread state transition, since that has the same requirement. I.e. we detect that the call is short-lived, and do the optimization ourselves without the hint from the user (`critical`). This coincidentally also greatly simplifies the implementation. In a prior iteration I did have a separate `allowHeap` `Option` that implied `critical`. But it was suggested to just merge the two together in that case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1388181983 From jvernee at openjdk.org Thu Nov 9 15:43:08 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 9 Nov 2023 15:43:08 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 19:52:00 GMT, Vladimir Ivanov wrote: >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - a -> an >> - add note to downcallHandle about passing heap segments by-reference > > src/hotspot/cpu/aarch64/downcallLinker_aarch64.cpp line 182: > >> 180: ArgumentShuffle arg_shuffle(filtered_java_regs, out_regs, shuffle_reg); >> 181: >> 182: #ifndef PRODUCT > > Any particular reason to exclude the logging in product builds? `ArgumentShuffle::print_on()` is unconditionally available there. This is partly historical. The log output is only intended for debugging, not for end-user eyes. So, I think I originally excluded it as a way of trimming fat from the product build. Either way, `ArgumentShuffle::print_on` should probably be excluded/included on the same basis. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1388189136 From rkennke at openjdk.org Thu Nov 9 15:54:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 9 Nov 2023 15:54:13 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack [v2] In-Reply-To: References: Message-ID: <90buwH_81LCEUj7bv7Ug4fDC8IbyMDCFcmNfmyd1Hxk=.8747fb2c-265b-41cd-8d74-a576e58adf85@github.com> > See JBS issue for details. 
> > I basically: > - took the test-modification and turned it into its own test-case > - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default > - added Axels fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. > > Testing: > - [x] EATests.java > - [x] tier1 > - [ ] tier2 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add @reinrich's test-case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16568/files - new: https://git.openjdk.org/jdk/pull/16568/files/d9370df6..966d0a3e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16568&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16568&range=00-01 Stats: 58 lines in 1 file changed: 58 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16568/head:pull/16568 PR: https://git.openjdk.org/jdk/pull/16568 From jvernee at openjdk.org Thu Nov 9 16:04:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 9 Nov 2023 16:04:28 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v13] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. 
> > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 51 commits: - fix type and reformat doc in Linker - Merge branch 'master' into AllowHeapNoLock - tweak whitespace - a -> an - add note to downcallHandle about passing heap segments by-reference - Merge branch 'master' into AllowHeapNoLock - bump up argument counts in TestLargeStub to their maximum - s390 updates - add stub size stress test for allowHeap - RISC-V impl - ... and 41 more: https://git.openjdk.org/jdk/compare/f9395421...914f4882 ------------- Changes: https://git.openjdk.org/jdk/pull/16201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=12 Stats: 2711 lines in 74 files changed: 1722 ins; 692 del; 297 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From ogillespie at openjdk.org Thu Nov 9 16:36:20 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 9 Nov 2023 16:36:20 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v7] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. 
With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Set queue size to power of 2, use constant in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/5c9744bb..ed5ae51e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=05-06 Stats: 6 lines in 2 files changed: 2 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From ogillespie at openjdk.org Thu Nov 9 16:36:25 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 9 Nov 2023 16:36:25 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID: On Thu, 9 Nov 2023 15:23:56 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/symbolHandle.hpp line 99: >> >>> 97: if (!_cleanup_delay_enabled) return; >>> 98: sym->increment_refcount(); >>> 99: uint i = Atomic::add(&_cleanup_delay_index, 1u) % CLEANUP_DELAY_MAX_ENTRIES; >> >> When 
this wraps around, it will skip a few values - but I don't see a problem with that. >> >> >> x = 4294967293, x % 100 = 93 >> x = 4294967294, x % 100 = 94 >> x = 4294967295, x % 100 = 95 >> x = 0, x % 100 = 0 >> x = 1, x % 100 = 1 >> x = 2, x % 100 = 2 > > I don't see a problem either. I wish I were better with integers. I was wondering if the max entries should be a power of two to make the % operation faster. Good idea, changed to 128. I'm no expert but the code looks better for it here: https://godbolt.org/z/7cYv9Pzz7. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388262435 From ogillespie at openjdk.org Thu Nov 9 16:36:23 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 9 Nov 2023 16:36:23 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID: On Thu, 9 Nov 2023 15:25:56 GMT, Coleen Phillimore wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing atomic.hpp include > > src/hotspot/share/oops/symbolHandle.hpp line 47: > >> 45: template >> 46: class SymbolHandleBase : public StackObj { >> 47: static const uint CLEANUP_DELAY_MAX_ENTRIES; > > Since this is const, can this be initialized here? Yes, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388262682 From macarte at openjdk.org Thu Nov 9 16:39:02 2023 From: macarte at openjdk.org (Mat Carter) Date: Thu, 9 Nov 2023 16:39:02 GMT Subject: RFR: 8317562: [JFR] Compilation queue statistics [v6] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 16:30:32 GMT, Mat Carter wrote: >> Adding a new periodic jfr event to monitor and output statistics for the compiler queues. 
You will see one event per compiler queue (c1 and c2)
>>
>> Passes tier1 on linux (x86) and mac (aarch64)
>
> Mat Carter has updated the pull request incrementally with one additional commit since the last revision:
>
>   Updated test to reflect field name changes

@egahlin / @vnkozlov / @MBaesken / @brianjstafford / @JohnTortugo / @karianna - thank you for your reviews and feedback!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16211#issuecomment-1804169861

From coleenp at openjdk.org Thu Nov 9 16:55:10 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 9 Nov 2023 16:55:10 GMT
Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6]
In-Reply-To: References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID:

On Thu, 9 Nov 2023 16:33:21 GMT, Oli Gillespie wrote:

>> I don't see a problem either. I wish I were better with integers. I was wondering if the max entries should be a power of two to make the % operation faster.
>
> Good idea, changed to 128. I'm no expert but the code looks better for it here: https://godbolt.org/z/7cYv9Pzz7.

I like it!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388286945

From sviswanathan at openjdk.org Thu Nov 9 17:03:01 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Thu, 9 Nov 2023 17:03:01 GMT
Subject: RFR: 8319429: Resetting MXCSR flags degrades ecore [v3]
In-Reply-To: References: Message-ID:

On Tue, 7 Nov 2023 19:51:34 GMT, Volodymyr Paprotski wrote:

>> Improves vector rounding on ECore about 10x
>>
>> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms
>> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms
>
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>
>   change option name

Thanks a lot Tobias!
------------- PR Comment: https://git.openjdk.org/jdk/pull/16504#issuecomment-1804209994 From mli at openjdk.org Thu Nov 9 17:43:04 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Nov 2023 17:43:04 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it. > > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some test. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3917: > 3915: //-------------------------------------------------------------------------------- > 3916: // Quad-round 0 (+0, v10->v11->v12->v13) > 3917: __ vl1re32_v(v15, consts); Seems the round 0-11 are quite similar with each other, although with some difference in some src registers, but with similar patterns. Would it be possible and better to group them in a loop to simplify the code? or construct some functions. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4348: > 4346: > 4347: //-------------------------------------------------------------------------------- > 4348: // Quad-round 0 (+0, v10->v11->v12->v13) similar comments as generate_sha256_implCompress about group the rounds in a loop. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1388349103 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1388352492 From mli at openjdk.org Thu Nov 9 17:43:05 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Nov 2023 17:43:05 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 17:33:14 GMT, Hamlin Li wrote: >> Hi, please consider. >> >> Main author is @luhenry, I only fixed some minor things and tested it. 
>> >> Such as: >> test/hotspot/jtreg/compiler/intrinsics/sha/ >> test/jdk/java/security/MessageDigest/ >> test/jdk/jdk/security/ >> tier1 >> >> And still running some test. > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3917: > >> 3915: //-------------------------------------------------------------------------------- >> 3916: // Quad-round 0 (+0, v10->v11->v12->v13) >> 3917: __ vl1re32_v(v15, consts); > > Seems the round 0-11 are quite similar with each other, although with some difference in some src registers, but with similar patterns. > Would it be possible and better to group them in a loop to simplify the code? or construct some functions. Seems also that generate_sha256_implCompress and generate_sha512_implCompress can share some code, looks like they are quite similar at a brief look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1388356849 From vkempik at openjdk.org Thu Nov 9 17:54:03 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 9 Nov 2023 17:54:03 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 15:31:02 GMT, Andrew Haley wrote: >> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed intrinsics implementation. Reverted changes of FCVT_SAFE. > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4263: > >> 4261: fadd_s(ftmp, src, ftmp); >> 4262: fcvt_w_s(dst, ftmp, RoundingMode::rdn); >> 4263: > > This still doesn't look right to me. I urge you to test it against the Java implementation over the full 32-bit range. 
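
For illustration only, the concern in the quoted review comment can be made concrete with a minimal C++ sketch. It assumes that `ftmp` holds 0.5f before the `fadd_s` (the quoted lines do not show this), so that the sequence computes floor(x + 0.5) with the addition rounded to nearest-even. The names `naive_round_float` and `wide_round_float` are hypothetical and are not HotSpot code. For the largest float below 0.5, the float-precision sum rounds up to exactly 1.0, so the naive form yields 1, whereas Java's `Math.round(float)` returns 0 for that input.

```cpp
#include <cmath>

// Naive rounding: floor(x + 0.5f), with the addition performed in float
// precision under the default round-to-nearest-even mode.
inline float naive_round_float(float x) {
    // volatile forces the sum to be stored as a float, so extended-precision
    // intermediates on some targets do not hide the effect.
    volatile float sum = x + 0.5f;
    return std::floor(sum);
}

// Alternative used here only for comparison (an assumption of this sketch,
// not the approach taken in the PR): perform the addition in double, so the
// sum for this input is exact before taking the floor.
inline float wide_round_float(float x) {
    return static_cast<float>(std::floor(static_cast<double>(x) + 0.5));
}

// The largest float strictly below 0.5, i.e. 0.5 - 2^-25.
inline float largest_below_half() {
    return std::nextafter(0.5f, 0.0f);
}
```

Calling `naive_round_float(largest_below_half())` gives 1.0f while `wide_round_float(largest_below_half())` gives 0.0f, which is one of the inputs a full 32-bit sweep against the Java implementation would flag.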
I think it may start working if the rounding mode for fadd_s were changed from the default rne to rdn.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1388373754

From shade at openjdk.org Thu Nov 9 18:14:08 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 9 Nov 2023 18:14:08 GMT
Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6]
In-Reply-To: References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID:

On Thu, 9 Nov 2023 16:52:26 GMT, Coleen Phillimore wrote:

>> Good idea, changed to 128. I'm no expert but the code looks better for it here: https://godbolt.org/z/7cYv9Pzz7.
>
> I like it!

A common thing is to do the mod conversion right here, without relying on optimizing compilers:
`% CLEANUP_DELAY_MAX_ENTRIES` -> `& (CLEANUP_DELAY_MAX_ENTRIES - 1)`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388397036

From duke at openjdk.org Thu Nov 9 18:14:10 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Thu, 9 Nov 2023 18:14:10 GMT
Subject: Integrated: 8319429: Resetting MXCSR flags degrades ecore
In-Reply-To: References: Message-ID:

On Fri, 3 Nov 2023 22:32:44 GMT, Volodymyr Paprotski wrote:

> Improves vector rounding on ECore about 10x
>
> (BEFORE) FpRoundingBenchmark.test_round_float 2048 thrpt 3 40.912 ± 0.044 ops/ms
> (AFTER ) FpRoundingBenchmark.test_round_float 2048 thrpt 3 431.682 ± 0.727 ops/ms

This pull request has now been integrated.
Changeset: 636a3519 Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/636a35197695698a1f3ec6c7f8da6d95800741ae Stats: 19 lines in 5 files changed: 10 ins; 0 del; 9 mod 8319429: Resetting MXCSR flags degrades ecore Reviewed-by: sviswanathan, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/16504 From mcimadamore at openjdk.org Thu Nov 9 18:20:11 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 9 Nov 2023 18:20:11 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 15:34:38 GMT, Jorn Vernee wrote: >> src/java.base/share/classes/java/lang/foreign/Linker.java line 792: >> >>> 790: * @param allowHeapAccess whether the linked function should allow access to the Java heap. >>> 791: */ >>> 792: static Option critical(boolean allowHeapAccess) { >> >> Speaking of public API, I'm surprised to see critical function property conflated with ability to perform on-heap accesses. These aspects look orthogonal to me. Any particular reason not to represent them as 2 separate `Option`s? >> >> Even though it's straightforward to support on-heap accesses during critical function calls, object pinning would support that for non-critical function calls as well, but proposed API doesn't cover it and new API will be required. What's the plan here? > >> Even though it's straightforward to support on-heap accesses during critical function calls, object pinning would support that for non-critical function calls as well, but proposed API doesn't cover it and new API will be required. What's the plan here? > > The issue is that most GCs don't support object pinning. I don't think we want an API that only works for some GCs. But, if we do, there's a better API that we can have for just pinning support, which is a `MemorySegment::pin(Arena)` method that returns a MemorySegment wrapping the pinned array. 
That would allow doing multiple native calls with just a single pin operation, and also allows embedding pointers to pinned segments in other data structures.
>
> For the current approach where we make the array accessible for the duration of the native call: without pinning support, other GCs would have to use GCLocker. That means that the native call also has to be relatively short-lived, at which point I figured we might as well drop the thread state transition, since that has the same requirement. I.e. we detect that the call is short-lived, and do the optimization ourselves without the hint from the user (`critical`). This coincidentally also greatly simplifies the implementation. In a prior iteration I did have a separate `allowHeap` `Option` that implied `critical`. But it was suggested to just merge the two together in that case.

I stand by the current design: a GCLocker-based mechanism (as the current implementation is) needs similar restrictions for both on-heap access and the removal of state transitions. It's true that a more general notion of pinning is possible, which doesn't necessarily require special support from the linker (because we can turn a heap segment into a native segment by pinning it and _then_ pass that to the linker). But at this point, support for region-based pinning is not mature enough to justify such an API (and, if we ever get to that point, that would not invalidate the critical linker options).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1388411265 From shade at openjdk.org Thu Nov 9 18:23:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 9 Nov 2023 18:23:08 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v7] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 16:36:20 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
>
> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision:
>
>   Set queue size to power of 2, use constant in test

Very nice, looks much better. I have cosmetic comments.

src/hotspot/share/oops/symbolHandle.hpp line 47:

> 45: template
> 46: class SymbolHandleBase : public StackObj {
> 47:   static Symbol* volatile _cleanup_delay_queue[];

This looks off. Should be `static Symbol** volatile _cleanup_delay_queue`, maybe?

src/hotspot/share/oops/symbolHandle.hpp line 54:

> 52:
> 53: public:
> 54:   static const uint CLEANUP_DELAY_MAX_ENTRIES = 128;

`const` -> `constexpr`

test/hotspot/gtest/classfile/test_symbolTable.cpp line 37:

> 35:   ThreadInVMfromNative ThreadInVMfromNative(THREAD);
> 36:   // Disable the temp symbol cleanup delay queue because it increases refcounts.
> 37:   TempNewSymbol::set_cleanup_delay_enabled(false);

So we have an additional check for the "enabled" flag in hot production code only to make these tests happy? If so, can we "just" drain the delay queue after new_symbol here? Maybe with a helper method here in the test?

test/hotspot/gtest/classfile/test_symbolTable.cpp line 148:

> 146: }
> 147:
> 148: TEST_VM(SymbolTable, test_cleanup_delay) {

Please add another test that checks that `drain_cleanup_delay_queue` also decrements refcounts?
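
The index arithmetic discussed in this thread can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the HotSpot code: `kQueueSize`, `is_power_of_two`, `slot_mod`, and `slot_mask` are hypothetical names standing in for `CLEANUP_DELAY_MAX_ENTRIES` and the expression on the quoted line 99. It shows that a size of 100 jumps from slot 95 to slot 0 when the 32-bit counter wraps, while a power-of-two size keeps the sequence contiguous and lets the modulo be written as a mask.

```cpp
#include <cstdint>

// Hypothetical stand-in for the queue capacity discussed in the thread.
constexpr std::uint32_t kQueueSize = 128;

// True when v has exactly one bit set.
constexpr bool is_power_of_two(std::uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// A constexpr capacity can be checked at compile time.
static_assert(is_power_of_two(kQueueSize), "mask form requires a power-of-two size");

// Slot for an ever-increasing counter that eventually wraps at 2^32.
constexpr std::uint32_t slot_mod(std::uint32_t counter, std::uint32_t size) {
    return counter % size;
}

// Equivalent to slot_mod(counter, kQueueSize) because kQueueSize is a power of two.
constexpr std::uint32_t slot_mask(std::uint32_t counter) {
    return counter & (kQueueSize - 1);
}

// Size 100: the last value before the wrap maps to slot 95 and the next
// value (0) maps to slot 0, so slots 96..99 are skipped once per wrap.
static_assert(slot_mod(4294967295u, 100) == 95, "last slot before wrap");
static_assert(slot_mod(4294967295u + 1u, 100) == 0, "first slot after wrap");

// Size 128: the wrap goes from slot 127 to slot 0 with nothing skipped.
static_assert(slot_mask(4294967295u) == 127, "last slot before wrap");
static_assert(slot_mask(4294967295u + 1u) == 0, "first slot after wrap");
static_assert(slot_mod(4294967295u, kQueueSize) == slot_mask(4294967295u), "mod and mask agree");
```

The skipped slots with size 100 are harmless for a cache-like queue, as noted in the review; the power-of-two form mainly makes the cheap mask explicit rather than leaving it to the compiler.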
------------- PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1723217949 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388405130 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388407187 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388402715 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388411131 From shade at openjdk.org Thu Nov 9 18:23:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 9 Nov 2023 18:23:09 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v6] In-Reply-To: References: <_kNC8IkndeW49TdrHW24xQ4mBtDWN1W18bYWELYFm6Y=.cec0ff78-b785-4438-baaf-6e1439d6e533@github.com> Message-ID: On Thu, 9 Nov 2023 18:08:30 GMT, Aleksey Shipilev wrote: >> I like it! > > A common thing is to do the mod conversion right here, without relying on optimizing compilers: > `% CLEANUP_DELAY_MAX_ENTRIES` -> `& (CLEANUP_DELAY_MAX_ENTRIES - 1)`. If you do `constexpr` for the constant, then you can also do `static_assert(is_power_of_2(CLEANUP_DELAY_MAX_ENTRIES))`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388409029 From ogillespie at openjdk.org Thu Nov 9 18:27:08 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 9 Nov 2023 18:27:08 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v7] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 18:11:33 GMT, Aleksey Shipilev wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Set queue size to power of 2, use constant in test > > test/hotspot/gtest/classfile/test_symbolTable.cpp line 37: > >> 35: ThreadInVMfromNative ThreadInVMfromNative(THREAD); >> 36: // Disable the temp symbol cleanup delay queue because it increases refcounts. 
>> 37: TempNewSymbol::set_cleanup_delay_enabled(false); > > So we have additional check for "enabled" flag in hot production code only to make these tests happy? If so, can we "just" drain the delay queue after new_symbol here? Maybe with helper method here in test? Yes that's true. I can try to avoid it. > test/hotspot/gtest/classfile/test_symbolTable.cpp line 148: > >> 146: } >> 147: >> 148: TEST_VM(SymbolTable, test_cleanup_delay) { > > Please another test that checks that `drain_cleanup_delay_queue` also does decrement refcounts? Good idea, will add ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388418502 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1388419267 From rkennke at openjdk.org Thu Nov 9 19:59:51 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 9 Nov 2023 19:59:51 GMT Subject: RFR: 8139457: Relax alignment of array elements [v63] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 92 commits: - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - Update copyright headers - Merge branch 'master' into JDK-8139457 - Fix ARM build - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - Various cleanups - RISC changes - Move gap init into allocate_header() (x86) - Fix gtest failure on x86 - Merge remote-tracking branch 'upstream/master' into JDK-8139457 - ... and 82 more: https://git.openjdk.org/jdk/compare/38745eca...bb16aeda ------------- Changes: https://git.openjdk.org/jdk/pull/11044/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=62 Stats: 628 lines in 33 files changed: 478 ins; 83 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Thu Nov 9 20:11:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 9 Nov 2023 20:11:33 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v22] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 34 commits: - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Update comment about mark-word layout - Merge branch 'JDK-8305896' into JDK-8305898 - Fix tests on 32bit builds - Merge branch 'JDK-8305896' into JDK-8305898 - ... and 24 more: https://git.openjdk.org/jdk/compare/bdc8c823...eee8ab57 ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=21 Stats: 101 lines in 8 files changed: 85 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rriggs at openjdk.org Thu Nov 9 22:28:58 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 9 Nov 2023 22:28:58 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. 
>> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). 
Claes Redestad contributed performance improvements ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1804780787 From matsaave at openjdk.org Thu Nov 9 22:48:28 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 9 Nov 2023 22:48:28 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests.
> > This change supports the following platforms: x86, aarch64, RISCV Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: - PPC port - Improved load_resolved_method_entry_handle on x86 and aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/6950709c..ea067795 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=08-09 Stats: 378 lines in 7 files changed: 113 ins; 177 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From matsaave at openjdk.org Thu Nov 9 22:52:12 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 9 Nov 2023 22:52:12 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:28 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely.
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - PPC port > - Improved load_resolved_method_entry_handle on x86 and aarch64 > I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) > > Note that I have implemented a few things slightly differently: > > * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. > > * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. > > > Please take a look and take these differences into consideration for other platforms. Thanks! Thank you for the port! 
I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1804803551 From lmesnik at openjdk.org Fri Nov 10 01:49:17 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 10 Nov 2023 01:49:17 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v5] In-Reply-To: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: > Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. > > Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). > > Testing tier1-5. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: variable was renamed. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16442/files - new: https://git.openjdk.org/jdk/pull/16442/files/aa93f71a..bc165dd6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16442&range=03-04 Stats: 10 lines in 1 file changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16442/head:pull/16442 PR: https://git.openjdk.org/jdk/pull/16442 From lmesnik at openjdk.org Fri Nov 10 02:19:56 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 10 Nov 2023 02:19:56 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v5] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: <2AAk5mCkSkklvI_8dSY85b9Cpik2v_HQsaAuJ-QBrtA=.ad327557-c0f8-41ca-bb71-bb2ea42cd78a@github.com> On Fri, 10 Nov 2023 01:49:17 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > variable was renamed. The createLimitedTestJavaProcessBuilder is used in 410 tests (hotspot and jdk). 233 of them are flagless and not supposed to be executed with any additional VM flags. (They should be reviewed, but that is a separate issue.) 177 are not marked as flagless and should be updated to be flagless or to use createTestJavaProcessBuilder.
The 'createLimitedTestJavaProcessBuilder' method should be used only when the process does not accept any flags, and it is logical to assume that the thread factory shouldn't be used either. I think it just makes the createLimitedTestJavaProcessBuilder and createTestJavaProcessBuilder methods consistent. It was not my original goal to make the test thread factory part of createLimitedTestJavaProcessBuilder; I just missed that the original createJavaProcessBuilder is applied to all processes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1804972301 From dholmes at openjdk.org Fri Nov 10 02:33:56 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Nov 2023 02:33:56 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 06:44:40 GMT, Stefan Karlsson wrote: >> I think in this case we already know that all the monitors in the closure are owned by the expected owner. But I wonder if we should/can assert that? > > The filter accepts all monitors. The filtering that only returns owned monitors is done inside the called `owned_monitors_iterate_filtered`: > > > void ObjectSynchronizer::owned_monitors_iterate_filtered(MonitorClosure* closure, OwnerFilter filter) { > ... > if (mid->has_owner() && filter(mid->owner_raw())) { > ... > closure->do_monitor(mid); > } > > > The closure is only applied to monitors that "have an owner". Maybe "owned monitors" sounds too much as if the function only visits monitors owned by the current thread? No, I was reading too much into it.
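A rough sketch of the pattern discussed above, using simplified stand-in types rather than the HotSpot ones: the iteration itself skips monitors without an owner, so the closure only ever sees owned monitors, and the filter decides which owners qualify. `Monitor`, this `OwnerFilter` alias, and `iterate_owned_filtered` are illustrative names only and are assumptions of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified stand-in for an object monitor; nullptr means "no owner".
struct Monitor {
  const void* owner;
};

// Hypothetical filter type: returns true for owners the caller is interested in.
using OwnerFilter = std::function<bool(const void*)>;

// Applies `closure` only to monitors that have an owner accepted by `filter`.
// Returns the number of monitors visited.
inline int iterate_owned_filtered(const std::vector<Monitor>& monitors,
                                  const OwnerFilter& filter,
                                  const std::function<void(const Monitor&)>& closure) {
  int visited = 0;
  for (const Monitor& m : monitors) {
    if (m.owner != nullptr && filter(m.owner)) {
      closure(m);
      ++visited;
    }
  }
  return visited;
}
```

With an accept-all filter this visits monitors owned by any thread; with a filter that compares against one specific owner it narrows to that owner, which is the distinction between the two variants being discussed.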
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1388804233 From dholmes at openjdk.org Fri Nov 10 03:01:56 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Nov 2023 03:01:56 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v2] In-Reply-To: <1NQjqlxQxjUVFhPGRClCNC7QHvwwL3ZLZP_ZxZdVXsM=.90fc67b9-d833-4e59-b675-8b8c51fd9818@github.com> References: <2eXNHrpyHgdJQSGKW0fMQMCi0cVzd6hzaOTo5lLmFpg=.ee20bd05-3b39-4719-9d7e-4f7a54c78e81@github.com> <1NQjqlxQxjUVFhPGRClCNC7QHvwwL3ZLZP_ZxZdVXsM=.90fc67b9-d833-4e59-b675-8b8c51fd9818@github.com> Message-ID: <4mUQ3GbDbMxygTGDJUVg3XoSWYdNUVz6skudd-s9eTA=.19046b10-7dac-4b28-ae07-75b6f3519d43@github.com> On Thu, 9 Nov 2023 08:04:47 GMT, Aleksey Shipilev wrote: >> @shipilev - I'm glad that: >> >> vmTestbase/nsk/monitoring/ThreadInfo/isSuspended/issuspended002.java >> >> has proven to be useful. I had been thinking about removing it from my weekly stress >> kit runs since it has been a long time since I've seen a failure flushed out by that >> test running the stress config. I think I'll keep it around for longer... > > All right! Any other reviewers for this? @dcubed-ojdk, @dholmes-ora, maybe? Sorry @shipilev I'm not in a position to evaluate the correctness or performance of this code. As usual I'm interested in any general performance impact this may have, and any new pathologies it may introduce.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1804999975 From gcao at openjdk.org Fri Nov 10 04:23:58 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 10 Nov 2023 04:23:58 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <1qOoa9SlbVqiyZNIhT4L3gq3R9YIjrxXpDsFYz8LoUw=.3a72fff6-6216-456f-a999-6fe01c71d179@github.com> On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. 
>> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). Hi, I have performed tier1-3 test on linux-riscv64 using QEMU with -XX:+UseRVV. Result looks good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1805063605 From dholmes at openjdk.org Fri Nov 10 04:32:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Nov 2023 04:32:59 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v5] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Fri, 10 Nov 2023 01:49:17 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). 
Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > variable was renamed. Okay, sorry, I've misunderstood the nature of this issue. This change has no effect on whether jtreg runs a specific test (or set of tests) using the virtual thread wrapper to launch the test. This only affects code that uses ProcessTools and where the launched JVM would examine the `test.thread.factory` property. How many tests actually look at this property directly in code that is run via ProcessTools? The only tests that appear to look at this property are JDB and JDI tests, where we have a test framework that selects use of virtual threads based on this property. I couldn't find any test that would actually be affected by this. Sorry for the noise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1805069620 From dholmes at openjdk.org Fri Nov 10 04:44:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Nov 2023 04:44:57 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v5] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: <8wapJbtnZDHwUMUTGHUeF8nR51Jbi_fjofhDowyVIU4=.8d911151-87e0-41cc-98e7-73343716c881@github.com> On Fri, 10 Nov 2023 01:49:17 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options.
>> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications in createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > variable was renamed. Uggghhh no I've still misunderstood what this is doing. `addTestThreadFactoryArgs` will create a wrapper to run the specified main class in a virtual thread. So if the test using `createLimitedTestJavaProcessBuilder` was being run in a virtual thread, then the exec'd "app" would be too. Plus it would have the property set. So this is potentially reducing virtual thread test coverage. That said I'm bowing out of this discussion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1805077951 From gcao at openjdk.org Fri Nov 10 07:16:16 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 10 Nov 2023 07:16:16 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. 
This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. Hi @matias9927: This change [1] for aarch64 and x86 looks nice and I am trying to prepare a similar one for riscv. But I have a small question regarding this change. I see we are using `r2` for `cache` on aarch64, which is a volatile (or caller-saved) register according to the ABI. So the value in `cache` won't be preserved across the C-call in `resolve_oop_handle`. Will this work? Is there anything I missed? Thanks. [1] https://github.com/openjdk/jdk/pull/15455/commits/7d2a20ccf63f6140f7ba96904af2e3d89c6ab370 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805214695 From rrich at openjdk.org Fri Nov 10 07:25:58 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 10 Nov 2023 07:25:58 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack [v2] In-Reply-To: <90buwH_81LCEUj7bv7Ug4fDC8IbyMDCFcmNfmyd1Hxk=.8747fb2c-265b-41cd-8d74-a576e58adf85@github.com> References: <90buwH_81LCEUj7bv7Ug4fDC8IbyMDCFcmNfmyd1Hxk=.8747fb2c-265b-41cd-8d74-a576e58adf85@github.com> Message-ID: On Thu, 9 Nov 2023 15:54:13 GMT, Roman Kennke wrote: >> See JBS issue for details.
>> >> I basically: >> - took the test-modification and turned it into its own test-case >> - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default >> - added Axels fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. >> >> Testing: >> - [x] EATests.java >> - [x] tier1 >> - [ ] tier2 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add @reinrich's test-case Fix and new test case look good to me. Local testing was clean. Thanks, Richard. test/jdk/com/sun/jdi/EATests.java line 1755: > 1753: ///////////////////////////////////////////////////////////////////////////// > 1754: > 1755: // The debugger reads and publishes an object with eliminated locking to a static variable. Suggestion: // The debugger reads and publishes an object with eliminated locking to an instance field. ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16568#pullrequestreview-1724167589 PR Review Comment: https://git.openjdk.org/jdk/pull/16568#discussion_r1389009395 From dholmes at openjdk.org Fri Nov 10 07:51:01 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Nov 2023 07:51:01 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 06:52:23 GMT, Stefan Karlsson wrote: >> src/hotspot/share/runtime/synchronizer.hpp line 135: >> >>> 133: >>> 134: // Iterate owned ObjectMonitors. >>> 135: static void owned_monitors_iterate(MonitorClosure* closure); >> >> owned by whom? current thread? Does that include stack-locked or not? >> >> Just trying to understand how the two variants of `owned_monitors_iterate` relate. > > Owned by *any* thread in any way. 
> > * `owned_monitors_iterate(MonitorClosure* closure)` - Visits all monitors with the owner set to anything that indicates that the monitor has an owner (`ObjectMonitor::has_owner()`). > > * `owned_monitors_iterate(MonitorClosure* m, JavaThread* thread)` - Visits all monitors with the owner field set to the specified `thread`. > > Maybe we could figure out more descriptive names for these. Updating the comment would be fine: // Iterate ObjectMonitors owned by any thread. Thanks >> test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 78: >> >>> 76: monitors[index] = new Object(); >>> 77: synchronized (monitors[index]) { >>> 78: } >> >> I would expect C2 to eliminate this as well. The monitors are provably thread-local so synchronization is a no-op. > > Done It still might eliminate locking due to the empty block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1388805691 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1388807213 From dholmes at openjdk.org Fri Nov 10 07:51:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 10 Nov 2023 07:51:05 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v6] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 07:29:15 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ...
> > Stefan Karlsson has updated the pull request incrementally with five additional commits since the last revision: > > - Tweak the flag comment a bit > - Add AsyncMonitorDeflationForThreadDumpLimit flag > - Typos > - Remove comment in do_monitors > - Make monitors array public static test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 70: > 68: > 69: static private void createMonitors() { > 70: monitors = new Object[1000]; `monitors` is already initialized. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1388806485 From stefank at openjdk.org Fri Nov 10 07:51:02 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 07:51:02 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 02:34:48 GMT, David Holmes wrote: >> Owned by *any* thread in any way. >> >> * `owned_monitors_iterate(MonitorClosure* closure)` - Visits all monitors with the owner set to anything that indicates that the monitor has an owner (`ObjectMonitor::has_owner()`). >> >> * `owned_monitors_iterate(MonitorClosure* m, JavaThread* thread)` - Visits all monitors with the owner field set to the specified `thread`. >> >> Maybe we could figure out more descriptive names for these. > > Updating the comment would be fine: > > // Iterate ObjectMonitors owned by any thread. 
> > Thanks Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389029900 From stefank at openjdk.org Fri Nov 10 07:51:06 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 07:51:06 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v6] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 02:36:47 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with five additional commits since the last revision: >> >> - Tweak the flag comment a bit >> - Add AsyncMonitorDeflationForThreadDumpLimit flag >> - Typos >> - Remove comment in do_monitors >> - Make monitors array public static > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 70: > >> 68: >> 69: static private void createMonitors() { >> 70: monitors = new Object[1000]; > > `monitors` is already initialized. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389031286 From stefank at openjdk.org Fri Nov 10 08:19:32 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 08:19:32 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v7] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ...
Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: - Remove the limit for deflation requests - Remove reinitialization in test - Update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/103d917a..560c67df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=05-06 Stats: 24 lines in 5 files changed: 4 ins; 12 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Fri Nov 10 08:19:33 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 08:19:33 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v2] In-Reply-To: References: <-6DDgv7dVmV8eB5_putOLjWXq1PQo7BT37MdqsmIV2k=.4ef8c7e3-c7bc-4664-815f-ca46e50cbe12@github.com> Message-ID: <7m5mCjTk8Y7CapTZMO6x8FSU87D30a3t5MF4HsLZa5o=.f767fb92-376e-436c-801e-c3276469fc84@github.com> On Thu, 9 Nov 2023 08:25:58 GMT, Aleksey Shipilev wrote: >> There are a few options how to move forward with this: >> >> 1. Stop triggering deflation from the thread dumping code >> 2. Only trigger if we pass a given limit, say 100000. >> 3. Always trigger monitor deflation >> >> I believe that long-term it would be best for the JVM if we went with (1). I added (2) just to counter some potential arguments that having too many monitors in the system will make the thread dumping take a long time. It is not clear to me at all that people will notice this, and if they do then maybe we need to tweak the monitor deflation heuristics instead. (3) seems excessive to me. >> >> I've added the flag AsyncMonitorDeflationForThreadDumpLimit. 
The long name hints that this flag is in support of something overly specific. I set the default to SIZE_MAX in my support for (1), and hope that we can release-note that we have stopped performing monitor deflation from thread dumping and that this flag is going to be a good enough safeguard if there are applications that rely on the monitor deflation for thread dumping. But if reviewers disagree with this, I'm OK with changing the default value to support either (2) or (3). >> >> Could I get all the reviewers here to (re)state their preference on this, including their suggested limit? >> >> I'm also OK with changing the name of the flag if you have a better name, and changing it to an experimental flag if that makes more sense. > > I would prefer (3), and then consider changing to (1) in a separate PR. This would match current behavior well, and thus would not make more things beyond fixing the interleaving trouble; would eliminate the need to have another flag that would be only temporary until (1) is here; would trigger (pun intended) more discussion about deflation policy once we do (1) -> (3). I didn't get much more feedback on this, so I went with Aleksey's suggestion and ripped out the flag and the ability to set a limit. The code now always requests monitor deflation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389056205 From stuefe at openjdk.org Fri Nov 10 08:38:58 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 10 Nov 2023 08:38:58 GMT Subject: RFR: JDK-8319437: NMT should show library names in call stacks In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 06:05:58 GMT, David Holmes wrote: > Seems reasonable. > > Thanks Thank you, David! Need a second review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16508#issuecomment-1805309002 From adinn at openjdk.org Fri Nov 10 08:50:09 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 10 Nov 2023 08:50:09 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. @matias9927 aarch64 changes still look good after applying Martin's tweak. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805322632 From mdoerr at openjdk.org Fri Nov 10 09:31:19 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Nov 2023 09:31:19 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. > Hi @matias9927 : This change [1] for aarch64 and x86 looks nice and I am trying to prepare a similar one for riscv. 
But I have a small question regarding this change. I see we are using `r2` for `cache` on aarch64 which is a volatile (or caller-saved) register according to the ABI. So the value in `cache` won't be preserved across the C-call in resolve_oop_handle. Will this work? Is there anything I missed? Thanks. > > [1] [7d2a20c](https://github.com/openjdk/jdk/commit/7d2a20ccf63f6140f7ba96904af2e3d89c6ab370) Please note that I'm using `assert(cache->is_nonvolatile(), "C-call in resolve_oop_handle");` on PPC64 which makes it reliable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805376992 From fyang at openjdk.org Fri Nov 10 09:40:57 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Nov 2023 09:40:57 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v5] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 10:35:12 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > remove extra new line Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1687: > 1685: Assembler::SEW sew = is_long ? Assembler::e64 : Assembler::e32; > 1686: // intrinsic is enabled when MaxVectorSize >= 16 > 1687: Assembler::LMUL lmul = is_long ? Assembler::m4 : Assembler::m2; An `lmul` of `m2` or `m4` means a vector register group of 2 or 4 registers respectively. So <`v4`, `v5`> and <`v8`, `v9`> will be used in the case of `m2`. But I only see `v4` and `v8` are reserved for CompressBits nodes. You should also reserve `v5` and `v9` for this case. Similar for the `m4` case. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1694: > 1692: vmv_s_x(v0, src); > 1693: // reset the src data(in bytes) to zero.
> 1694: mv(tmp, len); Can we use scratch register `t0` instead of `tmp` in this function? src/hotspot/cpu/riscv/riscv_v.ad line 2884: > 2882: > 2883: instruct compressBitsI(iRegINoSp dst, iRegIorL2I src, iRegIorL2I mask, iRegPNoSp tmp, vRegMask_V0 v0, vReg_V4 v4, vReg_V8 v8) %{ > 2884: predicate(UseRVV); Seems this UseRVV check is also redundant. It has already been checked in `Matcher::match_rule_supported` for this node. But seems that this is not the first one. So we might want do this cleanup in your next PR: https://github.com/openjdk/jdk/pull/16580 src/hotspot/cpu/riscv/riscv_v.ad line 2909: > 2907: > 2908: instruct compressBitsL(iRegLNoSp dst, iRegL src, iRegL mask, iRegPNoSp tmp, vRegMask_V0 v0, vReg_V4 v4, vReg_V8 v8) %{ > 2909: predicate(UseRVV); Similar here. ------------- PR Review: https://git.openjdk.org/jdk/pull/16481#pullrequestreview-1724331115 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389112263 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389113911 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389143653 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389143814 From gcao at openjdk.org Fri Nov 10 09:50:18 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 10 Nov 2023 09:50:18 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: <4W-89QFjuemLrQc2yPo58HM-Yq8W01CWLBGaWZ6npZ8=.dd48ccb4-1deb-4091-ae4b-12e9d2d94af4@github.com> On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a 
few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. > > Hi @matias9927 : This change [1] for aarch64 and x86 looks nice and I am trying to prepare a similar one for riscv. But I have a small question regarding this change. I see we are using `r2` for `cache` on aarch64 which is a volatile (or caller-saved) register according to the ABI. So the value in `cache` won't be preserved across the C-call in resolve_oop_handle. Will this work? Is there anything I missed? Thanks. > > [1] [7d2a20c](https://github.com/openjdk/jdk/commit/7d2a20ccf63f6140f7ba96904af2e3d89c6ab370) > > Please note that I'm using `assert(cache->is_nonvolatile(), "C-call in resolve_oop_handle");` on PPC64 which makes it reliable. Hi Martin, Yes, I see that assertion for PPC. But my question is about aarch64 where `cache` could alias `r2` which is a volatile/caller-saved register according to the aarch64 ABI. I am not sure if this is OK. Maybe I missed something?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805410644 From mdoerr at openjdk.org Fri Nov 10 09:54:15 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Nov 2023 09:54:15 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. > > > Hi @matias9927 : This change [1] for aarch64 and x86 looks nice and I am trying to prepare a similar one for riscv. 
But I have a small question regarding this change. I see we are using `r2` for `cache` on aarch64 which is a volatile (or caller-saved) register according to the ABI. So the value in `cache` won't be preserved across the C-call in resolve_oop_handle. Will this work? Is there anything I missed? Thanks. > > > [1] [7d2a20c](https://github.com/openjdk/jdk/commit/7d2a20ccf63f6140f7ba96904af2e3d89c6ab370) > > > > > > Please note that I'm using `assert(cache->is_nonvolatile(), "C-call in resolve_oop_handle");` on PPC64 which makes it reliable. > > Hi Martin, Yes, I see that assertion for PPC. But my question is about aarch64 where `cache` could alias `r2` which is a volatile/caller-saved register according to the aarch64 ABI. I am not sure if this is OK. Maybe I missed something? Sorry, I was not clear enough. I wanted to suggest adding the assertion to other platforms and changing the register if needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805419059 From rkennke at openjdk.org Fri Nov 10 10:41:16 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 10:41:16 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack [v3] In-Reply-To: References: Message-ID: > See JBS issue for details. > > I basically: > - took the test-modification and turned it into its own test-case > - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default > - added Axels fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard.
> > Testing: > - [x] EATests.java > - [x] tier1 > - [ ] tier2 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/com/sun/jdi/EATests.java Co-authored-by: Richard Reingruber ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16568/files - new: https://git.openjdk.org/jdk/pull/16568/files/966d0a3e..b27def47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16568&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16568&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16568/head:pull/16568 PR: https://git.openjdk.org/jdk/pull/16568 From mdoerr at openjdk.org Fri Nov 10 10:51:15 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Nov 2023 10:51:15 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:28 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely.
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - PPC port > - Improved load_resolved_method_entry_handle on x86 and aarch64 AFAICS, `resolve_oop_handle` preserves volatile regs on aarch64 and x86_64, so, the current version should be ok. I can see the following test passing: `make run-test TEST="java/lang/invoke" JTREG="VM_OPTIONS=-XX:+UseZGC -XX:+ZGenerational"` on linux x86_64, aarch64 and ppc64le. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805496275 From sjohanss at openjdk.org Fri Nov 10 11:02:06 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Fri, 10 Nov 2023 11:02:06 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v16] In-Reply-To: <1kvCiT9zD5ZqoLH_HFRtPsh8M78WaFwE6R5ODemjUMs=.2e5b019c-4a3b-4184-8171-49ff5a84c841@github.com> References: <1kvCiT9zD5ZqoLH_HFRtPsh8M78WaFwE6R5ODemjUMs=.2e5b019c-4a3b-4184-8171-49ff5a84c841@github.com> Message-ID: On Thu, 9 Nov 2023 10:44:29 GMT, Thomas Schatzl wrote: >> The JEP covers the idea very well, so I'm only covering some implementation details here: >> >> * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. 
>> >> * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: >> >> * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. >> >> * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was a restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). >> >> * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. >> >> * there is the problem that if an application pins a region for a long time G1 will skip evacuating that region over and over. That may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) >> >> The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. >> >> I.e.
pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. >> >> * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Modify evacuation failure log message as suggested by sjohanss: Use "Evacuation Failure" with a cause description (either "Allocation" or "Pinned") Looks good. Just a few small things. src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 466: > 464: if (region_attr.is_pinned() && klass->is_typeArray_klass()) { > 465: return handle_evacuation_failure_par(old, old_mark, word_sz, true /* cause_pinned */); > 466: } I think it would make sense to add a comment here that this is an optimization, that we know that only type array can be pinned so we only care if the region is pinned or not for type arrays. src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp line 925: > 923: _evac_failure_regions->num_regions_alloc_failed(), > 924: G1GCPhaseTimes::RestoreEvacFailureRegionsAllocFailedNum); > 925: The counts used here are the totals, but they are added to thread work item, so total count will be wrong. Discussed this a bit with Thomas and we should probably count those things in this task instead and could that way make `G1EvacFailureRegions` store a bit less information. src/hotspot/share/gc/shared/collectedHeap.hpp line 169: > 167: static inline size_t filler_array_hdr_size(); > 168: public: > 169: static size_t filler_array_min_size(); Is this needed or some leftover from earlier iterations? ------------- Marked as reviewed by sjohanss (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16342#pullrequestreview-1722530941 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1389227404 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1389211990 PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1387998596 From shade at openjdk.org Fri Nov 10 11:05:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Nov 2023 11:05:58 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v3] In-Reply-To: References: Message-ID: <6eTk2qGlSLcDNBw_bI6QaiKZ_K2QXPWtgCHzxd-Gjdw=.f7485226-0ba5-4314-8154-16c30bf02968@github.com> On Fri, 3 Nov 2023 19:49:55 GMT, Quan Anh Mai wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Enlighs > > Can we create `MacroAssembler::mov64` that does the branching instead, I think it is more natural there. And things that need 8-byte immediates will call into `Assembler::mov64`. > > Thanks. I simplified the patch a bit, still looking for reviewers. @merykitty, does the new version look better to you? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16497#issuecomment-1805519015 From ogillespie at openjdk.org Fri Nov 10 11:18:24 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 10 Nov 2023 11:18:24 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v8] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. 
For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
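[Editorial sketch, not part of the original mail.] The bounded delay queue described above can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the code from the PR: `FakeSymbol` and `DelayQueue` are hypothetical names, the reference count is a plain `int` rather than HotSpot's atomic refcount, and the capacity is a template parameter instead of the fixed 100 mentioned in the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a reference-counted symbol; not HotSpot's Symbol.
struct FakeSymbol {
  std::string name;
  int refcount;
  explicit FakeSymbol(std::string n) : name(std::move(n)), refcount(0) {}
};

// Hypothetical bounded delay queue. Each occupied slot holds one extra
// reference, so a recently created symbol stays alive until N newer
// symbols have been added or the queue is drained.
template <std::size_t N>
class DelayQueue {
  std::array<FakeSymbol*, N> _slots{};  // value-initialized: all nullptr
  std::size_t _next = 0;                // index of the oldest slot

 public:
  // Keep `sym` alive and release whatever previously occupied the slot.
  void add(FakeSymbol* sym) {
    sym->refcount++;
    FakeSymbol* old = std::exchange(_slots[_next], sym);
    _next = (_next + 1) % N;
    if (old != nullptr) {
      old->refcount--;
    }
  }

  // Release every held reference, for example when table cleanup runs.
  void drain() {
    for (FakeSymbol*& slot : _slots) {
      if (slot != nullptr) {
        slot->refcount--;
        slot = nullptr;
      }
    }
  }
};
```

With a capacity of 2, adding a third symbol releases the queue's reference to the first one, and `drain()` releases the rest. The real change has to deal with concurrency and with symbols being freed when their count reaches zero, which this sketch deliberately leaves out.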
Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Remove is_enabled check, use modulo shortcut, add drain test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/ed5ae51e..c885e814 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=06-07 Stats: 55 lines in 2 files changed: 32 ins; 16 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From ogillespie at openjdk.org Fri Nov 10 11:18:25 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 10 Nov 2023 11:18:25 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v7] In-Reply-To: References: Message-ID: <32kBtV5qWl9Bk3WbujeTG_fKXhygLN5qhqRAG_gX9Q0=.1ac4fe78-35d8-48a6-a9d9-40c7d1bf0a5e@github.com> On Thu, 9 Nov 2023 18:23:20 GMT, Oli Gillespie wrote: >> test/hotspot/gtest/classfile/test_symbolTable.cpp line 37: >> >>> 35: ThreadInVMfromNative ThreadInVMfromNative(THREAD); >>> 36: // Disable the temp symbol cleanup delay queue because it increases refcounts. >>> 37: TempNewSymbol::set_cleanup_delay_enabled(false); >> >> So we have additional check for "enabled" flag in hot production code only to make these tests happy? If so, can we "just" drain the delay queue after new_symbol here? Maybe with helper method here in test? > > Yes that's true. I can try to avoid it. I have added a small helper in the latest commit - do you think it's reasonable? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1389265383 From tschatzl at openjdk.org Fri Nov 10 11:25:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 10 Nov 2023 11:25:06 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v16] In-Reply-To: References: <1kvCiT9zD5ZqoLH_HFRtPsh8M78WaFwE6R5ODemjUMs=.2e5b019c-4a3b-4184-8171-49ff5a84c841@github.com> Message-ID: On Thu, 9 Nov 2023 13:26:06 GMT, Stefan Johansson wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Modify evacuation failure log message as suggested by sjohanss: Use "Evacuation Failure" with a cause description (either "Allocation" or "Pinned") > > src/hotspot/share/gc/shared/collectedHeap.hpp line 169: > >> 167: static inline size_t filler_array_hdr_size(); >> 168: public: >> 169: static size_t filler_array_min_size(); > > Is this needed or some leftover from earlier iterations? Removed. I started doing an implementation that properly formats smaller filler arrays into the dead areas taking potential `typeArray`s into account, but ultimately opted for just allowing oversized filler arrays. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16342#discussion_r1389274174 From ogillespie at openjdk.org Fri Nov 10 11:25:25 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 10 Nov 2023 11:25:25 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v9] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. 
> > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
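The delay queue described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the PR: `Sym` and `CleanupDelayQueue` are hypothetical names, `std::atomic` stands in for HotSpot's `Atomic` helpers, and the queue size is a template parameter assumed to be a power of two (a later revision of the PR mentions moving to a power-of-two size). A new entry takes an extra reference, and the entry it displaces loses its extra reference.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a refcounted symbol; only the refcount is modelled.
struct Sym {
  std::atomic<int> refcount{0};
  void increment_refcount() { refcount.fetch_add(1); }
  void decrement_refcount() {
    int prev = refcount.fetch_sub(1);
    assert(prev > 0);  // must not drop below zero
    (void)prev;
  }
};

// Bounded delay queue: a fixed ring of slots, oldest entry displaced first.
// N is an assumption of this sketch (power of two so the index is a mask).
template <std::size_t N>
class CleanupDelayQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  std::atomic<Sym*> _slots[N];
  std::atomic<std::size_t> _next{0};

public:
  CleanupDelayQueue() {
    for (auto& slot : _slots) slot.store(nullptr);
  }

  // Keep sym alive a little longer by holding one extra reference to it.
  void add(Sym* sym) {
    sym->increment_refcount();
    std::size_t i = _next.fetch_add(1) & (N - 1);
    Sym* old = _slots[i].exchange(sym);  // atomically swap in the new entry
    if (old != nullptr) {
      old->decrement_refcount();         // displaced entry loses its extra ref
    }
  }

  // Release every held reference, e.g. when symbol table cleanup runs.
  void drain() {
    for (auto& slot : _slots) {
      Sym* sym = slot.exchange(nullptr);
      if (sym != nullptr) {
        sym->decrement_refcount();
      }
    }
  }
};
```

With a queue of size 2, adding a third symbol releases the reference held for the first one, and `drain()` releases the rest.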
Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/c885e814..3becbcb4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From mli at openjdk.org Fri Nov 10 11:43:16 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 Nov 2023 11:43:16 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v6] In-Reply-To: References: Message-ID: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com> > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! 
> > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: reserve all used v register; use t0 directly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/dc6dedbf..b6456d79 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=04-05 Stats: 26 lines in 3 files changed: 3 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Fri Nov 10 11:43:20 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 Nov 2023 11:43:20 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v5] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 09:08:00 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> remove extra new line > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1687: > >> 1685: Assembler::SEW sew = is_long ? Assembler::e64 : Assembler::e32; >> 1686: // intrinsic is enabled when MaxVectorSize >= 16 >> 1687: Assembler::LMUL lmul = is_long ? Assembler::m4 : Assembler::m2; > > A `lmul` of `m2` or `m4` means vector register group of 2 or 4 registers respectively. So <`v4`, `v5`> and <`v8`, `v9`> will be used in the case of `m2`. But I only see `v4` and `v8` are reserved for CompressBits nodes. You should also reserve `v5` and `v9` for this case. Similar for the `m4` case. Good catch, fixed. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1694: > >> 1692: vmv_s_x(v0, src); >> 1693: // reset the src data(in bytes) to zero. >> 1694: mv(tmp, len); > > Can we use scratch register `t0` instead of `tmp` in this function?
done > src/hotspot/cpu/riscv/riscv_v.ad line 2884: > >> 2882: >> 2883: instruct compressBitsI(iRegINoSp dst, iRegIorL2I src, iRegIorL2I mask, iRegPNoSp tmp, vRegMask_V0 v0, vReg_V4 v4, vReg_V8 v8) %{ >> 2884: predicate(UseRVV); > > Seems this UseRVV check is also redundant. It has already been checked in `Matcher::match_rule_supported` for this node. But seems that this is not the first one. So we might want do this cleanup in your next PR: https://github.com/openjdk/jdk/pull/16580 Sure, let me clean it there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389289424 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389289518 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1389289964 From shade at openjdk.org Fri Nov 10 11:46:06 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Nov 2023 11:46:06 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v7] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 18:12:55 GMT, Aleksey Shipilev wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Set queue size to power of 2, use constant in test > > src/hotspot/share/oops/symbolHandle.hpp line 47: > >> 45: template >> 46: class SymbolHandleBase : public StackObj { >> 47: static Symbol* volatile _cleanup_delay_queue[]; > > This looks off. Should be `static Symbol** volatile _cleanup_delay_queue`, maybe? Nevermind, that's not doable here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1389292875 From tschatzl at openjdk.org Fri Nov 10 12:14:19 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 10 Nov 2023 12:14:19 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v17] In-Reply-To: References: Message-ID: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. 
However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: stefanj review - fix counting of pinned/allocation failed regions in log - some cleanup of evacuation failure code, removing unnecessary members - comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16342/files - new: https://git.openjdk.org/jdk/pull/16342/files/6395696a..d6df3b06 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16342&range=15-16 Stats: 54 lines in 6 files changed: 16 ins; 28 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16342.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342 PR: https://git.openjdk.org/jdk/pull/16342 From shade at openjdk.org Fri Nov 10 12:15:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Nov 2023 12:15:08 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v9] In-Reply-To: References: Message-ID: 
<23mvocb6bu9KPilsxs-fi-kES8-P5o5E_RHDJBej4g4=.dccdf72a-e19d-4535-a294-f320624ca410@github.com> On Fri, 10 Nov 2023 11:25:25 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace I have a few other cosmetic things left. 
src/hotspot/share/oops/symbolHandle.hpp line 102: > 100: Symbol* old = Atomic::xchg(&_cleanup_delay_queue[i], sym); > 101: if (old != nullptr) { > 102: old->decrement_refcount(); Indenting: Suggestion: old->decrement_refcount(); src/hotspot/share/oops/symbolHandle.hpp line 119: > 117: Symbol* sym = Atomic::xchg(&_cleanup_delay_queue[i], (Symbol*) nullptr); > 118: if (sym != nullptr) { > 119: sym->decrement_refcount(); Indenting: Suggestion: sym->decrement_refcount(); test/hotspot/gtest/classfile/test_symbolTable.cpp line 32: > 30: // Helper to avoid interference from the cleanup delay queue by draining it > 31: // immediately after creation. > 32: TempNewSymbol tmp(Symbol* sym) { Let's call it `stable_temp_symbol`? ------------- PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1724615475 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1389297048 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1389297208 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1389315738 From gcao at openjdk.org Fri Nov 10 12:20:15 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 10 Nov 2023 12:20:15 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 10:48:21 GMT, Martin Doerr wrote: > AFAICS, `resolve_oop_handle` preserves volatile regs on aarch64 and x86_64, so, the current version should be ok. I can see the following test passing: `make run-test TEST="java/lang/invoke" JTREG="VM_OPTIONS=-XX:+UseZGC -XX:+ZGenerational"` on linux x86_64, aarch64 and ppc64le. Thank you for taking a look. I also checked the aarch64 and x86 code and now I see the difference. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805637086 From aboldtch at openjdk.org Fri Nov 10 12:23:19 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 12:23:19 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:18:29 GMT, Axel Boldt-Christmas wrote: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. ## Testing - `linux-x64`, `linux-x64-debug` - `LM_LEGACY` - [X] tier 1-7 - `LM_LIGHTWEIGHT` - [ ] tier 1-7 (still running) - [X] GitHub actions ------------- PR Comment: https://git.openjdk.org/jdk/pull/16603#issuecomment-1805638425 From aboldtch at openjdk.org Fri Nov 10 12:23:18 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 12:23:18 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT Message-ID: LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. 
So instead of inflating when installing the hashCode on a fast-locked object, it can simply use the hashCode bits in the markWord. The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before; inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS used for deflation is still valid and will not fail for any reason other than the cooperative race to help transition the mark word during deflation. This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. ------------- Depends on: https://git.openjdk.org/jdk/pull/16602 Commit messages: - 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT Changes: https://git.openjdk.org/jdk/pull/16603/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319773 Stats: 70 lines in 4 files changed: 30 ins; 17 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From ogillespie at openjdk.org Fri Nov 10 12:23:27 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 10 Nov 2023 12:23:27 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: Message-ID: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background.
In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Fix indentation, rename test helper ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/3becbcb4..6e06f007 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=08-09 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From gcao at openjdk.org Fri Nov 10 12:25:18 2023 From: gcao at openjdk.org (Gui Cao) Date: Fri, 10 Nov 2023 12:25:18 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. 
I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. @matias9927 Hi, I have prepared a similar improvement for riscv64. The volatile registers are also preserved on this platform. [15455-riscv-port-v3.diff.txt](https://github.com/openjdk/jdk/files/13319170/15455-riscv-port-v3.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1805643151 From rkennke at openjdk.org Fri Nov 10 12:33:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 12:33:57 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:18:29 GMT, Axel Boldt-Christmas wrote: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. 
> > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. This is great! I've done something like that in my original LW-locking PR, but then ripped it out to keep it simpler. Never got around to redoing it. Changes look good to me! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16603#pullrequestreview-1724683360 From aboldtch at openjdk.org Fri Nov 10 12:46:11 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 12:46:11 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation Message-ID: Implements the runtime part of JDK-8319796. The different CPU implementations are/will be created as dependent pull requests. This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enters. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code, which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
A high level overview:
* Locking is still performed on the mark word
  * Unlocked (0b01) <=> Locked (0b00)
* Monitor enter on Obj with mark word Unlocked (0b01) is the same
  * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
  * Push Obj onto the lock stack
  * Success
* Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
  * If top entry is Obj
    * Push Obj on the lock stack
    * Success
  * If top entry is not Obj
    * Inflate and call ObjectMonitor::enter
* Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
  * If just the top entry is Obj
    * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
    * Pop the entry
    * Success
  * If both entries are Obj
    * Pop the top entry
    * Success
  * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
* If the monitor has been inflated for object Obj which is owned by the current thread
  * All corresponding entries for Obj are removed from the lock stack
  * The monitor's recursion count is set to the number of removed entries - 1
  * The owner is changed from anonymous to the thread
  * The regular ObjectMonitor::action is called.
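The enter/exit rules in the overview above can be modelled with a small single-threaded sketch. This is only an illustration under stated assumptions, not the HotSpot implementation: `Mark`, `Obj`, `Result` and `LockStack` are hypothetical names, the mark word is reduced to its two lock bits, the mark word updates are plain stores where the real code uses a CAS, and the slow path is represented by returning `NeedsMonitor` instead of actually inflating.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the two lock bits in the mark word.
enum class Mark : std::uint8_t { Locked = 0b00, Unlocked = 0b01, Monitor = 0b10 };

struct Obj { Mark mark = Mark::Unlocked; };

enum class Result { Success, NeedsMonitor };

// Per-thread lock stack; single-threaded model, so no atomics are used.
struct LockStack {
  std::vector<Obj*> entries;

  Result enter(Obj* o) {
    if (o->mark == Mark::Unlocked) {
      o->mark = Mark::Locked;          // real code: CAS Unlocked => Locked
      entries.push_back(o);
      return Result::Success;
    }
    if (o->mark == Mark::Locked && !entries.empty() && entries.back() == o) {
      entries.push_back(o);            // consecutive recursive enter
      return Result::Success;
    }
    return Result::NeedsMonitor;       // would inflate and call ObjectMonitor::enter
  }

  Result exit(Obj* o) {
    std::size_t n = entries.size();
    if (o->mark != Mark::Locked || n == 0 || entries[n - 1] != o) {
      return Result::NeedsMonitor;     // unstructured or inflated: slow path
    }
    bool recursive = n >= 2 && entries[n - 2] == o;
    entries.pop_back();
    if (!recursive) {
      o->mark = Mark::Unlocked;        // real code: CAS Locked => Unlocked
    }
    return Result::Success;
  }

  // On inflation of an object owned by this thread: remove all of its
  // entries and return the recursion count (removed entries - 1).
  int remove_for_inflation(Obj* o) {
    std::size_t before = entries.size();
    entries.erase(std::remove(entries.begin(), entries.end(), o), entries.end());
    int removed = static_cast<int>(before - entries.size());
    assert(removed > 0);               // precondition: this thread holds the lock
    o->mark = Mark::Monitor;
    return removed - 1;
  }
};
```

In this model, entering `a`, then `b`, then `a` again falls to the slow path because `a` is no longer the top entry, which is the "consecutive only" restriction mentioned in the description.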
------------- Depends on: https://git.openjdk.org/jdk/pull/16603 Commit messages: - 8319797: Recursive lightweight locking: Runtime implementation Changes: https://git.openjdk.org/jdk/pull/16606/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319797 Stats: 663 lines in 10 files changed: 632 ins; 9 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Fri Nov 10 12:46:11 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 12:46:11 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:40:03 GMT, Axel Boldt-Christmas wrote: > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two top most entires on the lock stack to determine what to do in a monitor exit. 
>
> A high level overview:
> * Locking is still performed on the mark word
>   * Unlocked (0b01) <=> Locked (0b00)
> * Monitor enter on Obj with mark word Unlocked (0b01) is the same
>   * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
>   * Push Obj onto the lock stack
>   * Success
> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
>   * If top entry is Obj
>     * Push Obj on the lock stack
>     * Success
>   * If top entry is not Obj
>     * Inflate and call ObjectMonitor::enter
> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
>   * If just the top entry is Obj
>     * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
>     * Pop the entry
>     * Success
>   * If both entries are Obj
>     * Pop the top entry
>     * Success
>   * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
> * If the monitor has been inflated for object Obj which is owned by the current thread
>   * All corresponding entries for Obj are removed from the lock stack
>   * The monitor's recursion count is set to the number of removed entries - 1
>   * The owner is changed from anonymous to the thread
>   * The regular ObjectMonitor::action is called.

## Testing
As no port implements this yet, the `LM_LIGHTWEIGHT` testing is less interesting: it simply checks that non-recursive locking still works when a CPU port has not yet implemented recursive lightweight locking. As such, the full gtest is never run here. See the dependent CPU implementation PRs for more extensive testing.
- `linux-x64`, `linux-x64-debug` - `LM_LEGACY` - [X] tier 1-7 - `LM_LIGHTWEIGHT` - [X] tier 1-7 - [X] GitHub actions ------------- PR Comment: https://git.openjdk.org/jdk/pull/16606#issuecomment-1805664538 From shade at openjdk.org Fri Nov 10 12:59:01 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Nov 2023 12:59:01 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v7] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 08:19:32 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: > > - Remove the limit for deflation requests > - Remove reinitialization in test > - Update comments This looks good to me. I have cosmetic comments about the tests. test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 46: > 44: public static void main(String[] args) throws Exception { > 45: Thread thread_dumper = new Thread(() -> dumpThreads()); > 46: thread_dumper.start(); Here and later, the Java style is `threadDumper`, etc. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1724721135 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389360671 From shade at openjdk.org Fri Nov 10 12:59:03 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Nov 2023 12:59:03 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 02:38:09 GMT, David Holmes wrote: >> Done > > It still might eliminate locking due to the empty block. Yes, I agree with David. Look how other monitor inflation tests do this, we call `wait()` to guarantee both inflation (although it is partially handled by `LockingMode=0` here) and escaping to native call thus avoiding aggressive compiler opts. https://github.com/openjdk/jdk/blob/6b21ff61dad6f633c744c1c33c29ea86183b509d/test/hotspot/jtreg/runtime/Monitor/DeflationIntervalsTest.java#L132-L140 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389360096 From aboldtch at openjdk.org Fri Nov 10 13:05:10 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 13:05:10 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation Message-ID: Implements the x86 port of JDK-8319796. There are two major parts for the port implementation: the C2 part, and the part shared by the interpreter, C1 and the native call wrapper. The biggest change for both parts is that we check the lock stack first, and if it is a recursive lightweight [un]lock, simply pop/push and finish successfully. Only if the recursive lightweight [un]lock fails does it look at the mark word. For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. C2 operates under a few more assumptions, namely that the locking is structured and balanced. This means that some checks can be elided.
First, this means that in C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. Conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an uncontended unlock by writing null to the owner, or, if contention happens, simply write the thread to the owner and jump to the runtime. The x86 C2 port also has some extra oddities. The mark word read is done early, as it showed better scaling in hyper-threaded scenarios on certain Intel hardware and no noticeable downside on other tested x86 hardware. The fast path is written to avoid going through conditional branches. To achieve this while keeping the ZF output correct, the code performs some actions eagerly: decrementing the held monitor count and popping from the lock stack. If a slow path is required, it jumps to a code stub which restores the thread-local state to a correct state before jumping to the runtime. The contended unlock was also moved to the code stub.
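A minimal sketch of the lock-stack-first ordering described above, written as a single-threaded Java model rather than the emitted x86 code. The class `LockStackSketch`, its `CAPACITY`, and the map standing in for the mark word lock bits are hypothetical names used only for illustration; inflation and the call into the runtime are reduced to returning `false`.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Single-threaded model only: the real mark word transition is an atomic
// update on the object header, and the real slow path calls into the runtime.
final class LockStackSketch {
    static final int UNLOCKED = 0b01;   // lock bits as named in the description
    static final int LOCKED = 0b00;
    static final int CAPACITY = 8;      // hypothetical lock stack size

    private final Object[] stack = new Object[CAPACITY];
    private int top = 0;
    // Stand-in for the lock bits in each object's mark word (assumption).
    private final Map<Object, Integer> markWords = new IdentityHashMap<>();

    private int lockBits(Object obj) {
        return markWords.getOrDefault(obj, UNLOCKED);
    }

    // Returns true on fast-path success, false if the slow path is needed.
    public boolean tryEnter(Object obj) {
        if (top == CAPACITY) {
            return false;                       // lock stack full
        }
        if (top > 0 && stack[top - 1] == obj) {
            stack[top++] = obj;                 // consecutive recursive enter: just push
            return true;
        }
        if (lockBits(obj) == UNLOCKED) {
            markWords.put(obj, LOCKED);         // 0b01 => 0b00
            stack[top++] = obj;
            return true;
        }
        return false;                           // locked elsewhere or non-consecutive
    }

    // Returns true on fast-path success, false if the slow path is needed.
    public boolean tryExit(Object obj) {
        if (top >= 2 && stack[top - 1] == obj && stack[top - 2] == obj) {
            stack[--top] = null;                // recursive exit: just pop
            return true;
        }
        if (top >= 1 && stack[top - 1] == obj && lockBits(obj) == LOCKED) {
            markWords.put(obj, UNLOCKED);       // 0b00 => 0b01
            stack[--top] = null;
            return true;
        }
        return false;                           // unstructured exit
    }

    public int depth() {
        return top;
    }
}
```

In this model a `false` return stands for the cases that leave the fast path: a full lock stack, a recursive enter that is not consecutive, or an unstructured exit.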
------------- Depends on: https://git.openjdk.org/jdk/pull/16606 Commit messages: - 8319799: Recursive lightweight locking: x86 implementation - Cleanup: C2 fast_lock/fast_unlock x86 Changes: https://git.openjdk.org/jdk/pull/16607/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319799 Stats: 576 lines in 13 files changed: 463 ins; 57 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/16607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16607/head:pull/16607 PR: https://git.openjdk.org/jdk/pull/16607 From aboldtch at openjdk.org Fri Nov 10 13:05:11 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 13:05:11 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:59:27 GMT, Axel Boldt-Christmas wrote: > Implements the x86 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). 
Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The x86 C2 port also has some extra oddities. > > The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware. > > The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime. > > The contended unlock was also moved to the code stub. ## Testing - `linux-x64`, `linux-x64-debug` - `LM_LIGHTWEIGHT` - [X] tier 1-7 - `com/sun/jdi/EATests.java` is currently failing due to JDK-8318895. Will rerun the tests after that has been integrated and merged in here. - `windows-x64`, `windows-x64-debug` - `LM_LIGHTWEIGHT` - [ ] tier 1-7 (Will run after JDK-8318895 is resolved) - `macosx-x64`, `macosx-x64-debug` - `LM_LIGHTWEIGHT` - [ ] tier 1-7 (Will run after JDK-8318895 is resolved) - [X] GitHub actions ------------- PR Comment: https://git.openjdk.org/jdk/pull/16607#issuecomment-1805688367 From aboldtch at openjdk.org Fri Nov 10 13:07:20 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 13:07:20 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation Message-ID: Implements the aarch64 port of JDK-8319796. There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. 
Only if the recursive lightweight [un]lock fails does it look at the mark word. For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. C2 operates under a few more assumptions: that the locking is structured and balanced. This means that some checks can be elided. First, this means that in C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. Conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an uncontended unlock by writing null to the owner, or, if contention happens, simply write the thread to the owner and jump to the runtime. The aarch64 C2 port tries to avoid stronger memory semantics wherever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock use a load/store exclusive register pair to transition the mark word.
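A rough Java model of the mark word handling described above: a relaxed load to check for inflation, followed by an atomic transition of the lock bits. Java has no load/store exclusive, so a compare-and-exchange stands in for it here. The class `MarkWordSketch` and its method names are hypothetical, and the acquire/release orderings are an assumption about what a lock and an unlock need, not something taken from the patch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustration only: an AtomicLong stands in for the object's mark word.
final class MarkWordSketch {
    static final long LOCK_MASK = 0b11;
    static final long LOCKED = 0b00;
    static final long UNLOCKED = 0b01;
    static final long MONITOR = 0b10;   // inflated

    // Returns true if the lock bits were moved 0b01 => 0b00.
    public static boolean tryFastLock(AtomicLong markWord) {
        long mark = markWord.getPlain();            // relaxed load
        long bits = mark & LOCK_MASK;
        if (bits == MONITOR) {
            return false;                           // inflated: not handled here
        }
        if (bits != UNLOCKED) {
            return false;                           // already lightweight-locked
        }
        long locked = (mark & ~LOCK_MASK) | LOCKED;
        // Succeeds only if the mark word still holds the value loaded above.
        return markWord.compareAndExchangeAcquire(mark, locked) == mark;
    }

    // Returns true if the lock bits were moved 0b00 => 0b01.
    public static boolean tryFastUnlock(AtomicLong markWord) {
        long mark = markWord.getPlain();            // relaxed load
        if ((mark & LOCK_MASK) != LOCKED) {
            return false;                           // inflated or not locked
        }
        long unlocked = (mark & ~LOCK_MASK) | UNLOCKED;
        return markWord.compareAndExchangeRelease(mark, unlocked) == mark;
    }
}
```

The bits above the two lock bits are carried through unchanged, so whatever else the mark word holds survives the transition in this model.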
------------- Depends on: https://git.openjdk.org/jdk/pull/16606 Commit messages: - 8319801: Recursive lightweight locking: aarch64 implementation - Cleanup: C2 fast_lock/fast_unlock aarch64 Changes: https://git.openjdk.org/jdk/pull/16608/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319801 Stats: 499 lines in 9 files changed: 368 ins; 82 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/16608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16608/head:pull/16608 PR: https://git.openjdk.org/jdk/pull/16608 From aboldtch at openjdk.org Fri Nov 10 13:07:21 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 10 Nov 2023 13:07:21 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:00:52 GMT, Axel Boldt-Christmas wrote: > Implements the aarch64 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). 
Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. ## Testing - `linux-aarch64`, `linux-aarch64-debug` - `LM_LIGHTWEIGHT` - [X] tier 1-7 - `com/sun/jdi/EATests.java` is currently failing due to JDK-8318895. Will rerun the tests after that has been integrated and merged in here. - `macosx-aarch64`, `macosx-aarch64-debug` - `LM_LIGHTWEIGHT` - [ ] tier 1-7 (Will run after JDK-8318895 is resolved) - [X] GitHub actions ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1805690165 From stefank at openjdk.org Fri Nov 10 13:21:18 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 13:21:18 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v8] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Update names in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/560c67df..ac8398ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=06-07 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Fri Nov 10 13:21:20 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 13:21:20 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v7] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:55:05 GMT, Aleksey Shipilev wrote: >> Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: >> >> - Remove the limit for deflation requests >> - Remove reinitialization in test >> - Update comments > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 46: > >> 44: public static void main(String[] args) throws Exception { >> 45: Thread thread_dumper = new Thread(() -> dumpThreads()); >> 46: thread_dumper.start(); > > Here and later, the Java style is `threadDumper`, etc. 
Updated ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389383429 From stefank at openjdk.org Fri Nov 10 13:25:59 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 13:25:59 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v5] In-Reply-To: References: Message-ID: <2GXdCuDpZipznp0MyLwpEUYfwBXvOExv6r6p5xjZ8I4=.329d43d9-d820-462d-bb03-8a57b2ae6375@github.com> On Fri, 10 Nov 2023 12:54:26 GMT, Aleksey Shipilev wrote: >> It still might eliminate locking due to the empty block. > > Yes, I agree with David. Look how other monitor inflation tests do this, we call `wait()` to guarantee both inflation (although it is partially handled by `LockingMode=0` here) and escaping to native call thus avoiding aggressive compiler opts. > https://github.com/openjdk/jdk/blob/6b21ff61dad6f633c744c1c33c29ea86183b509d/test/hotspot/jtreg/runtime/Monitor/DeflationIntervalsTest.java#L132-L140 Right, but currently it doesn't remove the synchronized block in my runs and this test was mainly added to be a very specific test that provokes the bug that this PR fixes. I previously considered adding a wait(1), but that will significantly limit the number of created monitors. Maybe I can add a counter inside the synchronized block, just like our JMH lock micros do? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1389388211 From aph at openjdk.org Fri Nov 10 13:26:59 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 Nov 2023 13:26:59 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:00:52 GMT, Axel Boldt-Christmas wrote: > Implements the aarch64 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. 
> > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6344: > 6342: > 6343: // Try to lock. Transition lock bits 0b01 => 0b00 > 6344: assert(oopDesc::mark_offset_in_bytes() == 0, "required to avoid lea"); It might be cleaner just to put in the `lea`. I believe that nothing will be emitted if the addend is zero. It's up to you. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1389388894 From stefank at openjdk.org Fri Nov 10 13:29:18 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 10 Nov 2023 13:29:18 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v9] In-Reply-To: References: Message-ID: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Do stuff in the synchronized block of the test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/ac8398ae..d87fc6eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=07-08 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From aph at openjdk.org Fri Nov 10 13:39:57 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 Nov 2023 13:39:57 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:00:52 GMT, Axel Boldt-Christmas wrote: > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. 
Both lock and unlock use a load/store exclusive register pair to transition the mark word. It's probably not a good idea to use load/store exclusive, because on recent AArch64 implementations it scales very badly under contention. Better to use atomic update instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1805737519 From rkennke at openjdk.org Fri Nov 10 13:51:00 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 13:51:00 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:40:03 GMT, Axel Boldt-Christmas wrote: > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enters. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
>
> A high level overview:
> * Locking is still performed on the mark word
>   * Unlocked (0b01) <=> Locked (0b00)
> * Monitor enter on Obj with mark word Unlocked (0b01) is the same
>   * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
>   * Push Obj onto the lock stack
>   * Success
> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
>   * If top entry is Obj
>     * Push Obj on the lock stack
>     * Success
>   * If top entry is not Obj
>     * Inflate and call ObjectMonitor::enter
> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
>   * If just the top entry is Obj
>     * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
>     * Pop the entry
>     * Success
>   * If both entries are Obj
>     * Pop the top entry
>     * Success
>   * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
> * If the monitor has been inflated for object Obj which is owned by the current thread
>   * All corresponding entries for Obj are removed from the lock stack
>   * The monitor recursions count is set to the number of removed entries - 1
>   * The owner is changed from anonymous to the thread
>   * The regular ObjectMonitor::action is called.

Good work! Mostly looks good! I only have some minor comments and a question.

src/hotspot/share/runtime/lockStack.cpp line 47:

> 45: LockStack::LockStack(JavaThread* jt) :
> 46: _top(lock_stack_base_offset), _base() {
> 47: // Make sure the layout of the object is compatable with the emitted codes assumptions.

Typo: compatable -> compatible, codes -> code's (?)
src/hotspot/share/runtime/synchronizer.cpp line 530: > 528: LockStack& lock_stack = current->lock_stack(); > 529: if (lock_stack.is_full()) { > 530: // The emitted code always goes into the runtime incase the lock stack Typo: incase -> in case src/hotspot/share/runtime/synchronizer.cpp line 530: > 528: LockStack& lock_stack = current->lock_stack(); > 529: if (lock_stack.is_full()) { > 530: // The emitted code always goes into the runtime incase the lock stack What is the rationale behind this block? Is it beneficial to inflate the top-most lock to make room for the new one, because that might be hotter? If so, then it may be even more useful to inflate the bottom-most entry instead? ------------- Changes requested by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16606#pullrequestreview-1724787619 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1389399785 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1389412144 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1389415768 From fyang at openjdk.org Fri Nov 10 14:58:59 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Nov 2023 14:58:59 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 12:17:16 GMT, Hamlin Li wrote: >>> I made a mistake, UseRVVForCompressBitsIntrinsics is only defined in riscv global.hpp. I think I can resolve the issue by defining it in global global.hpp, but seems it's not a good idea either. Any suggestions? >> >> Maybe `bool Matcher::match_rule_supported(int opcode) {` in `riscv.ad` is a good place, and just returning `UseRVV` would be enough for `Op_CompressBits`?: >> https://github.com/openjdk/jdk/blob/c788160f8acea7b58b54ad857b601bb7ffb53f8e/src/hotspot/cpu/riscv/riscv.ad#L1896-L1897 > > Thanks @feilongjiang for pointing at the postion. > > @robehn @theRealAph I agree, thanks for discussion @Hamlin-Li : Thanks for the update. 
But I am still not satisfied with the current approach. The issue is that we will be wasting vector registers when running on hardware equipped with wider RVV registers, say 256 bits. We are reserving more vector registers than needed in that case, which might mean some extra vector register spilling/reloading under high register pressure. We should consider this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1805888144 From rkennke at openjdk.org Fri Nov 10 15:05:01 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 15:05:01 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:59:27 GMT, Axel Boldt-Christmas wrote: > Implements the x86 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime.
>
> The x86 C2 port also has some extra oddities.
>
> The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware.
>
> The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime.
>
> The contended unlock was also moved to the code stub.

Very good stuff! I like how the C2 code-paths are re-shaped. I only have minor comments and a question.

src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 123:

> 121: __ movptr(Address(monitor, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), _thread);
> 122:
> 123: // succsesor null check.

typo: succsesor -> successor

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 971:

> 969:
> 970: // Check if lock-stack is full.
> 971: cmpl(Address(thread, JavaThread::lock_stack_top_offset()), LockStack::end_offset() - 1);

I believe you can move the movl(top, Address(thread, JavaThread::lock_stack_top_offset())) here, and use top in both checks.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 979:

> 977: jccb(Assembler::equal, push);
> 978:
> 979: // Check for monitor (0b10).

It baffles me a little bit that we check for the monitor only after we have checked for a full lock stack and for recursive locking. This means that if the object is monitor-locked, it has to wait for 3 loads (mark-word, top-of-stack-offset and top-of-stack) and two (pointless) test-and-branches. This seems to optimise the lw-locking case at the expense of the monitor-locking case. I'm not sure that this is the right trade-off. You said in the description that this scales better? Can you elaborate on that?
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9817: > 9815: > 9816: // Check if the lock-stack is full. > 9817: cmpl(Address(thread, JavaThread::lock_stack_top_offset()), LockStack::end_offset()); I believe you can mov the movl(top, Address(thread, JavaThread::lock_stack_top_offset())) here, and use top in both checks. ------------- Changes requested by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16607#pullrequestreview-1724936988 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1389503804 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1389490845 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1389500182 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1389489016 From rgiulietti at openjdk.org Fri Nov 10 15:06:01 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Fri, 10 Nov 2023 15:06:01 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. 
>> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). 
src/java.base/share/classes/java/lang/StringUTF16.java line 279: > 277: } else { > 278: // Pass 1: Compute precise size of char[]; see extractCodePoints for caveat > 279: int estSize = ndx + computeCodePointSize(val, off, end); To avoid reallocations in `extractCodepoints()`, one way is to take advantage of the presence of `latin1[]`: store `latin1[i] = (byte) (cp >>> 16)`, starting from `ndx`, while scanning `val[]` in `computeCodePointSize()`, to preserve information about the upper bits of the codepoint. Later, while copying the `val[]` codepoints to `utf16[]`, the info in `latin1[]` is merged into the `cp` just read from `val[]`, for example as in `cp = (cp & 0xffff) | ((latin1[i] & 0xff) << 16)`. The resulting codepoint would be BMP or supplementary exactly as it was when scanned during `computeCodePointSize()`, even in the presence of later modifications, and it preserves the original value if it wasn't modified in the meantime. Since the info about a codepoint needing 1 or 2 chars in `utf16[]` is preserved in `latin1[]`, there should be no need for reallocations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1389509263 From mli at openjdk.org Fri Nov 10 15:11:00 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 10 Nov 2023 15:11:00 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v6] In-Reply-To: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com> References: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com> Message-ID: On Fri, 10 Nov 2023 11:43:16 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > reserve all used v register; use t0 directly Yes, that's a potential issue.
Or maybe we can tighten the matcher rule to enable the intrinsic: for example, `return UseRVV && (MaxVectorSize >= 32)` in `Matcher::match_rule_supported`, so for Long it will be v2(v3), v4(v5), and for Integer it will be v2, v4. Does this make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1805905656 From rkennke at openjdk.org Fri Nov 10 15:31:12 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 15:31:12 GMT Subject: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack [v3] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 10:41:16 GMT, Roman Kennke wrote: >> See JBS issue for details. >> >> I basically: >> - took the test-modification and turned it into its own test-case >> - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default >> - added Axel's fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. >> >> Testing: >> - [x] EATests.java >> - [x] tier1 >> - [x] tier2 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/com/sun/jdi/EATests.java > > Co-authored-by: Richard Reingruber Thanks, all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16568#issuecomment-1805939306 From rkennke at openjdk.org Fri Nov 10 15:31:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 15:31:13 GMT Subject: Integrated: 8318895: Deoptimization results in incorrect lightweight locking stack In-Reply-To: References: Message-ID: <2t-1klBg1-GAN4ss9bqW8KpK98y5-8yYTjYgM61R8B8=.29af1d78-cbfa-44c4-8a3b-1ff237f6a0ec@github.com> On Wed, 8 Nov 2023 19:00:53 GMT, Roman Kennke wrote: > See JBS issue for details.
> > I basically: > - took the test-modification and turned it into its own test-case > - added test runners for lightweight- and legacy-locking, so that we keep testing both, no matter what is the default > - added Axels fix (mentioned in the JBS issue) with the modification to only inflate when exec_mode == Unpack_none, as explained by Richard. > > Testing: > - [x] EATests.java > - [x] tier1 > - [x] tier2 This pull request has now been integrated. Changeset: ea1ffa34 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/ea1ffa34192448317ce9a61a3588b0dee3a2ef44 Stats: 157 lines in 2 files changed: 154 ins; 0 del; 3 mod 8318895: Deoptimization results in incorrect lightweight locking stack Co-authored-by: Axel Boldt-Christmas Co-authored-by: Richard Reingruber Reviewed-by: dlong, rrich ------------- PR: https://git.openjdk.org/jdk/pull/16568 From mbaesken at openjdk.org Fri Nov 10 16:11:06 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 10 Nov 2023 16:11:06 GMT Subject: RFR: JDK-8319927: Add some logging after 8295159 Message-ID: [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD , however in case of issues no logging is done, this should be improved. 
------------- Commit messages: - JDK-8319927 Changes: https://git.openjdk.org/jdk/pull/16618/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16618&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319927 Stats: 37 lines in 4 files changed: 31 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16618.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16618/head:pull/16618 PR: https://git.openjdk.org/jdk/pull/16618 From rriggs at openjdk.org Fri Nov 10 16:44:00 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 10 Nov 2023 16:44:00 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <6VulrG9chGhju3-nh8ibToI87bQQe2J0SMpnQSMzB6s=.a1fc7c4d-fcee-49db-a0ec-264a8528dce6@github.com> On Fri, 10 Nov 2023 14:59:57 GMT, Raffaello Giulietti wrote: >> Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: >> >> - Refactored extractCodePoints to avoid multiple resizes if the array was modified >> - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` >> It performs better than the single byte array access by avoiding the bounds check. >> - Misc updates for review comments, javadoc cleanup >> Extra checking on maximum string lengths when calling toBytes(). > > src/java.base/share/classes/java/lang/StringUTF16.java line 279: > >> 277: } else { >> 278: // Pass 1: Compute precise size of char[]; see extractCodePoints for caveat >> 279: int estSize = ndx + computeCodePointSize(val, off, end); > > To avoid reallocations in `extractCodepoints()`, a way is to profit from the presence of `latin1[]`, putting `latin1[i] = (byte) (cp >>> 16)`, starting from `ndx`, while scanning the `val[]` in `computeCodePointSize()` to preserve information about the upper bits of the codepoint. 
> > Later, while copying the `val[]` codepoints to `utf16[]`, the info in `latin1[]` is included in the `cp` just read from `val[]`, for example as in `cp = (cp & 0xffff) | ((latin1[i] & 0xff) << 16)`. > > The resulting codepoint would be BMP or supplementary as when it was scanned during `computeCodePointSize()`, even in presence of later modifications, and preserving the original value if it wasn't modified in the meantime. Since the info about a codepoint needing 1 or 2 chars in `utf16[]` is preserved in `latin1[]`, there should be no need for reallocations. Interesting idea, but it might mean that if the codepoint val[i] was modified it could result in a cp that did not (ever) exist in the input; creating a value out of thin air. The high bits would be from the first read of val[i] and the low bits from the 2nd read. The code in extractCodepoints could be simpler and computeCodePointSize just a little more complex. Creating values out of thin air is usually bad and could have (different) unexpected results in the app. The additional writes to the latin1 array would also slow down the normal case of computing the size whether or not the input was modified. On that basis, I'd keep the current approach to resizing only if needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1389637034 From rkennke at openjdk.org Fri Nov 10 18:13:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 10 Nov 2023 18:13:57 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:00:52 GMT, Axel Boldt-Christmas wrote: > Implements the aarch64 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper.
> > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. Looks good! I like the reshaping of C2 fast-paths! Only one question. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 268: > 266: br(Assembler::EQ, push); > 267: > 268: // Relaxed normal load to check for monitor. Optimization for monitor case. Is it beneficial to have this block here, right in front of the load-exclusive block below, which does the same thing? ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16608#pullrequestreview-1725300193 PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1389718960 From aph at openjdk.org Fri Nov 10 18:27:57 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 Nov 2023 18:27:57 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 18:05:51 GMT, Roman Kennke wrote: >> Implements the aarch64 port of JDK-8319796. >> >> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. >> >> For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. >> >> The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. >> >> First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. 
> > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 268: > >> 266: br(Assembler::EQ, push); >> 267: >> 268: // Relaxed normal load to check for monitor. Optimization for monitor case. > > Is it beneficial to have this block here, right in front of the load-exclusive block below, which does the same thing? I'd keep it and turn the load-exclusive block below into a CAS. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1389737309 From rgiulietti at openjdk.org Fri Nov 10 19:34:59 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Fri, 10 Nov 2023 19:34:59 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. 
>> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). 
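As a reading aid, the guard described in the quoted text above can be sketched roughly as follows. This is only a minimal illustration and not the JDK code: the class `CompressSketch`, the result type `Encoded`, and the helpers `compress`, `encode` and `decide` are hypothetical names, and the real change operates on the internal `byte[]` representation and on intrinsics rather than on a `char[]` copy.

```java
import java.util.Arrays;

// Hypothetical sketch: decide between a latin1 and a non-latin1 representation
// when the input array may be modified concurrently.
class CompressSketch {

    /** Hypothetical result type: exactly one of the two arrays is non-null. */
    static final class Encoded {
        final byte[] latin1;
        final char[] utf16;

        Encoded(byte[] latin1, char[] utf16) {
            this.latin1 = latin1;
            this.utf16 = utf16;
        }

        boolean isLatin1() {
            return latin1 != null;
        }
    }

    /**
     * Copies chars <= 0xff from src into dst as bytes, reading each element once.
     * Returns len if everything was copied; otherwise returns the index
     * (relative to off) of the first non-latin1 char.
     */
    static int compress(char[] src, int off, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[off + i];
            if (c > 0xff) {
                return i;
            }
            dst[i] = (byte) c;
        }
        return len;
    }

    /** Entry point; another thread may be writing to input while this runs. */
    static Encoded encode(char[] input, int off, int len) {
        byte[] latin1 = new byte[len];
        int ndx = compress(input, off, latin1, len);
        if (ndx == len) {
            return new Encoded(latin1, null);      // every char fit in one byte
        }
        // A non-latin1 char was seen at ndx: take a private copy and
        // base every further decision on that copy only.
        char[] copy = Arrays.copyOfRange(input, off, off + len);
        return decide(copy, ndx);
    }

    /** Verifies the private copy and picks the representation. */
    static Encoded decide(char[] copy, int ndx) {
        if (copy[ndx] > 0xff) {
            return new Encoded(null, copy);        // invariant holds
        }
        // The input changed between the scan and the copy: re-scan the copy
        // instead of throwing, and let that scan choose the representation.
        byte[] latin1 = new byte[copy.length];
        if (compress(copy, 0, latin1, copy.length) == copy.length) {
            return new Encoded(latin1, null);
        }
        return new Encoded(null, copy);
    }
}
```

The point of splitting out `decide` is that the result depends only on the private copy, so a non-latin1 result always contains at least one non-latin1 char regardless of what happens to the caller's array afterwards.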
Second take: Java code in src/ looks good Next take will be about code in test/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1806325709 From mdoerr at openjdk.org Fri Nov 10 22:42:00 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Nov 2023 22:42:00 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
>> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). Can you include PPC64, please?

diff --git a/src/hotspot/cpu/ppc/ppc.ad b/src/hotspot/cpu/ppc/ppc.ad
index 89ce51e997e..102701e4969 100644
--- a/src/hotspot/cpu/ppc/ppc.ad
+++ b/src/hotspot/cpu/ppc/ppc.ad
@@ -12727,16 +12727,8 @@ instruct string_compress(rarg1RegP src, rarg2RegP dst, iRegIsrc len, iRegIdst re
   ins_cost(300);
   format %{ "String Compress $src,$dst,$len -> $result \t// KILL $tmp1, $tmp2, $tmp3, $tmp4, $tmp5" %}
   ins_encode %{
-    Label Lskip, Ldone;
-    __ li($result$$Register, 0);
-    __ string_compress_16($src$$Register, $dst$$Register, $len$$Register, $tmp1$$Register,
-                          $tmp2$$Register, $tmp3$$Register, $tmp4$$Register, $tmp5$$Register, Ldone);
-    __ rldicl_($tmp1$$Register, $len$$Register, 0, 64-3); // Remaining characters.
-    __ beq(CCR0, Lskip);
-    __ string_compress($src$$Register, $dst$$Register, $tmp1$$Register, $tmp2$$Register, Ldone);
-    __ bind(Lskip);
-    __ mr($result$$Register, $len$$Register);
-    __ bind(Ldone);
+    __ encode_iso_array($src$$Register, $dst$$Register, $len$$Register, $tmp1$$Register, $tmp2$$Register,
+                        $tmp3$$Register, $tmp4$$Register, $tmp5$$Register, $result$$Register, false);
   %}
   ins_pipe(pipe_class_default);
 %}

@offamitkumar: I guess s390 also needs an adaptation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1806526012 From jjoo at openjdk.org Sat Nov 11 00:23:28 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Sat, 11 Nov 2023 00:23:28 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v41] In-Reply-To: References: Message-ID: <3kqtWcZA2lY3fPjUgyo5aO_-4SicTOPzF6AnKGyRCBA=.53d5b175-7e0f-4f6b-91a7-e4cfea62cb7a@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: - Refactor ConcurrentRefine logic - Make CPUTimeCounters a singleton class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/41771db6..533af850 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=40 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=39-40 Stats: 151 lines in 17 files changed: 62 ins; 62 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Sat Nov 11 00:23:29 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Sat, 11 Nov 2023 00:23:29 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v40] In-Reply-To: <4EVNbC2fI1AQGHMkRMfI6SDJrw98KEX0xuRJR1s361o=.d1ff2f1a-ffc9-4454-9755-e6e9e14d9110@github.com>
References: <4EVNbC2fI1AQGHMkRMfI6SDJrw98KEX0xuRJR1s361o=.d1ff2f1a-ffc9-4454-9755-e6e9e14d9110@github.com> Message-ID: <1NfEWhHC6ZHOtFiDpwcpZwuxDzbf-sfY7IenNn3nv9M=.d189f48e-8772-4ec2-bb17-bd912a28c5ab@github.com> On Thu, 9 Nov 2023 10:29:29 GMT, Stefan Johansson wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing cpuTimeCounters files > > src/hotspot/share/gc/g1/g1ConcurrentRefineThread.cpp line 189: > >> 187: void G1PrimaryConcurrentRefineThread::maybe_update_threads_cpu_time() { >> 188: if (UsePerfData && os::is_thread_cpu_time_supported()) { >> 189: cr()->update_concurrent_refine_threads_cpu_time(); > > I think we should pull the tracking closure in here and that way leave the concurrent refine class untouched. > > Suggestion: > > // The primary thread is responsible for updating the CPU time for all workers. > CPUTimeCounters* counters = G1CollectedHeap::heap()->cpu_time_counters(); > ThreadTotalCPUTimeClosure tttc(counters, CPUTimeGroups::gc_conc_refine); > cr()->threads_do(&tttc); > > > This is more or less a copy from `G1ConcurrentRefineThreadControl::update_threads_cpu_time()` which if we go with this solution can be removed. The above needs some new includes though. > > I change the comment a because I could not fully understand it, the primary thread is the one always checking and starting more threads so it is not stopped first. Also not sure when a terminated thread could be read. Even the stopped threads are still present so should be fine. If I'm missing something feel free to add back the comment. Thank you for the review! Do you think you could take a look at the newest update and see if that aligns with what you were thinking? I wanted to make sure I understood your comments correctly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1390054625 From jjoo at openjdk.org Sat Nov 11 00:30:07 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Sat, 11 Nov 2023 00:30:07 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v41] In-Reply-To: <3kqtWcZA2lY3fPjUgyo5aO_-4SicTOPzF6AnKGyRCBA=.53d5b175-7e0f-4f6b-91a7-e4cfea62cb7a@github.com> References: <3kqtWcZA2lY3fPjUgyo5aO_-4SicTOPzF6AnKGyRCBA=.53d5b175-7e0f-4f6b-91a7-e4cfea62cb7a@github.com> Message-ID: On Sat, 11 Nov 2023 00:23:28 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor ConcurrentRefine logic > - Make CPUTimeCounters a singleton class Will address the Remark pause update in my next commit! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1806593319 From qamai at openjdk.org Sat Nov 11 01:01:01 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 11 Nov 2023 01:01:01 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v3] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 12:53:45 GMT, Aleksey Shipilev wrote: >> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. >> >> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. 
>>
>> For example, sample branch profiling hunk from C1 tier3 on x86_64:
>>
>> Before:
>> 0x00007f269065ed02: test %edx,%edx
>> 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ...
>> 0x00007f269065ed0e: movabs $0x138,%rsi
>> 0x00007f269065ed18: je 0x00007f269065ed24
>> 0x00007f269065ed1a: movabs $0x148,%rsi
>> 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi
>> 0x00007f269065ed28: lea 0x1(%rdi),%rdi
>> 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1)
>> 0x00007f269065ed30: je 0x00007f269065ed4e
>>
>> After:
>> 0x00007f1370dcd782: test %edx,%edx
>> 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ...
>> 0x00007f1370dcd78e: mov $0x138,%esi
>> 0x00007f1370dcd793: je 0x00007f1370dcd79a
>> 0x00007f1370dcd795: mov $0x148,%esi
>> 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi
>> 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi
>> 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1)
>> 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4
>>
>> We can use shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes.
>>
>> This is not limited to the profiling code. There are observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`.
>>
>> # Before
>> nmethod code size : 430328 bytes
>> nmethod code size : 467032 bytes
>> nmethod code size : 908936 bytes
>> nmethod code size : 1267816 bytes
>>
>> # After
>> nmethod code size : 429616 bytes (-0.1%)
>> nmethod code size : 466344 bytes (-0.1%)
>> nmethod code size : 897144 bytes (-1.3%)
>> nmethod code size : 1256216 bytes (-0.9%)
>>
>> There are two wrinkles:
>> 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I would have used it in...
> > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Enlighs LGTM src/hotspot/cpu/x86/assembler_x86.cpp line 13453: > 13451: } > 13452: > 13453: void Assembler::movslq(Register dst, int32_t imm32) { You can remove the corresponding declaration in the header file. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/16497#pullrequestreview-1725859284 PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1390070648 From duke at openjdk.org Sat Nov 11 06:57:17 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 11 Nov 2023 06:57:17 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap Message-ID: JDK-8234502 : Merge GenCollectedHeap and SerialHeap ------------- Commit messages: - fix trialing whitespace - add copyright announsement - merge 'GenCollectedHeap' and 'SerialHeap' Changes: https://git.openjdk.org/jdk/pull/16623/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8234502 Stats: 2899 lines in 16 files changed: 1461 ins; 1418 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From aph at openjdk.org Sat Nov 11 10:08:57 2023 From: aph at openjdk.org (Andrew Haley) Date: Sat, 11 Nov 2023 10:08:57 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 17:51:41 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4263: >> >>> 4261: fadd_s(ftmp, src, ftmp); >>> 4262: fcvt_w_s(dst, ftmp, RoundingMode::rdn); >>> 4263: >> >> This still doesn't look right to me. I urge you to test it against the Java implementation over the full 32-bit range. 
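The comparison urged in the quoted comment above can be sketched in plain Java. This is a hedged illustration only: it assumes the reviewed sequence behaves like "add 0.5 in the default round-to-nearest-even mode, then round down", which is what the quoted `fadd_s` / `fcvt_w_s` lines suggest, and the class `RoundCheckSketch` and its methods are hypothetical names, not part of the patch.

```java
// Hypothetical sketch: compare a naive "add 0.5, then floor" rounding
// against Math.round(float), the reference behaviour for the intrinsic.
class RoundCheckSketch {

    /** Naive rounding: the float addition itself rounds to nearest-even. */
    static int naiveRound(float x) {
        return (int) Math.floor(x + 0.5f);
    }

    /**
     * Counts inputs whose naive result differs from Math.round, for all
     * float bit patterns from fromBits to toBits inclusive (signed int order).
     * Passing Integer.MIN_VALUE and Integer.MAX_VALUE covers the full range.
     */
    static long countMismatches(int fromBits, int toBits) {
        long mismatches = 0;
        for (long bits = fromBits; bits <= toBits; bits++) {
            float x = Float.intBitsToFloat((int) bits);
            if (naiveRound(x) != Math.round(x)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Largest float below 0.5: the sum is a tie and rounds up to 1.0f.
        float justBelowHalf = Math.nextDown(0.5f);
        System.out.println(naiveRound(justBelowHalf) + " vs " + Math.round(justBelowHalf)); // prints "1 vs 0"

        // -(2^23 + 1): the sum -8388608.5 is a tie and rounds to -8388608.0f.
        float x = -8388609.0f;
        System.out.println(naiveRound(x) + " vs " + Math.round(x)); // prints "-8388608 vs -8388609"
    }
}
```

Both mismatches come from the addition being rounded once and the conversion rounding again, so a full-range run of `countMismatches` is a cheap way to catch this class of error before looking at the generated code.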
> > I think it may start working if rounding mode for fadd_s would be changed from default rne, to rdn It won't because of double rounding, and changing rounding modes is expensive. https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5882 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1390193250 From duke at openjdk.org Sat Nov 11 10:25:15 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 11 Nov 2023 10:25:15 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v2] In-Reply-To: References: Message-ID: > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: include 'serialVMOperations.hpp' ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/08f1f0a8..4e74cc25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From vkempik at openjdk.org Sat Nov 11 10:45:57 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 11 Nov 2023 10:45:57 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: References: Message-ID: On Sat, 11 Nov 2023 10:06:26 GMT, Andrew Haley wrote: >> I think it may start working if rounding mode for fadd_s would be changed from default rne, to rdn > > It won't because of double rounding, and changing rounding modes is expensive. > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5882 But doing fadd 0.5 to the number, which can't have fractional part, in rdn mode becomes no-op. 
At least on single precision floats it works: fadd(-8388609.0, +0.5, rdn) results in -8388609.0 and the mode for both fadd and fcvt will be the same, (perf tests showed no difference on thead tho) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1390198946 From duke at openjdk.org Sat Nov 11 11:49:33 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Sat, 11 Nov 2023 11:49:33 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v3] In-Reply-To: References: Message-ID: > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: Fix 'young_gen' function in 'genCollectedHeap.cpp' ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/4e74cc25..883dd7b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From aph at openjdk.org Sun Nov 12 09:37:02 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 12 Nov 2023 09:37:02 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: References: Message-ID: On Sat, 11 Nov 2023 10:43:35 GMT, Vladimir Kempik wrote: >> It won't because of double rounding, and changing rounding modes is expensive. >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5882 > > But doing fadd 0.5 to the number, which can't have fractional part, in rdn mode becomes no-op. 
> At least on single precision floats it works: > fadd(-8388609.0, +0.5, rdn) results in -8388609.0 > and the mode for both fadd and fcvt will be the same, (perf tests showed no difference on thead tho) Maybe. I didn't try it, but on a great big out-of-order machine changing floating-point modes can be fantastically expensive, forcing ops in progress to retire, changing mode, and then continuing. Effectively it's as bad as a mispredict. Given that a correct solution that doesn't involve changing modes is available, I don't see why you wouldn't use it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1390371225 From tanksherman27 at gmail.com Sun Nov 12 14:49:59 2023 From: tanksherman27 at gmail.com (Julian Waters) Date: Sun, 12 Nov 2023 22:49:59 +0800 Subject: What is Vectored Exception Handling used for? Message-ID: From what I can tell, Vectored Exception Handling replaces Structured Exception Handling (at least for some cases) on ARM64 Windows, and it replaces Structured Exception Handling in thread_native_entry for thread->call_run() and for calls to JNI_CreateJavaVM_inner and jni_DestroyJavaVM_inner. But in the Vectored Exception Filter, it only seems to work for faults that happen in the CodeCache, which neither of the above cases (at least to my knowledge) are. The above confuses me a little, what is Vectored Exception Handling used for within HotSpot, since the few use cases for it aren't actually handled by the Vectored Filter? best regards, Julian -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dholmes at openjdk.org Mon Nov 13 00:42:02 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 00:42:02 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v9] In-Reply-To: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> References: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> Message-ID: On Fri, 10 Nov 2023 13:29:18 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Do stuff in the synchronized block of the test test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 71: > 69: > 70: static private void createMonitors() { > 71: int monitorCount = 0; I think this needs to be static to prevent it being hoisted out of the sync block. I try not to assume how clever the JIT might be in this area. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1390522410 From dholmes at openjdk.org Mon Nov 13 00:57:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 00:57:57 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:18:29 GMT, Axel Boldt-Christmas wrote: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. 
> > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Seems reasonable - though more complex than I had envisaged. One query below and a change requested for the test. Thanks src/hotspot/share/runtime/synchronizer.cpp line 582: > 580: // It can only have installed an anonymously locked monitor at this point. > 581: // Fetch that monitor, set owner correctly to this thread, and > 582: // exit it (allowing waiting threads to enter). I don't understand why the anonymous owner case is no longer being checked. ?? test/hotspot/jtreg/runtime/whitebox/TestWBDeflateIdleMonitors.java line 69: > 67: obj = new Object(); > 68: synchronized (obj) { > 69: if (LockingMode != LM_LIGHTWEIGHT) { I don't think we need to make this test behave differently because of implementation details of different locking modes. Just use `wait(1)` unconditionally. Thanks. ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16603#pullrequestreview-1726397913 PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1390528029 PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1390529012 From dholmes at openjdk.org Mon Nov 13 01:09:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 01:09:59 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:34:13 GMT, Roman Kennke wrote: >> Implements the runtime part of JDK-8319796. >> The different CPU implementations are/will be created as dependent pull requests. >> >> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two top most entires on the lock stack to determine what to do in a monitor exit. 
>> >> A high level overview: >> * Locking is still performed on the mark word >> * Unlocked (0b01) <=> Locked (0b00) >> * Monitor enter on Obj with mark word Unlocked (0b01) is the same >> * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) >> * Push Obj onto the lock stack >> * Success >> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack >> * If top entry is Obj >> * Push Obj on the lock stack >> * Success >> * If top entry is not Obj >> * Inflate and call ObjectMonitor::enter >> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack >> * If just the top entry is Obj >> * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) >> * Pop the entry >> * Success >> * If both entries are Obj >> * Pop the top entry >> * Success >> * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit >> * If the monitor has been inflated for object Obj which is owned by the current thread >> * All corresponding entries for Obj is removed from the lock stack >> * The monitor recursions is set to the number of removed entries - 1 >> * The owner is changed from anonymous to the thread >> * The regular ObjectMonitor::action is called. > > src/hotspot/share/runtime/lockStack.cpp line 47: > >> 45: LockStack::LockStack(JavaThread* jt) : >> 46: _top(lock_stack_base_offset), _base() { >> 47: // Make sure the layout of the object is compatable with the emitted codes assumptions. > > Typo: compatable -> compatible, codes -> code's (?)
Typo: codes -> code's ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390536368 From dholmes at openjdk.org Mon Nov 13 01:36:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 01:36:58 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 12:40:03 GMT, Axel Boldt-Christmas wrote: > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two top most entires on the lock stack to determine what to do in a monitor exit. > > A high level overview: > * Locking is still performed on the mark word > * Unlocked (0b01) <=> Locked (0b00) > * Monitor enter on Obj with mark word Unlocked (0b01) is the same > * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) > * Push Obj onto the lock stack > * Success > * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack > * If top entry is Obj > * Push Obj on the lock stack > * Success > * If top entry is not Obj > * Inflate and call ObjectMonitor::enter > * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack > * If just the top entry is Obj > * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) > * Pop the entry > * Success > * If both entries are Obj > * Pop the top entry > * Success > * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit > * If the monitor has been inflated for object Obj which is owned by the current thread > * All corresponding entries for Obj is removed from the lock stack > * The monitor recursions is 
set to the number of removed entries - 1 > * The owner is changed from anonymous to the thread > * The regular ObjectMonitor::action is called. I get the gist of this but I have a number of concerns about what is getting checked and when. I have a suspicion that recursion support is making the lockstack full more often and thus invalidating the sizing of the lockStack that was based on there being no recursion support. In that case we need to bump the size. A full lockstack should be rare not something we check for first. src/hotspot/share/runtime/lockStack.cpp line 50: > 48: STATIC_ASSERT(sizeof(_bad_oop_sentinel) == oopSize); > 49: STATIC_ASSERT(sizeof(_base[0]) == oopSize); > 50: STATIC_ASSERT(std::is_standard_layout::value); What is this? Are we allowed to use it? src/hotspot/share/runtime/lockStack.cpp line 82: > 80: if (VM_Version::supports_recursive_lightweight_locking()) { > 81: oop o = _base[i]; > 82: for (;i < top - 1; i++) { Nit: space after first `;` src/hotspot/share/runtime/lockStack.inline.hpp line 29: > 27: #define SHARE_RUNTIME_LOCKSTACK_INLINE_HPP > 28: > 29: #include "runtime/lockStack.hpp" Why was this pulled out first? src/hotspot/share/runtime/synchronizer.cpp line 395: > 393: // Always go into runtime if the lock stack is full. > 394: return false; > 395: } It isn't obvious that it is beneficial to check what should be a rare occurrence. Why do this? src/hotspot/share/runtime/synchronizer.cpp line 609: > 607: return; > 608: } else if (mark.is_fast_locked() && lock_stack.is_recursive(object)) { > 609: // This lock is recursive but unstructured exit. Just inflate the lock. Again this seems in the wrong place - this should be a very rare case so we should not be checking it explicitly before the expected cases! 
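To make the ordering concern concrete, the enter/exit rules from the quoted overview can be modelled in a few lines. This is only a minimal single-threaded sketch, not HotSpot code: `ModelObject`, `ModelLockStack` and `Outcome` are hypothetical names, the mark word is reduced to one boolean, and the full-stack, contention and inflated-monitor cases are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Java object. Only the lock state of the mark
// word is modelled: false = Unlocked (0b01), true = Locked (0b00).
struct ModelObject {
  bool fast_locked = false;
};

// FastPath: handled on the lock stack. Inflate: would fall back to an
// inflated ObjectMonitor in the real VM (not modelled here).
enum class Outcome { FastPath, Inflate };

// Hypothetical model of one thread's lock stack.
struct ModelLockStack {
  std::vector<ModelObject*> entries;  // entries.back() is the top of the stack

  Outcome enter(ModelObject& obj) {
    if (!obj.fast_locked) {
      // Unlocked => Locked, then push.
      obj.fast_locked = true;
      entries.push_back(&obj);
      return Outcome::FastPath;
    }
    if (!entries.empty() && entries.back() == &obj) {
      // Consecutive recursive enter: just push again.
      entries.push_back(&obj);
      return Outcome::FastPath;
    }
    // Locked, but not by the top entry: non-consecutive recursion.
    return Outcome::Inflate;
  }

  Outcome exit(ModelObject& obj) {
    const std::size_t n = entries.size();
    if (n == 0 || entries[n - 1] != &obj) {
      // Unstructured exit.
      return Outcome::Inflate;
    }
    // Look at the two top-most entries to decide whether this exit is recursive.
    const bool recursive = n >= 2 && entries[n - 2] == &obj;
    entries.pop_back();
    if (!recursive) {
      obj.fast_locked = false;  // Locked => Unlocked
    }
    return Outcome::FastPath;
  }
};
```

In this model every recursive enter consumes a stack slot, which is why recursion support can fill a fixed-size lock stack sooner than the non-recursive scheme did.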
------------- PR Review: https://git.openjdk.org/jdk/pull/16606#pullrequestreview-1726411316 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390536608 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390537026 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390540833 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390546734 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390548291 From dholmes at openjdk.org Mon Nov 13 01:37:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 01:37:00 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 01:09:48 GMT, David Holmes wrote: >> Implements the runtime part of JDK-8319796. >> The different CPU implementations are/will be created as dependent pull requests. >> >> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two top most entires on the lock stack to determine what to do in a monitor exit. 
>> >> A high level overview: >> * Locking is still performed on the mark word >> * Unlocked (0b01) <=> Locked (0b00) >> * Monitor enter on Obj with mark word Unlocked (0b01) is the same >> * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) >> * Push Obj onto the lock stack >> * Success >> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack >> * If top entry is Obj >> * Push Obj on the lock stack >> * Success >> * If top entry is not Obj >> * Inflate and call ObjectMonitor::enter >> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack >> * If just the top entry is Obj >> * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) >> * Pop the entry >> * Success >> * If both entries are Obj >> * Pop the top entry >> * Success >> * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit >> * If the monitor has been inflated for object Obj which is owned by the current thread >> * All corresponding entries for Obj is removed from the lock stack >> * The monitor recursions is set to the number of removed entries - 1 >> * The owner is changed from anonymous to the thread >> * The regular ObjectMonitor::action is called. 
> > src/hotspot/share/runtime/lockStack.cpp line 82: > >> 80: if (VM_Version::supports_recursive_lightweight_locking()) { >> 81: oop o = _base[i]; >> 82: for (;i < top - 1; i++) { > > Nit: space after first `;` Though rather than walk the lockstack twice can't we just change the check below to something like: if (VM_Version::supports_recursive_lightweight_locking() && i != j - 1) { assert(_base[i] != _base[j], "entries must be unique: %s", msg); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390539219 From dholmes at openjdk.org Mon Nov 13 01:37:01 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 01:37:01 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: <-lwt39Gx_QJfxgzgSLHkysdtOrVxgP8dFh7gN4TDkmY=.86139caf-08c2-484f-999f-fa6cf121f9df@github.com> On Fri, 10 Nov 2023 13:45:40 GMT, Roman Kennke wrote: >> Implements the runtime part of JDK-8319796. >> The different CPU implementations are/will be created as dependent pull requests. >> >> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two top most entires on the lock stack to determine what to do in a monitor exit. 
>> >> A high level overview: >> * Locking is still performed on the mark word >> * Unlocked (0b01) <=> Locked (0b00) >> * Monitor enter on Obj with mark word Unlocked (0b01) is the same >> * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) >> * Push Obj onto the lock stack >> * Success >> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack >> * If top entry is Obj >> * Push Obj on the lock stack >> * Success >> * If top entry is not Obj >> * Inflate and call ObjectMonitor::enter >> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack >> * If just the top entry is Obj >> * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) >> * Pop the entry >> * Success >> * If both entries are Obj >> * Pop the top entry >> * Success >> * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit >> * If the monitor has been inflated for object Obj which is owned by the current thread >> * All corresponding entries for Obj is removed from the lock stack >> * The monitor recursions is set to the number of removed entries - 1 >> * The owner is changed from anonymous to the thread >> * The regular ObjectMonitor::action is called. > > src/hotspot/share/runtime/synchronizer.cpp line 530: > >> 528: LockStack& lock_stack = current->lock_stack(); >> 529: if (lock_stack.is_full()) { >> 530: // The emitted code always goes into the runtime incase the lock stack > > What is the rationale behind this block? Is it beneficial to inflate the top-most lock to make room for the new one, because that might be hotter? If so, then it may be even more useful to inflate the bottom-most entry instead? I'm also unclear on the rationale, and again on checking for a full-stack upfront like this, when it should be a rare case. If recursion support means the lockStack is no longer big enough then we need to increase its size to accommodate that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390547845 From kbarrett at openjdk.org Mon Nov 13 01:42:09 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 13 Nov 2023 01:42:09 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Fri, 10 Nov 2023 12:23:27 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. 
>> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation, rename test helper Changes requested by kbarrett (Reviewer). src/hotspot/share/oops/symbolHandle.hpp line 48: > 46: class SymbolHandleBase : public StackObj { > 47: static Symbol* volatile _cleanup_delay_queue[]; > 48: static volatile uint _cleanup_delay_index; Putting the delay queue implementation in SymbolHandleBase<> creates unnecessary and unused copies of the data, and possibly of the code. It is only used in the case where the template parameter is true. Better would be to put the cleanup delay queue in a separate, non-templated, class. The entire implementation of the queue could then be in the .cpp file. (I suggest the overhead of an out-of-line call to add to the queue is in the noise, given that adding to the queue performs 3 atomic RMW operations.) So something like this:

// In symbolHandle.hpp.
class TempSymbolCleanupDelayer : AllStatic {
  // Or make these file-scoped statics in the .cpp file.
  static const uint QueueSize = 128;
  static Symbol* volatile _queue[];
  static volatile uint _index;

public:
  static void delay_cleanup(Symbol* s);
};

// In symbolHandle.cpp.
Symbol* volatile TempSymbolCleanupDelayer::_queue[QueueSize] = {};
volatile uint TempSymbolCleanupDelayer::_index = 0;

void TempSymbolCleanupDelayer::delay_cleanup(Symbol* sym) {
  assert(sym != nullptr, "precondition");
  sym->increment_refcount();
  uint i = Atomic::add(&_index, 1u) % QueueSize;
  Symbol* old = Atomic::xchg(&_queue[i], sym);
  Symbol::maybe_decrement_refcount(old);
}

Code is completely untested. It incorporates suggestions I'm making elsewhere in this review too.
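The bounded delay queue suggested above can also be tried outside HotSpot. The following is a minimal standalone sketch using standard C++ atomics in place of HotSpot's `Atomic` class; `FakeSymbol` and `CleanupDelayQueue` are hypothetical names, and the queue size is deliberately tiny so that displacement is easy to observe.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for HotSpot's Symbol: only the refcount is modelled.
struct FakeSymbol {
  std::atomic<int> refcount{0};
  void increment_refcount() { refcount.fetch_add(1); }
  void decrement_refcount() { refcount.fetch_sub(1); }
};

// Ring buffer that keeps the most recently added symbols alive by holding
// one reference to each of them.
class CleanupDelayQueue {
  static constexpr unsigned QueueSize = 4;  // illustrative; the review uses 128
  std::atomic<FakeSymbol*> _queue[QueueSize]{};
  std::atomic<unsigned> _index{0};

public:
  void delay_cleanup(FakeSymbol* sym) {
    assert(sym != nullptr);
    sym->increment_refcount();
    // fetch_add returns the previous counter value; any monotonically
    // increasing counter works for picking the next slot.
    unsigned i = _index.fetch_add(1) % QueueSize;
    // Install the new symbol and release whatever it displaced.
    FakeSymbol* old = _queue[i].exchange(sym);
    if (old != nullptr) {
      old->decrement_refcount();
    }
  }
};
```

Adding a fifth symbol to this four-slot queue displaces the first one and drops its reference, which is the "held until enough newer temp symbols arrive" behaviour described in the PR.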
src/hotspot/share/oops/symbolHandle.hpp line 53: > 51: > 52: public: > 53: static constexpr uint CLEANUP_DELAY_MAX_ENTRIES = 128; This doesn't need to be public. src/hotspot/share/oops/symbolHandle.hpp line 58: > 56: > 57: // Conversion from a Symbol* to a SymbolHandleBase. > 58: // Does not increment the current reference count if temporary. This comment is no longer true for temp symbols, since adding to the delay queue increments the refcount. src/hotspot/share/oops/symbolHandle.hpp line 59: > 57: // Conversion from a Symbol* to a SymbolHandleBase. > 58: // Does not increment the current reference count if temporary. > 59: SymbolHandleBase(Symbol *s) : _temp(s) { Is this really called with nullptr sometimes? It would be better if that was disallowed. But that's probably outside the scope of this PR. src/hotspot/share/oops/symbolHandle.hpp line 99: > 97: sym->increment_refcount(); > 98: STATIC_ASSERT(is_power_of_2(CLEANUP_DELAY_MAX_ENTRIES)); // allow modulo shortcut > 99: uint i = Atomic::add(&_cleanup_delay_index, 1u) & (CLEANUP_DELAY_MAX_ENTRIES - 1); I disagree with @shipilev . Just use `% CLEANUP_DELAY_MAX_ENTRIES`. This is an entirely unreasonable amount of code to explicitly provide a micro-optimization that any non-stupid compiler will do for us anyway. At most, add a comment to the definition of that constant that being a power of 2 benefits the ring-buffer. But I wouldn't even bother with that. src/hotspot/share/oops/symbolHandle.hpp line 103: > 101: if (old != nullptr) { > 102: old->decrement_refcount(); > 103: } This conditional refcount decrement could instead be `Symbol::maybe_decrement_refcount(old);`. src/hotspot/share/oops/symbolHandle.hpp line 115: > 113: } > 114: > 115: static void drain_cleanup_delay_queue() { It's not obvious that draining the queue is useful. Unless there's a reason I'm missing, I suggest not doing so. 
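On the `%` versus `&` point above: for an unsigned counter and a power-of-2 size the two forms select the same slot, so the plain modulo loses nothing in correctness. A small sketch, with `kQueueSize`, `slot_by_modulo` and `slot_by_mask` as hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kQueueSize = 128;  // assumed power of 2, as in the PR

constexpr bool is_power_of_2(std::uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Slot selection written the simple way.
constexpr std::uint32_t slot_by_modulo(std::uint32_t counter) {
  return counter % kQueueSize;
}

// Slot selection written with the explicit mask.
constexpr std::uint32_t slot_by_mask(std::uint32_t counter) {
  return counter & (kQueueSize - 1);
}

static_assert(is_power_of_2(kQueueSize), "mask form requires a power of 2");
static_assert(slot_by_modulo(129) == slot_by_mask(129), "same slot either way");
```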
test/hotspot/gtest/classfile/test_placeholders.cpp line 45: > 43: Symbol* D = SymbolTable::new_symbol("def2_8_2023_class"); > 44: Symbol* super = SymbolTable::new_symbol("super2_8_2023_supername"); > 45: Symbol* interf = SymbolTable::new_symbol("interface2_8_2023_supername"); This doesn't seem like the right way to update this test. Doesn't this leave the symbols dangling? And in the face of potential queue draining, it seems to me this could lead the test to intermittent failures. ------------- PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1726414311 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390538976 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390539047 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390539239 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390542435 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390547204 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390548044 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390542654 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1390549795 From dholmes at openjdk.org Mon Nov 13 04:44:09 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 04:44:09 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true Message-ID: As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. 
Summary of changes: - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined - Assertions for `supports_cx8()` are removed - Access backend is greatly simplified without the need for lock-based alternative - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. Testing: - All Oracle tiers 1-5 builds (which includes an ARMv7 build) - GHA builds/tests - Oracle tiers 1-3 sanity testing Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. Thanks. ------------- Commit messages: - 8318776: Require supports_cx8 to always be true Changes: https://git.openjdk.org/jdk/pull/16625/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318776 Stats: 386 lines in 35 files changed: 14 ins; 359 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From dholmes at openjdk.org Mon Nov 13 04:50:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 04:50:12 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v2] In-Reply-To: References: Message-ID: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. 
And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. > > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks. 
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Remove test for VMSupportsCX8 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16625/files - new: https://git.openjdk.org/jdk/pull/16625/files/3f2ec66f..b6dea4b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=00-01 Stats: 53 lines in 1 file changed: 0 ins; 53 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From iklam at openjdk.org Mon Nov 13 04:58:11 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 13 Nov 2023 04:58:11 GMT Subject: RFR: 8319944: Remove DynamicDumpSharedSpaces Message-ID: Please review this cleanup. Most of the changes are the following patterns: - `if (DumpSharedSpaces)` => `if (CDSConfig::is_dumping_dynamic_archive())` - `DumpSharedSpaces = true` => `CDSConfig::enable_dumping_dynamic_archive()` - `DumpSharedSpaces = false` => `CDSConfig::disable_dumping_dynamic_archive()` ------------- Commit messages: - 8319944: Remove DynamicDumpSharedSpaces Changes: https://git.openjdk.org/jdk/pull/16626/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16626&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319944 Stats: 60 lines in 15 files changed: 10 ins; 8 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/16626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16626/head:pull/16626 PR: https://git.openjdk.org/jdk/pull/16626 From dholmes at openjdk.org Mon Nov 13 05:10:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 05:10:17 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v3] In-Reply-To: References: Message-ID: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 
64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. > > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks. 
David Holmes has updated the pull request incrementally with two additional commits since the last revision: - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) - Remove suports_cx8() checks from gtest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16625/files - new: https://git.openjdk.org/jdk/pull/16625/files/b6dea4b6..65871144 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=01-02 Stats: 14 lines in 2 files changed: 0 ins; 14 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From david.holmes at oracle.com Mon Nov 13 05:38:53 2023 From: david.holmes at oracle.com (David Holmes) Date: Mon, 13 Nov 2023 15:38:53 +1000 Subject: What is Vectored Exception Handling used for? In-Reply-To: References: Message-ID: Hi Julian, On 13/11/2023 12:49 am, Julian Waters wrote: > From what I can tell, Vectored Exception Handling replaces Structured > Exception Handling (at least for some cases) on ARM64 Windows, and it > replaces Structured Exception Handling in thread_native_entry for > thread->call_run() and for calls to JNI_CreateJavaVM_inner > and jni_DestroyJavaVM_inner. But in the Vectored Exception Filter, it > only seems to work for faults that happen in the CodeCache, which > neither of the above cases (at least to my knowledge) are. The above > confuses me a little, what is Vectored Exception Handling used for > within HotSpot, since the few use cases for it aren't actually handled > by the Vectored Filter? If the vectored handler doesn't handle it (and it only handles the codecache) then it passes up to the topLevelUnhandledExceptionFilter which calls report_error.
VEH was proposed for hotspot in place of SEH but it was rejected, but there is a good discussion of it: https://bugs.openjdk.org/browse/JDK-8247941 However it is required for Windows-Aarch64 https://bugs.openjdk.org/browse/JDK-8248496 so came in for that port. It was also used by AOT at some point. HTH, David > best regards, > Julian From iklam at openjdk.org Mon Nov 13 05:53:14 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 13 Nov 2023 05:53:14 GMT Subject: RFR: 8319944: Remove DynamicDumpSharedSpaces [v2] In-Reply-To: References: Message-ID: > Please review this cleanup. Most of the changes are the following patterns: > > - `if (DumpSharedSpaces)` => `if (CDSConfig::is_dumping_dynamic_archive())` > - `DumpSharedSpaces = true` => `CDSConfig::enable_dumping_dynamic_archive()` > - `DumpSharedSpaces = false` => `CDSConfig::disable_dumping_dynamic_archive()` Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16626/files - new: https://git.openjdk.org/jdk/pull/16626/files/25884414..c232da3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16626&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16626&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16626/head:pull/16626 PR: https://git.openjdk.org/jdk/pull/16626 From dholmes at openjdk.org Mon Nov 13 06:10:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 06:10:57 GMT Subject: RFR: 8319944: Remove DynamicDumpSharedSpaces [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 05:53:14 GMT, Ioi Lam wrote: >> Please review this cleanup. 
Most of the changes are the following patterns: >> >> - `if (DumpSharedSpaces)` => `if (CDSConfig::is_dumping_dynamic_archive())` >> - `DumpSharedSpaces = true` => `CDSConfig::enable_dumping_dynamic_archive()` >> - `DumpSharedSpaces = false` => `CDSConfig::disable_dumping_dynamic_archive()` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo Looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16626#pullrequestreview-1726593167 From fyang at openjdk.org Mon Nov 13 06:37:01 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 Nov 2023 06:37:01 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v6] In-Reply-To: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com> References: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com> Message-ID: On Fri, 10 Nov 2023 11:43:16 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > reserve all used v register; use t0 directly Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1691: > 1689: > 1690: // load the src data(in bits) to be compressed. > 1691: vsetivli(x0, 1, sew, lmul); A default `lmul` of `m1` is enough to perform the succeeding `vmv_s_x` instruction as specified by the RVV spec. The integer scalar read/write instructions transfer a single value between a scalar x register and element 0 of a vector register. The instructions ignore LMUL and vector register groups. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1703: > 1701: // load the mask data(in bits).
> 1702: vsetivli(x0, 1, sew, lmul);
> 1703: vmv_v_x(v0, mask);

Shouldn't this be `vmv_s_x(v0, mask)` instead of `vmv_v_x(v0, mask)`? The `vcompress.vm` instruction is expecting a vector mask register.
Also the preceding `vsetivli` should be changed to use a default `lmul` of `m1` at the same time.

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1710:

> 1708: vmseq_vi(v0, v8, 1);
> 1709: // store result back.
> 1710: vsetivli(x0, 1, sew, lmul);

Similar here. This `vsetivli` instruction should be changed to use a default `lmul` of `m1`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16481#pullrequestreview-1726607955
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1390664667
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1390666669
PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1390667506

From fyang at openjdk.org Mon Nov 13 06:48:57 2023
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 13 Nov 2023 06:48:57 GMT
Subject: RFR: 8318218: RISC-V: C2 CompressBits [v6]
In-Reply-To:
References: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com>
Message-ID:

On Fri, 10 Nov 2023 15:07:51 GMT, Hamlin Li wrote:

> Yes, that's the potential possible issues. Or maybe we can tighten the matcher rule to enable the intrinsic: for example, `return UseRVV && (MaxVectorSize >= 32)` (or even 64) in `Matcher::match_rule_supported`, so for Long it will be v2(v3), v4(v5), for Integer, it will be v2, v4, just any 2 vector regs if we tighten the match rule to >=64. Does this make sense?

I see chip vendors are shipping products with RVV VLEN of 128 bits. So I think it's more reasonable to go with the current implementation for now. While it seems that this would win in terms of the number of instructions executed compared with the scalar version, we still need to revisit/benchmark this change when we have access to the real RVV hardware.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1807557750

From rehn at openjdk.org Mon Nov 13 07:20:57 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 13 Nov 2023 07:20:57 GMT
Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks
In-Reply-To:
References:
Message-ID:

On Thu, 9 Nov 2023 10:30:41 GMT, Hamlin Li wrote:

> Hi,
> Can you review the patch to refactor the code related to UseRVV checks?
> Thanks!
>
> There is some code (flag setting/checking) that depends on UseRVV's value; this code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408:
> 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic.
> 2. refactored to be more readable.
> 3. also add note to make sure the future code get the final UseRVV value instead of initial value.

Hey, how are the changes to SpecialEncodeISOArray related?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16580#issuecomment-1807585701

From thartmann at openjdk.org Mon Nov 13 07:26:57 2023
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 13 Nov 2023 07:26:57 GMT
Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v4]
In-Reply-To:
References:
Message-ID: <9ruRW2rZxYXBEPWwT7s9bfsfipjfC-sddzxomBiOHNI=.b1a27ab4-d214-4091-a90c-a276b01587f7@github.com>

On Mon, 30 Oct 2023 19:59:47 GMT, Cesar Soares Lucas wrote:

>> ### Description
>>
>> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges.
>>
>> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP.
>>
>> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing.
>>
>> ### Benchmarking
>>
>> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case.
>> **Note 2:** Margin of error was negligible.
>>
>> | Benchmark                      | No RAM (ms/op) | Yes RAM (ms/op) |
>> |--------------------------------|----------------|-----------------|
>> | TestTrapAfterMerge             | 19.515         | 13.386          |
>> | TestArgEscape                  | 33.165         | 33.254          |
>> | TestCallTwoSide                | 70.547         | 69.427          |
>> | TestCmpAfterMerge              | 16.400         | 2.984           |
>> | TestCmpMergeWithNull_Second    | 27.204         | 27.293          |
>> | TestCmpMergeWithNull           | 8.248          | 4.920           |
>> | TestCondAfterMergeWithAllocate | 12.890         | 5.252           |
>> | TestCondAfterMergeWithNull     | 6.265          | 5.078           |
>> | TestCondLoadAfterMerge         | 12.713         | 5.163           |
>> | TestConsecutiveSimpleMerge     | 30.863         | 4.068           |
>> | TestDoubleIfElseMerge          | 16.069         | 2.444           |
>> | TestEscapeInCallAfterMerge     | 23.111         | 22.924          |
>> | TestGlobalEscape               | 14.459         | 14.425          |
>> | TestIfElseInLoop               | 246.061        | 42.786          |
>> | TestLoadAfterLoopAlias         | 45.808         | 45.812          |
>> ...
>
> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision:
>
>   Ammend previous fix & add repro tests.

All tests passed. I'll provide a review later this week.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15825#issuecomment-1807590868 From aboldtch at openjdk.org Mon Nov 13 07:32:00 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 07:32:00 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT In-Reply-To: References: Message-ID: <1DG4zwC5I96PdIuDQCQbeEsOL3NR5owY7ehs-3axPlE=.68e173e2-e92b-43a5-abf3-9ef30b48443d@github.com> On Mon, 13 Nov 2023 00:51:26 GMT, David Holmes wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > src/hotspot/share/runtime/synchronizer.cpp line 582: > >> 580: // It can only have installed an anonymously locked monitor at this point. >> 581: // Fetch that monitor, set owner correctly to this thread, and >> 582: // exit it (allowing waiting threads to enter). > > I don't understand why the anonymous owner case is no longer being checked. ?? The condition does now check for a successful CAS, not the unsuccessful one. If it was successful then there is no monitor, thus no anonymous owner. If the CAS failed and the mark word is no longer fast locked. It must be inflated. 
So we fallthrough down to the inflated case. `ObjectSynchronizer::inflate` correctly handles fixing the owner. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1390700789 From aboldtch at openjdk.org Mon Nov 13 07:39:10 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 07:39:10 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v2] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Simplify test. 
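The retry-on-CAS-failure idea in the description quoted above can be sketched with `std::atomic`. This is only an illustration: the bit layout (two low lock bits, hash stored from bit 8 upward) and the names `install_hash` and `fast_lock` are assumptions made for the example and do not match the real `markWord` layout or any HotSpot function.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed toy layout, NOT the real markWord: bits 0-1 = lock state, bits 8+ = hash.
constexpr std::uint64_t kLockMask  = 0b11;
constexpr std::uint64_t kUnlocked  = 0b01;
constexpr std::uint64_t kLocked    = 0b00;
constexpr std::uint64_t kMonitor   = 0b10;
constexpr int           kHashShift = 8;

// Install a non-zero hash without touching the lock bits.
// Returns the hash that ends up stored, or 0 if the word is in the Monitor
// state (the caller would then have to take the inflated path).
inline std::uint64_t install_hash(std::atomic<std::uint64_t>& mark, std::uint64_t hash) {
  std::uint64_t old = mark.load();
  for (;;) {
    if ((old & kLockMask) == kMonitor) {
      return 0;
    }
    const std::uint64_t existing = old >> kHashShift;
    if (existing != 0) {
      return existing;  // a hash is already installed; keep it
    }
    const std::uint64_t desired = old | (hash << kHashShift);
    if (mark.compare_exchange_weak(old, desired)) {
      return hash;
    }
    // The failed CAS reloaded 'old'; loop and re-evaluate.
  }
}

// Unlocked -> Locked. A CAS failure caused only by a change in the non-lock
// bits (for example a hash being installed concurrently) is simply retried.
inline bool fast_lock(std::atomic<std::uint64_t>& mark) {
  std::uint64_t old = mark.load();
  while ((old & kLockMask) == kUnlocked) {
    const std::uint64_t desired = (old & ~kLockMask) | kLocked;
    if (mark.compare_exchange_weak(old, desired)) {
      return true;
    }
  }
  return false;  // lock bits are no longer Unlocked
}
```

In this toy model the hash can be installed while the object is fast locked, which is the point of the change: no inflation is needed just to store a hash code.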
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/46d4c1b1..f10571e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=00-01 Stats: 11 lines in 1 file changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aboldtch at openjdk.org Mon Nov 13 07:39:12 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 07:39:12 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v2] In-Reply-To: References: Message-ID: <9dlE1IWNLSUZf3J_2peL26Grsv4XFmCVEG6ryyvB_2M=.d9e3f81e-f457-4a6b-bb95-6290bdc81506@github.com> On Mon, 13 Nov 2023 00:54:49 GMT, David Holmes wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplify test. > > test/hotspot/jtreg/runtime/whitebox/TestWBDeflateIdleMonitors.java line 69: > >> 67: obj = new Object(); >> 68: synchronized (obj) { >> 69: if (LockingMode != LM_LIGHTWEIGHT) { > > I don't think we need to make this test behave differently because of implementation details of different locking modes. Just use `wait(1)` unconditionally. Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1390703852 From stefank at openjdk.org Mon Nov 13 07:53:00 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Nov 2023 07:53:00 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 01:15:14 GMT, David Holmes wrote: >> Implements the runtime part of JDK-8319796. >> The different CPU implementations are/will be created as dependent pull requests. 
>>
>> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
>>
>> A high level overview:
>> * Locking is still performed on the mark word
>>   * Unlocked (0b01) <=> Locked (0b00)
>> * Monitor enter on Obj with mark word Unlocked (0b01) is the same
>>   * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
>>   * Push Obj onto the lock stack
>>   * Success
>> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
>>   * If top entry is Obj
>>     * Push Obj on the lock stack
>>     * Success
>>   * If top entry is not Obj
>>     * Inflate and call ObjectMonitor::enter
>> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
>>   * If just the top entry is Obj
>>     * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
>>     * Pop the entry
>>     * Success
>>   * If both entries are Obj
>>     * Pop the top entry
>>     * Success
>>   * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
>> * If the monitor has been inflated for object Obj which is owned by the current thread
>>   * All corresponding entries for Obj are removed from the lock stack
>>   * The monitor recursions is set to the number of removed entries - 1
>>   * The owner is changed from anonymous to the thread
>>   * The regular ObjectMonitor::action is called.
>
> src/hotspot/share/runtime/lockStack.inline.hpp line 29:
>
>> 27: #define SHARE_RUNTIME_LOCKSTACK_INLINE_HPP
>> 28:
>> 29: #include "runtime/lockStack.hpp"
>
> Why was this pulled out first?

I moved this include up because this is the mechanism we use to resolve circular dependencies between .inline.hpp files.
If you take a look at most other .inline.hpp files you will see the same. I've written a section about this in the style guide:
https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md

> All .inline.hpp files should include their corresponding .hpp file as the first include line. Declarations needed by other files should be put in the .hpp file, and not in the .inline.hpp file. This rule exists to resolve problems with circular dependencies between .inline.hpp files.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390715758

From aboldtch at openjdk.org Mon Nov 13 08:26:38 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 13 Nov 2023 08:26:38 GMT
Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v2]
In-Reply-To:
References:
Message-ID:

> Implements the runtime part of JDK-8319796.
> The different CPU implementations are/will be created as dependent pull requests.
>
> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
>
> A high level overview:
> * Locking is still performed on the mark word
>   * Unlocked (0b01) <=> Locked (0b00)
> * Monitor enter on Obj with mark word Unlocked (0b01) is the same
>   * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
>   * Push Obj onto the lock stack
>   * Success
> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
>   * If top entry is Obj
>     * Push Obj on the lock stack
>     * Success
>   * If top entry is not Obj
>     * Inflate and call ObjectMonitor::enter
> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
>   * If just the top entry is Obj
>     * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
>     * Pop the entry
>     * Success
>   * If both entries are Obj
>     * Pop the top entry
>     * Success
>   * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
> * If the monitor has been inflated for object Obj which is owned by the current thread
>   * All corresponding entries for Obj are removed from the lock stack
>   * The monitor recursions is set to the number of removed entries - 1
>   * The owner is changed from anonymous to the thread
>   * The regular ObjectMonitor::action is called.
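The overview above can be modelled with a small, self-contained sketch. This is not HotSpot code: `ToyObject`, `ToyLockStack`, `Path`, and the capacity of 8 are assumptions made for illustration. The model is single-threaded (so `Locked` always means "fast locked by us"), the mark word is reduced to a three-state enum, and inflation is represented only by an ownership flag and a recursion counter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy single-threaded model; none of these names are HotSpot APIs.
enum class Mark { Unlocked, Locked, Monitor };  // stands for 0b01, 0b00, 0b10
enum class Path { Fast, Inflated };             // which path handled the operation

struct ToyObject {
  Mark mark = Mark::Unlocked;
  bool owned = false;          // inflated monitor is owned by "this thread"
  std::size_t recursions = 0;  // only meaningful once inflated and owned
};

class ToyLockStack {
 public:
  Path enter(ToyObject* obj) {
    if (obj->mark != Mark::Monitor && _top < kCapacity) {
      if (obj->mark == Mark::Unlocked) {
        obj->mark = Mark::Locked;   // Unlocked => Locked, then push
        _base[_top++] = obj;
        return Path::Fast;
      }
      if (_top > 0 && _base[_top - 1] == obj) {
        _base[_top++] = obj;        // consecutive recursive enter: just push
        return Path::Fast;
      }
    }
    // Full stack, non-consecutive recursion, or already a monitor.
    inflate(obj);
    if (obj->owned) { obj->recursions++; } else { obj->owned = true; }
    return Path::Inflated;
  }

  // Unbalanced exits are not detected in this toy model.
  Path exit(ToyObject* obj) {
    if (obj->mark == Mark::Locked && _top > 0 && _base[_top - 1] == obj) {
      const bool recursive = _top > 1 && _base[_top - 2] == obj;
      if (!recursive) { obj->mark = Mark::Unlocked; }  // last entry: Locked => Unlocked
      _top--;                                           // pop the top entry
      return Path::Fast;
    }
    // Unstructured locking or already a monitor.
    inflate(obj);
    if (obj->recursions > 0) { obj->recursions--; } else { obj->owned = false; }
    return Path::Inflated;
  }

  std::size_t depth() const { return _top; }

 private:
  // Remove every entry for obj; if there were any, the monitor becomes owned
  // with recursions = removed entries - 1.
  void inflate(ToyObject* obj) {
    std::size_t kept = 0;
    std::size_t removed = 0;
    for (std::size_t i = 0; i < _top; i++) {
      if (_base[i] == obj) { removed++; } else { _base[kept++] = _base[i]; }
    }
    _top = kept;
    obj->mark = Mark::Monitor;
    if (removed > 0) {
      obj->owned = true;
      obj->recursions = removed - 1;
    }
  }

  static constexpr std::size_t kCapacity = 8;  // assumed size
  std::array<ToyObject*, kCapacity> _base{};
  std::size_t _top = 0;
};
```

Entering `a` twice in a row stays on the fast path, while the sequence `a`, `b`, `a` forces `a` to be inflated because the top entry is `b`.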
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Fix comment typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16606/files - new: https://git.openjdk.org/jdk/pull/16606/files/cd313451..6dd462f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Mon Nov 13 08:26:39 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 08:26:39 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v2] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:42:09 GMT, Roman Kennke wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment typos > > src/hotspot/share/runtime/synchronizer.cpp line 530: > >> 528: LockStack& lock_stack = current->lock_stack(); >> 529: if (lock_stack.is_full()) { >> 530: // The emitted code always goes into the runtime incase the lock stack > > Typo: incase -> in case Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390747379 From aboldtch at openjdk.org Mon Nov 13 08:26:39 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 08:26:39 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v2] In-Reply-To: References: Message-ID: <5kIfq_IVLJdWxycN7Tdvh8nAvgEKqMec0pcsVHXVJPE=.1fe0c9ac-7721-4b26-b72f-fe913c9fc4ad@github.com> On Mon, 13 Nov 2023 01:07:22 GMT, David Holmes wrote: >> src/hotspot/share/runtime/lockStack.cpp line 47: >> >>> 45: LockStack::LockStack(JavaThread* jt) : >>> 46: _top(lock_stack_base_offset), _base() { >>> 47: // Make sure the layout of the object is compatable with the emitted codes assumptions. >> >> Typo: compatible -> compatible, codes -> code's (?) > > Typo: codes -> code's Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390747249 From aturbanov at openjdk.org Mon Nov 13 08:27:08 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 13 Nov 2023 08:27:08 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v9] In-Reply-To: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> References: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> Message-ID: On Fri, 10 Nov 2023 13:29:18 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... 
> > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Do stuff in the synchronized block of the test test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 48: > 46: Thread threadDumper = new Thread(() -> dumpThreads()); > 47: threadDumper.start(); > 48: Thread monitorCreator = new Thread(() -> createMonitors()); Suggestion: Thread monitorCreator = new Thread(() -> createMonitors()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1390748083 From aturbanov at openjdk.org Mon Nov 13 08:46:04 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 13 Nov 2023 08:46:04 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. 
>> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). 
test/jdk/java/lang/String/StringRacyConstructor.java line 283:

> 281: if ((s.charAt(0) < 256 && !original.equals(s)) || i > 1_000_000) {
> 282: thread.interrupt();
> 283: try {

Suggestion:

try {

test/jdk/java/lang/String/StringRacyConstructor.java line 329:

> 327: if ((s.charAt(0) < 256 && !original.equals(s)) || i > 1_000_000) {
> 328: thread.interrupt();
> 329: try {

Suggestion:

try {

test/jdk/java/lang/String/StringRacyConstructor.java line 376:

> 374: if ((s.length() != original.length()) || i > 1_000_000) {
> 375: thread.interrupt();
> 376: try {

Suggestion:

try {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1390767265
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1390767406
PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1390767564

From lkorinth at openjdk.org Mon Nov 13 08:47:00 2023
From: lkorinth at openjdk.org (Leo Korinth)
Date: Mon, 13 Nov 2023 08:47:00 GMT
Subject: RFR: 8319117: GrowableArray: Allow for custom initializer instead of copy constructor [v15]
In-Reply-To:
References: <7640OMFYd1jbL0RFjUqQvWPekCmULEv5fQS4zHS099k=.be32fc60-8b78-4e0a-bfc3-2de75b6769f1@github.com>
Message-ID:

On Tue, 7 Nov 2023 15:24:49 GMT, Johan Sjölen wrote:

>> Hi,
>>
>> When using at_put and at_put_grow you can provide a value which will be supplied to the constructor of each element. In other words, you can initialize each element through a copy constructor.
>>
>> I suggest that we also provide a function equivalent where the function is provided a pointer to the memory to be initialized. This can be used for `NONCOPYABLE` classes, for example.
>>
>> This is implemented using a SFINAE pattern because `nullptr` introduces ambiguity if you use static overload.
>>
>> Currently running tier1-tier4.
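The idea quoted above, initializing new elements through a function that receives a pointer to raw memory instead of going through a copy constructor, can be illustrated outside HotSpot. In this sketch `ToyArray`, `grow_with`, and `Counter` are hypothetical names that do not reflect the actual `GrowableArray` interface, the capacity is fixed up front to avoid relocating elements, and the SFINAE overload selection mentioned in the PR is not modelled. It only shows why placement new allows a non-copyable element type.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// An element type that can be neither copied nor assigned.
struct Counter {
  int value;
  explicit Counter(int v) : value(v) {}
  Counter(const Counter&) = delete;
  Counter& operator=(const Counter&) = delete;
};

// Hypothetical fixed-capacity array; not the GrowableArray API.
template <typename E>
class ToyArray {
 public:
  explicit ToyArray(std::size_t capacity)
      : _data(static_cast<E*>(::operator new(capacity * sizeof(E)))),
        _capacity(capacity) {}
  ToyArray(const ToyArray&) = delete;
  ToyArray& operator=(const ToyArray&) = delete;
  ~ToyArray() {
    for (std::size_t i = 0; i < _len; i++) { _data[i].~E(); }
    ::operator delete(_data);
  }

  // Extend the length to new_len. For every new slot, init(ptr, index) is
  // called with a pointer to raw, unconstructed memory and must construct
  // an E there (typically with placement new).
  template <typename F>
  void grow_with(std::size_t new_len, F init) {
    assert(new_len <= _capacity);
    for (; _len < new_len; _len++) {
      init(_data + _len, _len);
    }
  }

  E& at(std::size_t i) {
    assert(i < _len);
    return _data[i];
  }
  std::size_t length() const { return _len; }

 private:
  E* _data;
  std::size_t _capacity;
  std::size_t _len = 0;
};
```

A caller would pass something like `[](Counter* p, std::size_t i) { ::new (static_cast<void*>(p)) Counter(static_cast<int>(i) * 10); }` as the initializer. No copy constructor or assignment operator of `Counter` is ever needed.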
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix bug and style issues

src/hotspot/share/utilities/growableArray.hpp line 434:

> 432: assert(0 <= i, "negative index %d", i);
> 433: if (i < this->_len) {
> 434: this->_data[i] = E(elem);

This is not good if we want to remove dependency on having an assignment operator. Maybe change the assignment and the return statement to a destructor call, and let the code path below initialize the value in the destructed memory?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16409#discussion_r1390768906

From aboldtch at openjdk.org Mon Nov 13 08:55:01 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 13 Nov 2023 08:55:01 GMT
Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v2]
In-Reply-To: <-lwt39Gx_QJfxgzgSLHkysdtOrVxgP8dFh7gN4TDkmY=.86139caf-08c2-484f-999f-fa6cf121f9df@github.com>
References: <-lwt39Gx_QJfxgzgSLHkysdtOrVxgP8dFh7gN4TDkmY=.86139caf-08c2-484f-999f-fa6cf121f9df@github.com>
Message-ID:

On Mon, 13 Nov 2023 01:29:40 GMT, David Holmes wrote:

>> src/hotspot/share/runtime/synchronizer.cpp line 530:
>>
>>> 528: LockStack& lock_stack = current->lock_stack();
>>> 529: if (lock_stack.is_full()) {
>>> 530: // The emitted code always goes into the runtime incase the lock stack
>>
>> What is the rationale behind this block? Is it beneficial to inflate the top-most lock to make room for the new one, because that might be hotter? If so, then it may be even more useful to inflate the bottom-most entry instead?
>
> I'm also unclear on the rationale, and again on checking for a full-stack upfront like this, when it should be a rare case. If recursion support means the lockStack is no longer big enough then we need to increase its size to accommodate that.

> What is the rationale behind this block?
Is it beneficial to inflate the top-most lock to make room for the new one, because that might be hotter? If so, then it may be even more useful to inflate the bottom-most entry instead?

The current implementation inflates the bottom (least recently added) entry. The rationale is that because the emitted code always goes into the runtime for monitorenter if the lock stack is full, we need to inflate at least one object on the lock stack to not get into a scenario where we are constantly going into the runtime because we are in some deeply nested critical sections entering and exiting in a loop with the lock stack full.

I also have versions of this which go through the lock stack, first inflate the already inflated objects, and only inflate a not-yet-inflated object if the lock stack is still full.

As for inflating the bottom instead of the top, I am unsure what would be best. The idea behind the bottom is that it is furthest away from the current running code, and in case the top is in a loop with different objects every time it would cause a lot of inflation. But it could obviously also be that the stack is in a loop and the bottom-most object is different every time while the top is the same. I can't say that I have seen programs with either of these behaviours. Both can have equally bad worst case programs (with respect to number of inflations) but my gut feeling is that the worst case is less likely when inflating the bottom.

> If recursion support means the lockStack is no longer big enough then we need to increase its size to accommodate that.

I have not seen it being a problem, but it would be worth looking for programs where this could be an issue and evaluate increasing the lock stack size. Regardless of the capacity, if (and when) the lock stack gets full it needs to be handled in some way.

> I'm also unclear on the rationale, and again on checking for a full-stack upfront like this, when it should be a rare case.
The check for a full lock stack is always performed in every codepath, emitted C2, emitted shared and the runtime. This only adds an escape hatch for the degenerate behaviour we could arrive in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390777648 From stefank at openjdk.org Mon Nov 13 09:00:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Nov 2023 09:00:31 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Tweak test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/d87fc6eb..e90e81ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=08-09 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Mon Nov 13 09:00:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Nov 2023 09:00:31 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v9] In-Reply-To: References: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> Message-ID: On Mon, 13 Nov 2023 00:39:28 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request 
incrementally with one additional commit since the last revision: >> >> Do stuff in the synchronized block of the test > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 71: > >> 69: >> 70: static private void createMonitors() { >> 71: int monitorCount = 0; > > I think this needs to be static to prevent it being hoisted out of the sync block. I try not to assume how clever the JIT might be in this area. I did make it static, but didn't remove the local monitorCount variable. Fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1390779521 From stefank at openjdk.org Mon Nov 13 09:00:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Nov 2023 09:00:31 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v9] In-Reply-To: References: <0xPQTdT1Zrk_FuFdU5b7DWr3MDPfkrQJ81pmuRqkNpM=.154c0b0c-996a-4114-bd40-e8ff5926aa20@github.com> Message-ID: On Mon, 13 Nov 2023 08:23:55 GMT, Andrey Turbanov wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Do stuff in the synchronized block of the test > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 48: > >> 46: Thread threadDumper = new Thread(() -> dumpThreads()); >> 47: threadDumper.start(); >> 48: Thread monitorCreator = new Thread(() -> createMonitors()); > > Suggestion: > > Thread monitorCreator = new Thread(() -> createMonitors()); Thanks! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1390780813 From shade at openjdk.org Mon Nov 13 09:04:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 Nov 2023 09:04:19 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg Message-ID: See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. 
I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn.

I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero.

Additional testing:
 - [x] Linux x86_32 Zero release; jcstress
 - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles`
 - [x] Linux x86_32 Zero fastdebug, bootcycle-images

-------------

Commit messages:
 - Fix

Changes: https://git.openjdk.org/jdk/pull/16614/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16614&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319777
  Stats: 7 lines in 2 files changed: 5 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16614.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16614/head:pull/16614

PR: https://git.openjdk.org/jdk/pull/16614

From aboldtch at openjdk.org Mon Nov 13 09:11:05 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 13 Nov 2023 09:11:05 GMT
Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 13 Nov 2023 01:08:20 GMT, David Holmes wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix comment typos
>
> src/hotspot/share/runtime/lockStack.cpp line 50:
>
>> 48: STATIC_ASSERT(sizeof(_bad_oop_sentinel) == oopSize);
>> 49: STATIC_ASSERT(sizeof(_base[0]) == oopSize);
>> 50: STATIC_ASSERT(std::is_standard_layout::value);
>
> What is this? Are we allowed to use it?

There is probably more nuance here w.r.t. `offsetof` than I know. My belief was that the reason we did not use `offsetof` is that we use it on non-standard-layout types, for which it is invalid. But the lock stack is a standard layout.
However, reading some of issues surrounding `offsetof` (mainly poor compiler support and becoming conditionally supported in C++17) there might be more reasons to avoid it. If that is the case this property would have to be asserted at runtime instead. Maybe @kimbarrett has some more insight. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390797991 From fyang at openjdk.org Mon Nov 13 09:11:03 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 Nov 2023 09:11:03 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: <8rC40UxJC4IF9vdv6xIyaJl6l-fhAlRC0VezoUAuKYE=.bdc94ce5-d9d5-429c-bb38-701ffcbe0bcf@github.com> References: <8rC40UxJC4IF9vdv6xIyaJl6l-fhAlRC0VezoUAuKYE=.bdc94ce5-d9d5-429c-bb38-701ffcbe0bcf@github.com> Message-ID: On Thu, 9 Nov 2023 08:03:17 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 169: >> >>> 167: } >>> 168: >>> 169: if (UseZvknhb && UseZvkb) { >> >> this looks weird, two jdk options needed to enable sha intrinsincs. >> Can we simplify it somehow for now , like UseRVVCryptoExt ? >> Splitting this into UseZvknhb && UseZvkb can be done in future, if it really would be needed one day > > Yes, this is a total mess. > For bystanders this is a 'simple' march to clang: > `rv64im0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zacas1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0` > A simple RVA23U64 CPU may have like 40 extensions, a high performance server class CPU may have well over a hundred. > > Just the scalar crypto ones: > `Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zksed, Zksh, Zkr, Zkt, Zkn, Zks, Zk` > > It is no reasonable to add all these as flags. > So flags for the collections seems like much better idea. > But we probably need to be able to turn off a sub-extension such UseZvknhb. > "-XX:+UseVectorCryptoExt:zvknhb=false" > Suggestions welcome. 
> > Just top of my head, at the moment I need to supply this crazy arch string to compiler, obj dump, qemu(bit different but still crazy :) ) and there doesn't seem to be a solution near, so maybe we should be able to supply that arch string to the VM also. > `-XX:UseArch=rv64im0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zacas1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0` I think that's what RISC-V profiles are for [1] which make some basic extensions mandatory for each profile. And we already have JVM options like `UseRVA20U64` and `UseRVA22U64` for riscv. But there are still some optional extensions for each profile, say RVV for RVA22U64. So instead of feeding a rather long march to the JVM, I feel it's more reasonable to have some JVM options at the extension level (instead of sub-extension level) as suggested by @robehn. Personally, I would suggest something slightly different. Say: "-XX:VectorCryptoExt=zvknhb", "-XX:VectorCryptoExt=zvknhb_zvkb", or "-XX:VectorCryptoExt=all" This way we will still be able to distinguish specific sub-extensions while keeping one JVM option for each extension/collection. [1] https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1390793824 From aboldtch at openjdk.org Mon Nov 13 09:14:28 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 09:14:28 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. 
Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
>
> A high level overview:
> * Locking is still performed on the mark word
>   * Unlocked (0b01) <=> Locked (0b00)
> * Monitor enter on Obj with mark word Unlocked (0b01) is the same
>   * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
>   * Push Obj onto the lock stack
>   * Success
> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
>   * If top entry is Obj
>     * Push Obj on the lock stack
>     * Success
>   * If top entry is not Obj
>     * Inflate and call ObjectMonitor::enter
> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
>   * If just the top entry is Obj
>     * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
>     * Pop the entry
>     * Success
>   * If both entries are Obj
>     * Pop the top entry
>     * Success
>   * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
> * If the monitor has been inflated for object Obj which is owned by the current thread
>   * All corresponding entries for Obj are removed from the lock stack
>   * The monitor recursions is set to the number of removed entries - 1
>   * The owner is changed from anonymous to the thread
>   * The regular ObjectMonitor::action is called.
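The enter and exit rules in the overview above can be sketched as a small single-threaded toy model. This is only an illustration and not HotSpot code: `ToyLockStack`, `Obj`, `Mark`, `Result` and the capacity of 8 are hypothetical names and values, the mark word is a plain enum instead of a CAS-updated header, and every case that would inflate the monitor is collapsed into a `NeedsInflation` result.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: all names here are hypothetical, not the HotSpot API.
enum class Mark { Unlocked, Locked, Monitor };  // 0b01, 0b00, 0b10 in the thread
enum class Result { Success, NeedsInflation };

struct Obj {
  Mark mark = Mark::Unlocked;
};

class ToyLockStack {
  static constexpr std::size_t CAPACITY = 8;  // assumed size for the sketch
  Obj* _base[CAPACITY] = {};
  std::size_t _top = 0;

 public:
  std::size_t depth() const { return _top; }
  bool is_full() const { return _top == CAPACITY; }

  Result enter(Obj* o) {
    // A full lock stack always takes the slow path.
    if (is_full()) return Result::NeedsInflation;
    if (o->mark == Mark::Unlocked) {
      o->mark = Mark::Locked;  // the real transition is a CAS on the mark word
      _base[_top++] = o;
      return Result::Success;
    }
    // Recursive enter is only recognized when Obj is the top entry.
    if (o->mark == Mark::Locked && _top > 0 && _base[_top - 1] == o) {
      _base[_top++] = o;
      return Result::Success;
    }
    return Result::NeedsInflation;
  }

  Result exit(Obj* o) {
    // Anything but "Obj is the top entry" is treated as unstructured here.
    if (o->mark != Mark::Locked || _top == 0 || _base[_top - 1] != o) {
      return Result::NeedsInflation;
    }
    if (_top >= 2 && _base[_top - 2] == o) {
      _top--;  // both top entries are Obj: recursive exit, mark word untouched
      return Result::Success;
    }
    o->mark = Mark::Unlocked;  // only the top entry is Obj: final exit
    _top--;
    return Result::Success;
  }
};
```

Locking `a`, `a`, `b` and then trying `a` again falls out of the fast path in this model, because `a` is no longer the top entry.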
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Fix nit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16606/files - new: https://git.openjdk.org/jdk/pull/16606/files/6dd462f4..52b38136 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Mon Nov 13 09:23:02 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 09:23:02 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 01:13:10 GMT, David Holmes wrote: >> src/hotspot/share/runtime/lockStack.cpp line 82: >> >>> 80: if (VM_Version::supports_recursive_lightweight_locking()) { >>> 81: oop o = _base[i]; >>> 82: for (;i < top - 1; i++) { >> >> Nit: space after first `;` > > Though rather than walk the lockstack twice can't we just change the check below to something like: > > if (VM_Version::supports_recursive_lightweight_locking() && i != j - 1) { > assert(_base[i] != _base[j], "entries must be unique: %s", msg); > } Not sure I understand. The change here is that [for every object A, there exists no other object B such that A == B] is changed to [for every contiguous run of object A, there exists no other object B outside that run such that A == B]. Both of these checks are `O(n^2)` in the worst case, and the second is `O(n)` in the best case. I think you have to describe the algorithm you envision. Or maybe you want to change what property we are asserting.
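A minimal sketch of the second property (every value appears in at most one contiguous run), written against a plain `std::vector` rather than the real lock stack. The function name `entries_form_unique_runs` is hypothetical and this is not the code under review; it only shows one way the check can be `O(n)` for a single run and `O(n^2)` when all entries are distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if each value occurs in at most one contiguous run.
template <typename T>
bool entries_form_unique_runs(const std::vector<T>& stack) {
  std::size_t i = 0;
  while (i < stack.size()) {
    // Find the end of the run that starts at index i.
    std::size_t run_end = i + 1;
    while (run_end < stack.size() && stack[run_end] == stack[i]) {
      run_end++;
    }
    // No entry after the run may equal the run's value. Entries before the
    // run were already compared against it when their own runs were checked.
    for (std::size_t j = run_end; j < stack.size(); j++) {
      if (stack[j] == stack[i]) {
        return false;
      }
    }
    i = run_end;
  }
  return true;
}
```

With a single run the inner `for` loop never executes, which corresponds to the best case mentioned above.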
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390816671 From vkempik at openjdk.org Mon Nov 13 09:25:59 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 13 Nov 2023 09:25:59 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: References: Message-ID: <5i-hk8Vm8mfogwTT8eQv9PV41MGRZ0P8JkoogXyzovY=.b305d21d-934c-4ce7-9206-6bd32e926b42@github.com> On Sun, 12 Nov 2023 09:34:03 GMT, Andrew Haley wrote: >> But doing fadd 0.5 to the number, which can't have fractional part, in rdn mode becomes no-op. >> At least on single precision floats it works: >> fadd(-8388609.0, +0.5, rdn) results in -8388609.0 >> and the mode for both fadd and fcvt will be the same, (perf tests showed no difference on thead tho) > > Maybe. I didn't try it, but on a great big out-of-order machine changing floating-point modes can be fantastically expensive, forcing ops in progress to retire, changing mode, and then continuing. Effectively it's as bad as a mispredict. Given that a correct solution that doesn't involve changing modes is available, I don't see why you wouldn't use it. I have consulted with our h/w team and they told me next: multi-issue FP Unit can process few (data-independent) fp instructions at a time, even if they have different rounding mode. 
The only issue is when the rounding mode is set to dynamic rounding (aka get it from csr), but it's not our case here ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1390820971 From aboldtch at openjdk.org Mon Nov 13 09:28:58 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 09:28:58 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 01:26:00 GMT, David Holmes wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix nit > > src/hotspot/share/runtime/synchronizer.cpp line 395: > >> 393: // Always go into runtime if the lock stack is full. >> 394: return false; >> 395: } > > It isn't obvious that it is beneficial to check what should be a rare occurrence. Why do this? All lightweight enters must check if the lock stack is full. Both push and try_recursive_enter have that as a pre condition. All code paths emitted C2, emitted shared code and the runtime does the is full check first. The reason that quick_enter does this without checking the mark word (for monitor) is that we go into the runtime from the emitted code if the lock stack is full. So we want to enter the runtime to inflate and make room, to not get into scenario where we have to go into the runtime on every monitor enter because we are locking on new objects in a loop with a full lock stack. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390826245 From shade at openjdk.org Mon Nov 13 09:30:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 Nov 2023 09:30:19 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v4] In-Reply-To: References: Message-ID: > Noticed this while doing C1 work, but the issue is more generic. 
If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. > > Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. > > For example, sample branch profiling hunk from C1 tier3 on x86_64: > > > Before: > 0x00007f269065ed02: test %edx,%edx > 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ? > 0x00007f269065ed0e: movabs $0x138,%rsi > ? 0x00007f269065ed18: je 0x00007f269065ed24 > ? 0x00007f269065ed1a: movabs $0x148,%rsi > ? 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi > 0x00007f269065ed28: lea 0x1(%rdi),%rdi > 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1) > 0x00007f269065ed30: je 0x00007f269065ed4e > > After: > 0x00007f1370dcd782: test %edx,%edx > 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ? > 0x00007f1370dcd78e: mov $0x138,%esi > ? 0x00007f1370dcd793: je 0x00007f1370dcd79a > ? 0x00007f1370dcd795: mov $0x148,%esi > ? 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi > 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi > 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1) > 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4 > > > We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes. > > This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`. 
> > > # Before > nmethod code size : 430328 bytes > nmethod code size : 467032 bytes > nmethod code size : 908936 bytes > nmethod code size : 1267816 bytes > > # After > nmethod code size : 429616 bytes (-0.1%) > nmethod code size : 466344 bytes (-0.1%) > nmethod code size : 897144 bytes (-1.3%) > nmethod code size : 1256216 bytes (-0.9%) > > > There are two wrinkles: > 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I would have used it in this patch, but x86_64 does not actually define `movslq reg64, imm32`, so we use a regular `mov reg... Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Remove the movslq declaration as well - Merge branch 'master' into JDK-8319406-shorter-movptr-32 - Enlighs - Remove new imm64 method completely, inline at use - Easy review feedback - Merge branch 'master' into JDK-8319406-shorter-movptr-32 - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16497/files - new: https://git.openjdk.org/jdk/pull/16497/files/6dcaf425..968344ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16497&range=02-03 Stats: 12297 lines in 151 files changed: 3136 ins; 8003 del; 1158 mod Patch: https://git.openjdk.org/jdk/pull/16497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16497/head:pull/16497 PR: https://git.openjdk.org/jdk/pull/16497 From shade at openjdk.org Mon Nov 13 09:30:21 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 Nov 2023 09:30:21 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v4] In-Reply-To: References: Message-ID: On Sat, 11 Nov 2023 00:58:03 GMT, Quan Anh Mai wrote: >> Aleksey Shipilev has updated the pull request with a new target base 
due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Remove the movslq declaration as well >> - Merge branch 'master' into JDK-8319406-shorter-movptr-32 >> - Enlighs >> - Remove new imm64 method completely, inline at use >> - Easy review feedback >> - Merge branch 'master' into JDK-8319406-shorter-movptr-32 >> - Fix > > src/hotspot/cpu/x86/assembler_x86.cpp line 13453: > >> 13451: } >> 13452: >> 13453: void Assembler::movslq(Register dst, int32_t imm32) { > > You can remove the corresponding declaration in the header file. Right, removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16497#discussion_r1390824263 From aph at openjdk.org Mon Nov 13 09:35:56 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 13 Nov 2023 09:35:56 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: <5i-hk8Vm8mfogwTT8eQv9PV41MGRZ0P8JkoogXyzovY=.b305d21d-934c-4ce7-9206-6bd32e926b42@github.com> References: <5i-hk8Vm8mfogwTT8eQv9PV41MGRZ0P8JkoogXyzovY=.b305d21d-934c-4ce7-9206-6bd32e926b42@github.com> Message-ID: <1o0ZvsGehXm52tpvB1okWb0OKM1R7B-dJZpLXRC-oA0=.a4f3ad9e-26c8-4cfe-b167-4954230045dc@github.com> On Mon, 13 Nov 2023 09:23:20 GMT, Vladimir Kempik wrote: > I have consulted with our h/w team and they told me next: multi-issue FP Unit can process few (data-independent) fp instructions at a time, even if they have different rounding mode. The only issue is when the rounding mode is set to dynamic rounding (aka get it from csr), but it's not our case here OK. Do what you will, but you're coding for an architecture not an implementation, and this code may stand for many years. 
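The arithmetic behind the `fadd` with 0.5 in `rdn` mode mentioned earlier in this thread can be reproduced on a host FPU with `<cfenv>`. This is only a sketch under assumptions: `add_half` is a hypothetical helper, it changes the dynamic rounding mode of the host rather than using a RISC-V static rounding mode, and `volatile` is used to keep the compiler from folding the addition at compile time. It says nothing about the cost of rounding modes on any particular implementation.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: adds 0.5f to x under the given <cfenv> rounding mode
// and restores the previous mode afterwards.
float add_half(float x, int rounding_mode) {
  const int saved = std::fegetround();
  std::fesetround(rounding_mode);
  // volatile forces the addition to happen at run time, between the two
  // fesetround calls, instead of being constant-folded.
  volatile float in = x;
  volatile float half = 0.5f;
  volatile float out = in + half;
  std::fesetround(saved);
  return out;
}
```

For a single-precision input such as -8388609.0f, which has no fractional part, the exact sum -8388608.5 is not representable. Rounding toward negative infinity leaves the input unchanged, while round-to-nearest-even moves it to -8388608.0f.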
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1390834630 From rkennke at openjdk.org Mon Nov 13 09:54:59 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 13 Nov 2023 09:54:59 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: <17VTTf6CmPVd-QeuKsCukTHBOqdkkr2erX83G_azDmg=.76519524-1ec9-45f4-8abb-f6848d09aa6a@github.com> On Mon, 13 Nov 2023 09:26:31 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 395: >> >>> 393: // Always go into runtime if the lock stack is full. >>> 394: return false; >>> 395: } >> >> It isn't obvious that it is beneficial to check what should be a rare occurrence. Why do this? > > All lightweight enters must check if the lock stack is full. Both push and try_recursive_enter have that as a pre condition. All code paths emitted C2, emitted shared code and the runtime does the is full check first. > > The reason that quick_enter does this without checking the mark word (for monitor) is that we go into the runtime from the emitted code if the lock stack is full. So we want to enter the runtime to inflate and make room, to not get into scenario where we have to go into the runtime on every monitor enter because we are locking on new objects in a loop with a full lock stack. FWIW, when I did the original LW-locking implementation, and when the lock-stack was not yet fixed-size, I experimented with an optimisation for this problem: instead of doing the check for every monitorenter, I let C2 analyse the maximum lock-stack depth that the method is going to require, and do the check (and growing of the lock-stack) at method-entry. However, I haven't seen a case where this was beneficial, and there have been several problems with the approach as well (maybe that's why it wasn't beneficial). But maybe it is worth revisiting at some point? 
OTOH, with recursive locking we need to load and check the top-offset anyway, which makes the extra cost to check for overflow even smaller. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390858880 From aboldtch at openjdk.org Mon Nov 13 10:02:01 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 10:02:01 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 01:31:14 GMT, David Holmes wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix nit > > src/hotspot/share/runtime/synchronizer.cpp line 609: > >> 607: return; >> 608: } else if (mark.is_fast_locked() && lock_stack.is_recursive(object)) { >> 609: // This lock is recursive but unstructured exit. Just inflate the lock. > > Again this seems in the wrong place - this should be a very rare case so we should not be checking it explicitly before the expected cases! In exit we must always check for recursions first. Unsure what you are proposing here. Maybe you want to call remove first, and have a branch on if the number removed is greater than 1. And in that case inflate and update the recursions field before falling through. Something like this:

```c++
// Fast-locking does not use the 'lock' argument.
LockStack& lock_stack = current->lock_stack();
if (mark.is_fast_locked()) {
  if (lock_stack.try_recursive_exit(object)) {
    // Recursively unlocked.
    return;
  }
  size_t recursions = lock_stack.remove(object) - 1;
  if (recursions == 0) {
    while (mark.is_fast_locked()) {
      const markWord new_mark = mark.set_unlocked();
      const markWord old_mark = mark;
      mark = object->cas_set_mark(new_mark, old_mark);
      if (old_mark == mark) {
        return;
      }
    }
  }
  lock_stack.push(object);
  ObjectMonitor* mon = inflate(current, object, inflate_cause_vm_internal);
  if (mon->is_owner_anonymous()) {
    mon->set_owner_from_anonymous(current);
  }
  mon->set_recursions(recursions);
}
```

This makes the code a little more like the emitted code. Except it is conditioned on the mark word lock bits. Hard to believe this will have a measurable difference. But at least to me it is more noisy. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1390877436 From aboldtch at openjdk.org Mon Nov 13 10:33:58 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 10:33:58 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 14:51:46 GMT, Roman Kennke wrote: >> Implements the x86 port of JDK-8319796. >> >> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. >> >> For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. >> >> The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. >> >> First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated.
And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The x86 C2 port also has some extra oddities. >> >> The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware. >> >> The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime. >> >> The contended unlock was also moved to the code stub. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 979: > >> 977: jccb(Assembler::equal, push); >> 978: >> 979: // Check for monitor (0b10). > > It baffles me a little bit that we check for the monitor only after we checked for full-lock-stack and recursive locking. This means that if the object is monitor-locked, it has to wait for 3 loads (mark-word, top-of-stack-offset and top-of-stack) and two (pointless) test-and-branches. This seems to optimise the lw-locking case at the expense of monitor-locking case. I'm not sure that this is the right trade-off. You said in the description that this scales better? Can you elaborate on that? I believe you are correct. In fast_lock it might be better to check for monitor first. Have to run some more benchmarks. But running some (single threaded un-contended) micros show that moving monitor check in lock has modest improvements on inflated locking and very minor, if any regressions on lightweight locking. 
The scaling was mostly seen when moving the mark word load in the unlock case. In fast unlock the lock stack must be checked first for correctness if we wish to elide the owner field anonymous owner check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1390927772 From luhenry at openjdk.org Mon Nov 13 10:35:57 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 13 Nov 2023 10:35:57 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: <2IlOVG_5AT4GvCuDhZn3xK6iZqxn1d9UXDkrjvW1LT4=.6ca322e0-813f-4e26-a607-cf4e2a7b48cc@github.com> On Thu, 9 Nov 2023 10:30:41 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to refactor the code related UseRVV checks? > Thanks! > > There are some code (flag setting/checking) depending on UseRVV's value, these code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: > 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. > 2. refactored to be more readable. > 3. also add note to make sure the future code get the final UseRVV value instead of inital value. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 269: > 267: #endif // COMPILER2 > 268: > 269: // NOTE: Make sure codes dependent on UseRVV are put at the behind of c2_initialize(), Suggestion: // NOTE: Make sure codes dependent on UseRVV are put after c2_initialize(), ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16580#discussion_r1390927971 From aboldtch at openjdk.org Mon Nov 13 10:45:10 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 10:45:10 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v2] In-Reply-To: References: Message-ID: > Implements the x86 port of JDK-8319796. > > There are two major parts for the port implementation. 
The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The x86 C2 port also has some extra oddities. > > The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware. > > The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime. > > The contended unlock was also moved to the code stub. 
Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Fix type - Move inflated check in fast_locked - Move top load ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16607/files - new: https://git.openjdk.org/jdk/pull/16607/files/e0f1a5e6..39b421c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=00-01 Stats: 20 lines in 3 files changed: 10 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16607/head:pull/16607 PR: https://git.openjdk.org/jdk/pull/16607 From aboldtch at openjdk.org Mon Nov 13 10:45:13 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 13 Nov 2023 10:45:13 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v2] In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 14:55:00 GMT, Roman Kennke wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix type >> - Move inflated check in fast_locked >> - Move top load > > src/hotspot/cpu/x86/c2_CodeStubs_x86.cpp line 123: > >> 121: __ movptr(Address(monitor, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), _thread); >> 122: >> 123: // succsesor null check. > > typo: succsesor -> successor Done. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 971: > >> 969: >> 970: // Check if lock-stack is full. >> 971: cmpl(Address(thread, JavaThread::lock_stack_top_offset()), LockStack::end_offset() - 1); > > I believe you can mov the movl(top, Address(thread, JavaThread::lock_stack_top_offset())) here, and use top in both checks. Done. > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 9817: > >> 9815: >> 9816: // Check if the lock-stack is full. 
>> 9817: cmpl(Address(thread, JavaThread::lock_stack_top_offset()), LockStack::end_offset()); > > I believe you can mov the movl(top, Address(thread, JavaThread::lock_stack_top_offset())) here, and use top in both checks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1390939354 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1390938003 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1390937811 From aturbanov at openjdk.org Mon Nov 13 11:16:00 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 13 Nov 2023 11:16:00 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v4] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 19:59:47 GMT, Cesar Soares Lucas wrote: >> ### Description >> >> Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. >> >> Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. >> >> The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. >> >> ### Benchmarking >> >> **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. >> **Note 2:** Marging of error was negligible. 
>> >> | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) | >> |--------------------------------------|------------------|-------------------| >> | TestTrapAfterMerge | 19.515 | 13.386 | >> | TestArgEscape | 33.165 | 33.254 | >> | TestCallTwoSide | 70.547 | 69.427 | >> | TestCmpAfterMerge | 16.400 | 2.984 | >> | TestCmpMergeWithNull_Second | 27.204 | 27.293 | >> | TestCmpMergeWithNull | 8.248 | 4.920 | >> | TestCondAfterMergeWithAllocate | 12.890 | 5.252 | >> | TestCondAfterMergeWithNull | 6.265 | 5.078 | >> | TestCondLoadAfterMerge | 12.713 | 5.163 | >> | TestConsecutiveSimpleMerge | 30.863 | 4.068 | >> | TestDoubleIfElseMerge | 16.069 | 2.444 | >> | TestEscapeInCallAfterMerge | 23.111 | 22.924 | >> | TestGlobalEscape | 14.459 | 14.425 | >> | TestIfElseInLoop | 246.061 | 42.786 | >> | TestLoadAfterLoopAlias | 45.808 | 45.812 | >> ... > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Ammend previous fix & add repro tests. test/micro/org/openjdk/bench/vm/compiler/AllocationMerges.java line 1079: > 1077: Load p1 = new Load(x, y); > 1078: Load p2 = new Load(x, y); > 1079: int val = 0; Suggestion: int val = 0; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15825#discussion_r1390973273 From amitkumar at openjdk.org Mon Nov 13 11:25:02 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 13 Nov 2023 11:25:02 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. 
>> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
> > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). Please add s390 port: diff --git a/src/hotspot/cpu/s390/s390.ad b/src/hotspot/cpu/s390/s390.ad index ffac6b70a58..61b6a6a5906 100644 --- a/src/hotspot/cpu/s390/s390.ad +++ b/src/hotspot/cpu/s390/s390.ad @@ -10190,7 +10190,7 @@ instruct string_compress(iRegP src, iRegP dst, iRegI result, iRegI len, iRegI tm format %{ "String Compress $src->$dst($len) -> $result" %} ins_encode %{ __ string_compress($result$$Register, $src$$Register, $dst$$Register, $len$$Register, - $tmp$$Register, false, false); + $tmp$$Register, true, false); %} ins_pipe(pipe_class_dummy); %} ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1807971207 From mli at openjdk.org Mon Nov 13 12:09:57 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 12:09:57 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: <5d1JP84-wIn3HrXus00yccWj3zL2bmdhb_btH2lv1Ws=.33a0f524-0e33-4ac4-9ff7-9a990f08f313@github.com> On Mon, 13 Nov 2023 07:18:41 GMT, Robbin Ehn wrote: > Hey, how is the changes to SpecialEncodeISOArray related ? 
This patch is also to respond the comment at https://github.com/openjdk/jdk/pull/16481#discussion_r1386040152 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16580#issuecomment-1808036923 From mli at openjdk.org Mon Nov 13 12:09:59 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 12:09:59 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: <2IlOVG_5AT4GvCuDhZn3xK6iZqxn1d9UXDkrjvW1LT4=.6ca322e0-813f-4e26-a607-cf4e2a7b48cc@github.com> References: <2IlOVG_5AT4GvCuDhZn3xK6iZqxn1d9UXDkrjvW1LT4=.6ca322e0-813f-4e26-a607-cf4e2a7b48cc@github.com> Message-ID: On Mon, 13 Nov 2023 10:31:39 GMT, Ludovic Henry wrote: >> Hi, >> Can you review the patch to refactor the code related UseRVV checks? >> Thanks! >> >> There are some code (flag setting/checking) depending on UseRVV's value, these code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: >> 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. >> 2. refactored to be more readable. >> 3. also add note to make sure the future code get the final UseRVV value instead of inital value. > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 269: > >> 267: #endif // COMPILER2 >> 268: >> 269: // NOTE: Make sure codes dependent on UseRVV are put at the behind of c2_initialize(), > > Suggestion: > > // NOTE: Make sure codes dependent on UseRVV are put after c2_initialize(), Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16580#discussion_r1391028995 From mli at openjdk.org Mon Nov 13 12:15:29 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 12:15:29 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v7] In-Reply-To: References: Message-ID: > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! 
> > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/b6456d79..f9f31e74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Mon Nov 13 12:15:33 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 12:15:33 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v6] In-Reply-To: References: <4shJ-ET362VIOhvAKhA0FGBpn-_pofC0WI1D_ePl7v0=.a42608ad-56dc-4827-9435-0f3db631ca4b@github.com> Message-ID: On Mon, 13 Nov 2023 06:27:18 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> reserve all used v register; use t0 directly > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1691: > >> 1689: >> 1690: // load the src data(in bits) to be compressed. >> 1691: vsetivli(x0, 1, sew, lmul); > > A default `lmul` of `m1` is enough to perform the succeeding `vmv_s_x` instruction as specified by the RVV spec. > > The integer scalar read/write instructions transfer a single value between a scalar x register and element 0 of a vector > register. The instructions ignore LMUL and vector register groups. It seems to make no difference at run time, as `The instructions ignore LMUL and vector register groups.`. But it makes sense to modify it as you suggested; it is clearer. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1703: > >> 1701: // load the mask data(in bits).
>> 1702: vsetivli(x0, 1, sew, lmul); >> 1703: vmv_v_x(v0, mask); > > Shouldn't this be `vmv_s_x(v0, mask)` instead of `vmv_v_x(v0, mask)`? The `vcompress.vm` instruction is expecting a vector mask register. Also the preceding `vsetivli` should be changed to use a default `lmul` of `m1` at the same time. Good catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391031706 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391031969 From dholmes at openjdk.org Mon Nov 13 12:26:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 12:26:00 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v2] In-Reply-To: References: Message-ID: <-wv8lJu-I8JKXGYZ_kFnvXEppqbm4wpRn3fpSRziDN8=.19b67eac-9141-43c2-9242-59199e1cb724@github.com> On Mon, 13 Nov 2023 07:39:10 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation is still valid, and will not fail for any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Simplify test. Nothing further from me.
Runtime changes look good. Best to get a compiler dev to okay the C2 changes. Thanks ------------- PR Review: https://git.openjdk.org/jdk/pull/16603#pullrequestreview-1727211308 From dholmes at openjdk.org Mon Nov 13 12:26:02 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 12:26:02 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v2] In-Reply-To: <1DG4zwC5I96PdIuDQCQbeEsOL3NR5owY7ehs-3axPlE=.68e173e2-e92b-43a5-abf3-9ef30b48443d@github.com> References: <1DG4zwC5I96PdIuDQCQbeEsOL3NR5owY7ehs-3axPlE=.68e173e2-e92b-43a5-abf3-9ef30b48443d@github.com> Message-ID: On Mon, 13 Nov 2023 07:29:40 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 582: >> >>> 580: // It can only have installed an anonymously locked monitor at this point. >>> 581: // Fetch that monitor, set owner correctly to this thread, and >>> 582: // exit it (allowing waiting threads to enter). >> >> I don't understand why the anonymous owner case is no longer being checked. ?? > > The condition does now check for a successful CAS, not the unsuccessful one. If it was successful then there is no monitor, thus no anonymous owner. > > If the CAS failed and the mark word is no longer fast locked. It must be inflated. So we fallthrough down to the inflated case. > > `ObjectSynchronizer::inflate` correctly handles fixing the owner. I see - thanks. It is hard to see where the code goes to when the CAS fails. 
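
For readers following the thread, the retry-on-CAS-failure pattern described in the PR text can be sketched roughly as below. This is a minimal illustration and not the HotSpot code: the lock-bit encodings (0b01 unlocked, 0b00 fast-locked, 0b10 monitor) come from the PR description, while `kHashShift`, `kHashMask`, `try_fast_lock` and `install_hash` are hypothetical names and layouts chosen only for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Lock-bit encodings as quoted in the PR description.
constexpr uint64_t kLockMask = 0b11;
constexpr uint64_t kUnlocked = 0b01;
constexpr uint64_t kLocked   = 0b00;
constexpr uint64_t kMonitor  = 0b10;

// Hypothetical hash field layout, assumed for this sketch only.
constexpr int      kHashShift = 8;
constexpr uint64_t kHashMask  = 0x7FFFFFFFull << kHashShift;

// Unlocked -> Locked. If the CAS fails only because non-lock bits changed
// (for example a hash was installed concurrently), the loop simply retries.
bool try_fast_lock(std::atomic<uint64_t>& mark) {
  uint64_t old_mark = mark.load(std::memory_order_relaxed);
  while ((old_mark & kLockMask) == kUnlocked) {
    uint64_t new_mark = (old_mark & ~kLockMask) | kLocked;
    if (mark.compare_exchange_weak(old_mark, new_mark,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed)) {
      return true;
    }
    // old_mark now holds the current value; the loop condition re-checks it.
  }
  return false;  // not in the unlocked state; caller takes a slow path
}

// Installs `hash` in the mark word without inflating. Returns the hash that
// ends up in the mark word, or 0 if the word is in the Monitor state (the
// caller would then go through the inflated path instead).
uint64_t install_hash(std::atomic<uint64_t>& mark, uint64_t hash) {
  assert(hash != 0 && hash <= 0x7FFFFFFFull);
  uint64_t old_mark = mark.load();
  for (;;) {
    if ((old_mark & kLockMask) == kMonitor) {
      return 0;
    }
    uint64_t existing = (old_mark & kHashMask) >> kHashShift;
    if (existing != 0) {
      return existing;  // another thread already installed a hash
    }
    uint64_t new_mark = old_mark | (hash << kHashShift);
    if (mark.compare_exchange_weak(old_mark, new_mark)) {
      return hash;
    }
  }
}
```

In this sketch a failed CAS never means "give up" unless the lock bits themselves left the expected state, which is the property the review question about the fall-through to the inflated case was probing.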
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1391043625 From dholmes at openjdk.org Mon Nov 13 12:31:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 12:31:57 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 14:17:32 GMT, Aleksey Shipilev wrote: > See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. > > I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. > > I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. > > Additional testing: > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images src/hotspot/cpu/zero/globalDefinitions_zero.hpp line 30: > 28: > 29: // Unconditionally supports 8-byte cmpxchg either with > 30: // compiler intrinsics or with library/kernel helpers. That's not what "native support for cx8" was meant to mean though - e.g. see the ARM header - it only sets this when building for ARMv7. You can just leave this file alone and simply set `_supports_cx8` below to achieve the same goal. And that will fit in cleaner with the changes I am making. 
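
The distinction under discussion (a compile-time claim of native 8-byte CAS versus a runtime flag) can be illustrated with a small sketch. It is only an assumption-laden illustration: `SKETCH_NATIVE_CX8` stands in for a build-time macro such as `SUPPORTS_NATIVE_CX8`, `g_supports_cx8` stands in for the runtime `_supports_cx8` value, and the lock-based fallback is invented for the example. The `__atomic_compare_exchange_n` builtin is the GCC/Clang one; on some 32-bit targets it may be lowered to a library or kernel helper.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Stand-in for a runtime capability flag; assumed to be set once at startup
// and never changed afterwards (mixing both paths concurrently is not safe).
static bool g_supports_cx8 = true;
static std::mutex g_fallback_lock;

// Returns the value observed at *addr before the operation.
int64_t cmpxchg8(int64_t* addr, int64_t expected, int64_t desired) {
  assert(addr != nullptr);
#ifndef SKETCH_NATIVE_CX8
  // Without the compile-time macro, the runtime check and the locking
  // fallback stay in the binary.
  if (!g_supports_cx8) {
    std::lock_guard<std::mutex> guard(g_fallback_lock);
    int64_t old_value = *addr;
    if (old_value == expected) {
      *addr = desired;
    }
    return old_value;
  }
#endif
  // With the macro defined, only this path is compiled. On failure the
  // builtin writes the current value into `expected`; on success `expected`
  // already equals the old value, so it can be returned in both cases.
  __atomic_compare_exchange_n(addr, &expected, desired, /*weak=*/false,
                              __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
}
```

Setting only the runtime flag keeps the fallback code present but unused, whereas defining the macro removes it at build time; which of the two a port should claim is exactly what the two reviewers are debating.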
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16614#discussion_r1391053336 From rehn at openjdk.org Mon Nov 13 12:38:57 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 Nov 2023 12:38:57 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 10:30:41 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to refactor the code related UseRVV checks? > Thanks! > > There are some code (flag setting/checking) depending on UseRVV's value, these code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: > 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. > 2. refactored to be more readable. > 3. also add note to make sure the future code get the final UseRVV value instead of inital value. Marked as reviewed by rehn (Reviewer). > > Hey, how is the changes to SpecialEncodeISOArray related ? > > This patch is also to respond the comment at [#16481 (comment)](https://github.com/openjdk/jdk/pull/16481#discussion_r1386040152) Ok, looks good! ------------- PR Review: https://git.openjdk.org/jdk/pull/16580#pullrequestreview-1727232825 PR Comment: https://git.openjdk.org/jdk/pull/16580#issuecomment-1808086211 From duke at openjdk.org Mon Nov 13 12:39:16 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 13 Nov 2023 12:39:16 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode Message-ID: Hello All, Please review these changes to support _vectorizedHashCode intrinsic on RISC-V platform. The patch adds the "scalar" code for the intrinsic without usage of any RVV instruction but provides manual unrolling of the appropriate loop. The code with usage of RVV instruction could be added as follow-up of the patch or independently. Thanks, -Yuri Gaevsky P.S. My OCA has been accepted recently (ygaevsky). 
### Correctness checks

Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.

### Performance results (the numbers for non-ints are similar)

#### StarFive JH7110 board:

ArraysHashCode:                    without intrinsic        with intrinsic
-------------------------------------------------------------------------------
Benchmark  (size)  Mode  Cnt       Score     Error        Score     Error  Units
-------------------------------------------------------------------------------
multiints      0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
multiints      1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
multiints      2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
multiints      3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
multiints      4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
multiints      5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
multiints      6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
multiints      7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
multiints      8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
multiints      9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
multiints     10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
multiints     50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
multiints    100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
multiints    200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
multiints    500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
multiints   1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
multiints  10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
multiints 100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
-------------------------------------------------------------------------------

#### T-Head RVB-ICE board:

ArraysHashCode:                    without intrinsic        with intrinsic
------------------------------------------------------------------------------
Benchmark  (size)  Mode  Cnt       Score     Error        Score     Error  Units
------------------------------------------------------------------------------
multiints      0  avgt   30       2.780 ±   0.022       2.816 ±   0.038  ns/op
multiints      1  avgt   30       5.073 ±   0.032       5.101 ±   0.064  ns/op
multiints      2  avgt   30      14.656 ±   0.234      10.974 ±   0.118  ns/op
multiints      3  avgt   30      12.890 ±   0.064      14.168 ±   0.096  ns/op
multiints      4  avgt   30      13.715 ±   0.092      12.552 ±   0.188  ns/op
multiints      5  avgt   30      19.068 ±   0.172      13.557 ±   0.164  ns/op
multiints      6  avgt   30      18.863 ±   0.122      14.848 ±   0.086  ns/op
multiints      7  avgt   30      23.155 ±   0.123      17.300 ±   0.360  ns/op
multiints      8  avgt   30      21.656 ±   0.130      19.214 ±   1.689  ns/op
multiints      9  avgt   30      27.223 ±   0.116      20.450 ±   0.088  ns/op
multiints     10  avgt   30      26.194 ±   0.116      19.463 ±   0.411  ns/op
multiints     50  avgt   30      97.904 ±   1.396      69.662 ±   1.729  ns/op
multiints    100  avgt   30     179.607 ±   0.766     137.127 ±   3.156  ns/op
multiints    200  avgt   30     303.845 ±   2.743     235.476 ±   3.446  ns/op
multiints    500  avgt   30     752.295 ±   2.131     571.157 ±   4.258  ns/op
multiints   1000  avgt   30    1404.489 ±   5.839    1048.263 ±   3.473  ns/op
multiints  10000  avgt   30   13797.829 ±  44.821   10344.752 ±  24.183  ns/op
multiints 100000  avgt   30  135067.307 ± 254.361   98806.823 ± 131.562  ns/op
------------------------------------------------------------------------------

-------------

Commit messages:
 - 8318217: RISC-V: C2 VectorizedHashCode

Changes: https://git.openjdk.org/jdk/pull/16629/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8318217
  Stats: 145 lines in 6 files changed: 145 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16629.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629

PR: https://git.openjdk.org/jdk/pull/16629

From ayang at openjdk.org Mon Nov 13 12:44:58 2023
From: ayang at openjdk.org (Albert Mingkun Yang)
Date: Mon, 13 Nov 2023 12:44:58 GMT
Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v3]
In-Reply-To: References: Message-ID:

On Sat, 11 Nov 2023 11:49:33 GMT, Lei Zaakjyu wrote:

>> JDK-8234502 : Merge GenCollectedHeap and SerialHeap
>
> Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix 'young_gen' function
in 'genCollectedHeap.cpp' Maybe there is some misinterpretation about the ticket description, but I'd expect `class GenCollectedHeap` to be removed completely with this ticket. Could you enable GHA (https://wiki.openjdk.org/display/SKARA/Testing) to catch potential issues for all platforms? It should show sth like "19 successful checks", e.g. in https://github.com/openjdk/jdk/pull/16560. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16623#issuecomment-1808094234 From ogillesp at amazon.co.uk Mon Nov 13 12:47:16 2023 From: ogillesp at amazon.co.uk (Gillespie, Oli) Date: Mon, 13 Nov 2023 12:47:16 +0000 Subject: Automatically reduce heap size near compressed oops limit Message-ID: Hi, It's quite easy to make a mistake when configuring heap size near the compressed oops limit. Either for someone who knows about the limit, but misses some nuance (they hear that 32GB is the maximum addressable space, and set Xmx32G, not realizing that the buffer required for the null page means that Xmx32G guarantees that you *won't* get compressed oops), or for someone who doesn't know about the limit and just chooses close to 32GB some other way, but loses out because the uncompressed oops use more space than the extra allotted (it's typical for people to recommend never choosing between 32 and say 36GB for this reason), or for someone that knows all this, and specifies Xmx31G because they don't know exactly what the maximum is on every server they run on, and lose out on up to 1GB of potential heap space. What do people think of these two related ideas? 1. If the user sets *exactly* Xmx32G, we assume they intend or at least would benefit from compressed oops, so we reduce the chosen heap size to the maximum that compressed oops supports on their machine. We can limit the adjustment to some sanity value, like say 500MB, to avoid possible edge cases. 2. 
If the user sets anywhere in our chosen 'inefficient' range ((32GB - buffer) up to say 36GB), we reduce the chosen heap size to the maximum that compressed oops supports. These would be accompanied by a warning message, and could be overridden with a new flag `-XX:+-AllowHeapAdjustmentForCompressedOops` or for 2 `-XX:MaxHeapAdjustmentForCompressedOops=4G` (naming suggestions welcome!). Idea 1 is simply idea 2 with tighter bounds, but I suspect it covers the majority of useful cases while minimizing mistakes, so that's where I'd start. A dedicated way to say 'give me the max heap compressed oops supports' could also be valuable, but I don't think it offers much over 'magic' handling of 32G. Oli Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. From jvernee at openjdk.org Mon Nov 13 12:51:36 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 13 Nov 2023 12:51:36 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: References: Message-ID: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. 
> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. > - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 52 commits: - Merge branch 'master' into AllowHeapNoLock - fix type and reformat doc in Linker - Merge branch 'master' into AllowHeapNoLock - tweak whitespace - a -> an - add note to downcallHandle about passing heap segments by-reference - Merge branch 'master' into AllowHeapNoLock - bump up argument counts in TestLargeStub to their maximum - s390 updates - add stub size stress test for allowHeap - ... and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 ------------- Changes: https://git.openjdk.org/jdk/pull/16201/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16201&range=13 Stats: 2711 lines in 74 files changed: 1722 ins; 692 del; 297 mod Patch: https://git.openjdk.org/jdk/pull/16201.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16201/head:pull/16201 PR: https://git.openjdk.org/jdk/pull/16201 From jvernee at openjdk.org Mon Nov 13 12:51:36 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 13 Nov 2023 12:51:36 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12] In-Reply-To: References: Message-ID: <6CiULyvr5njIHm6Pm7sU5ElUrXElHAmD3up0aYVutuU=.77f43912-543e-4505-a04c-181233939af4@github.com> On Thu, 9 Nov 2023 15:39:54 GMT, Jorn Vernee wrote: >> src/hotspot/cpu/aarch64/downcallLinker_aarch64.cpp line 182: >> >>> 180: ArgumentShuffle arg_shuffle(filtered_java_regs, out_regs, shuffle_reg); >>> 181: >>> 182: #ifndef PRODUCT >> >> Any particular reason to exclude the logging in product builds? `ArgumentShuffle::print_on()` is unconditionally available there. > > This is partly historical. The log output is only intended for debugging, not for end-user eyes. So, I think I originally excluded it as a way of trimming fat from the product build. > > Either way, `ArgumentShuffle::print_on` should probably be excluded/included on the same basis. 
Either way, this seems like something that should be addressed in a separate patch ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16201#discussion_r1391072872 From rehn at openjdk.org Mon Nov 13 13:49:57 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 13 Nov 2023 13:49:57 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: <8rC40UxJC4IF9vdv6xIyaJl6l-fhAlRC0VezoUAuKYE=.bdc94ce5-d9d5-429c-bb38-701ffcbe0bcf@github.com> Message-ID: On Mon, 13 Nov 2023 09:06:34 GMT, Fei Yang wrote: >> Yes, this is a total mess. >> For bystanders this is a 'simple' march to clang: >> `rv64im0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zacas1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0` >> A simple RVA23U64 CPU may have like 40 extensions, a high performance server class CPU may have well over a hundred. >> >> Just the scalar crypto ones: >> `Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zksed, Zksh, Zkr, Zkt, Zkn, Zks, Zk` >> >> It is no reasonable to add all these as flags. >> So flags for the collections seems like much better idea. >> But we probably need to be able to turn off a sub-extension such UseZvknhb. >> "-XX:+UseVectorCryptoExt:zvknhb=false" >> Suggestions welcome. >> >> Just top of my head, at the moment I need to supply this crazy arch string to compiler, obj dump, qemu(bit different but still crazy :) ) and there doesn't seem to be a solution near, so maybe we should be able to supply that arch string to the VM also. >> `-XX:UseArch=rv64im0_a2p1_f2p2_d2p2_c2p0_v1p0_zicsr2p0_zifencei2p0_zmmul1p0_zacas1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0` > > I think that's what RISC-V profiles are for [1] which make some basic extensions mandatory for each profile. And we already have JVM options like `UseRVA20U64` and `UseRVA22U64` for riscv. But there are still some optional extensions for each profile, say RVV for RVA22U64. 
So instead of feeding a rather long march to the JVM, I feel it's more reasonable to have some JVM options at the extension level (instead of sub-extension level) as suggested by @robehn. > > Personally, I would suggest something slightly different. Say: > "-XX:VectorCryptoExt=zvknhb", "-XX:VectorCryptoExt=zvknhb_zvkb", or "-XX:VectorCryptoExt=all" > > This way we will still be able to distinguish specific sub-extensions while keeping one JVM option for each extension/collection. > > [1] https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc Let's take it on the list: https://mail.openjdk.org/pipermail/riscv-port-dev/2023-November/001211.html ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1391138003 From shade at openjdk.org Mon Nov 13 14:27:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 Nov 2023 14:27:59 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:29:03 GMT, David Holmes wrote: >> See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. >> >> I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. >> >> I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. >> >> Additional testing: >> - [x] Linux x86_32 Zero release; jcstress >> - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` >> - [x] Linux x86_32 Zero fastdebug, bootcycle-images > > src/hotspot/cpu/zero/globalDefinitions_zero.hpp line 30: > >> 28: >> 29: // Unconditionally supports 8-byte cmpxchg either with >> 30: // compiler intrinsics or with library/kernel helpers. 
> > That's not what "native support for cx8" was meant to mean though - e.g. see the ARM header - it only sets this when building for ARMv7. > > You can just leave this file alone and simply set `_supports_cx8` below to achieve the same goal. And that will fit in cleaner with the changes I am making. Well, yes, we can just do `_supports_cx8 = true`. But I am confused by the meaning of `SUPPORTS_NATIVE_CX8`. What is it? I read it as "we know statically, at compile time, that the target platform supports CX8". Otherwise, we poll it at runtime and let the runtime code decide by checking `VMVersion::supports_cx8()`. Defining `SUPPORTS_NATIVE_CX8` compiles out access backend locking paths completely, for example, without resorting to runtime checks. What I am missing? Is the wording for the comment misleading? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16614#discussion_r1391185118 From evergizova at openjdk.org Mon Nov 13 14:46:02 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Mon, 13 Nov 2023 14:46:02 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 23:49:25 GMT, Dean Long wrote: >> Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: >> >> Changed type, added range check > > src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 459: > >> 457: } >> 458: >> 459: if ((value % CodeEntryAlignment) != 0) { > > I don't understand why this is necessary. 
It is needed to avoid 'failed: _buffer_size not aligned' crashes on debug builds: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L221 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1391205013 From evergizova at openjdk.org Mon Nov 13 14:46:04 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Mon, 13 Nov 2023 14:46:04 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: <5X84H-etGQ-RDan1RTnnZVmXujoo6aleWopu3Hl_J0k=.d82350d3-be39-4acb-accb-ddcb7a8a6fa4@github.com> References: <5X84H-etGQ-RDan1RTnnZVmXujoo6aleWopu3Hl_J0k=.d82350d3-be39-4acb-accb-ddcb7a8a6fa4@github.com> Message-ID: On Fri, 22 Sep 2023 23:54:30 GMT, Dean Long wrote: >> Thanks, I changed type to int and added a range check constraint. > > I'd rather have the type as size_t and change StubQueue accordingly. Thanks @dean-long. I would like to keep this enhancement simple and minimal so that it can be backported to 17 and 11. So I'd like to avoid changes to StubQueue. I can change the type of InlineCacheBufferSize to size_t and add checked_cast to StubQueue constructor in InlineCacheBuffer::initialize(): _buffer = new StubQueue(new ICStubInterface, checked_cast(InlineCacheBufferSize), InlineCacheBuffer_lock, "InlineCacheBuffer"); Because in any case InlineCacheBufferSize can't be greater than INT_MAX: `InlineCacheBufferSize < NonNMethodCodeHeapSize < ReservedCodeCacheSize < CODE_CACHE_DEFAULT_LIMIT = 2G`: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L191 https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.cpp#L492 https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L589 Will that be OK? 
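
The two checks discussed in this message, alignment of the configured size and a bounds-checked narrowing from `size_t` to `int`, can be sketched as follows. This is a hedged illustration rather than the patch itself: `is_valid_buffer_size`, `fits_in_int` and `narrow_to_int` are hypothetical helpers, and the alignment value is passed in as a parameter instead of reading the real `CodeEntryAlignment` flag.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// True when `value` can be represented as a non-negative int.
bool fits_in_int(size_t value) {
  return value <= static_cast<size_t>(INT_MAX);
}

// Narrowing conversion that asserts instead of silently truncating,
// similar in spirit to a checked cast.
int narrow_to_int(size_t value) {
  assert(fits_in_int(value) && "value does not fit in int");
  return static_cast<int>(value);
}

// Constraint check for a configurable buffer size: it must be a multiple
// of the entry alignment and small enough to be passed on as an int.
bool is_valid_buffer_size(size_t value, size_t alignment) {
  assert(alignment != 0);
  if ((value % alignment) != 0) {
    return false;  // would trip an alignment assert in debug builds
  }
  return fits_in_int(value);
}
```

A flag constraint function would reject a value for which `is_valid_buffer_size` is false before the buffer is ever created, so the narrowing at the construction site can only see values that already passed the range check.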
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1391208645 From fyang at openjdk.org Mon Nov 13 14:47:01 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 Nov 2023 14:47:01 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v7] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:15:29 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine code src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1702: > 1700: vmv_v_i(v8, 0); > 1701: // load the mask data(in bits). > 1702: vsetivli(x0, 1, sew, lmul); Please also change line into "vsetivli(x0, 1, sew, Assembler::m1)" for consistency. Otherwise LGTM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391210994 From fyang at openjdk.org Mon Nov 13 14:49:59 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 Nov 2023 14:49:59 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 10:30:41 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to refactor the code related to UseRVV checks? > Thanks! > > There is some code (flag setting/checking) that depends on UseRVV's value; this code should be refactored, especially after the change in https://bugs.openjdk.org/browse/JDK-8319408: > 1. some code needs to get the final UseRVV value rather than the initial value, e.g. the ChaCha20 intrinsic. > 2. refactored to be more readable. > 3. also add a note to make sure that future code gets the final UseRVV value instead of the initial value. Changes requested by fyang (Reviewer).
src/hotspot/cpu/riscv/vm_version_riscv.cpp line 295: > 293: > 294: if (!UseRVV) { > 295: FLAG_SET_DEFAULT(SpecialEncodeISOArray, false); Could you please remove this line while you are on it? I don't think we need to reset `SpecialEncodeISOArray` here. ------------- PR Review: https://git.openjdk.org/jdk/pull/16580#pullrequestreview-1727476994 PR Review Comment: https://git.openjdk.org/jdk/pull/16580#discussion_r1391214061 From mli at openjdk.org Mon Nov 13 15:06:16 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 15:06:16 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v8] In-Reply-To: References: Message-ID: > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! > > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine code 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/f9f31e74..2503e42f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=06-07 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Mon Nov 13 15:06:19 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 15:06:19 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v7] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 14:44:44 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine code > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1702: > >> 1700: vmv_v_i(v8, 0); >> 1701: // load the mask data(in bits).
>> 1702: vsetivli(x0, 1, sew, lmul); > > Please also change line into "vsetivli(x0, 1, sew, Assembler::m1)" for consistency. Otherwise LGTM. Modified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391238036 From fyang at openjdk.org Mon Nov 13 15:14:04 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 Nov 2023 15:14:04 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v8] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 15:06:16 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine code 2 Ah ... We still have two copy-paste issues. src/hotspot/cpu/riscv/riscv_v.ad line 2884: > 2882: > 2883: instruct compressBitsI(iRegINoSp dst, iRegIorL2I src, iRegIorL2I mask, vRegMask_V0 v0, > 2884: vReg_V4 v4, vReg_V4 v5, vReg_V8 v8, vReg_V4 v9) %{ Just noticed that this should be `vReg_V4 v4, vReg_V5 v5, vReg_V8 v8, vReg_V9 v9`. src/hotspot/cpu/riscv/riscv_v.ad line 2911: > 2909: instruct compressBitsL(iRegLNoSp dst, iRegL src, iRegL mask, vRegMask_V0 v0, > 2910: vReg_V4 v4, vReg_V4 v5, vReg_V4 v6, vReg_V4 v7, > 2911: vReg_V8 v8, vReg_V4 v9, vReg_V4 v10, vReg_V4 v11) %{ Similar issue here for v5-v7 and v9-v11. ------------- Changes requested by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16481#pullrequestreview-1727528614 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391244946 PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391247033 From mli at openjdk.org Mon Nov 13 15:17:00 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 15:17:00 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 14:47:04 GMT, Fei Yang wrote: >> Hi, >> Can you review the patch to refactor the code related to UseRVV checks? >> Thanks! >> >> There is some code (flag setting/checking) that depends on UseRVV's value; this code should be refactored, especially after the change in https://bugs.openjdk.org/browse/JDK-8319408: >> 1. some code needs to get the final UseRVV value rather than the initial value, e.g. the ChaCha20 intrinsic. >> 2. refactored to be more readable. >> 3. also add a note to make sure that future code gets the final UseRVV value instead of the initial value. > > src/hotspot/cpu/riscv/vm_version_riscv.cpp line 295: > >> 293: >> 294: if (!UseRVV) { >> 295: FLAG_SET_DEFAULT(SpecialEncodeISOArray, false); > > Could you please remove this line while you are on it? I don't think we need to reset `SpecialEncodeISOArray` here. I'm not sure, as there are some code in share/classfile/vmIntrinsics.cpp which depends on SpecialEncodeISOArray's value: case vmIntrinsics::_encodeISOArray: case vmIntrinsics::_encodeAsciiArray: case vmIntrinsics::_encodeByteISOArray: if (!SpecialEncodeISOArray) return true; break; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16580#discussion_r1391252954 From mli at openjdk.org Mon Nov 13 15:21:11 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 15:21:11 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks [v2] In-Reply-To: References: Message-ID: > Hi, > Can you review the patch to refactor the code related to UseRVV checks? > Thanks!
> > There is some code (flag setting/checking) that depends on UseRVV's value; this code should be refactored, especially after the change in https://bugs.openjdk.org/browse/JDK-8319408: > 1. some code needs to get the final UseRVV value rather than the initial value, e.g. the ChaCha20 intrinsic. > 2. refactored to be more readable. > 3. also add a note to make sure that future code gets the final UseRVV value instead of the initial value. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16580/files - new: https://git.openjdk.org/jdk/pull/16580/files/f9337946..096ea83c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16580&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16580&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16580.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16580/head:pull/16580 PR: https://git.openjdk.org/jdk/pull/16580 From shade at openjdk.org Mon Nov 13 15:24:03 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 Nov 2023 15:24:03 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 20:22:13 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments.
The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Rework paddings That's okay. I expect MacOS and Windows GC-heavy benchmarks to improve with this patch. @cl4es, want to run this through performance testing? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1808368844 From mli at openjdk.org Mon Nov 13 15:27:28 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 15:27:28 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v9] In-Reply-To: References: Message-ID: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com> > Hi, > Can you review the change to add intrinsic for CompressBits for Long & Integer? > Thanks! > > ## Test > pass jtreg test: > test/jdk/java/lang/CompressExpand*.java Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix typos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16481/files - new: https://git.openjdk.org/jdk/pull/16481/files/2503e42f..7af61f97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16481&range=07-08 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16481/head:pull/16481 PR: https://git.openjdk.org/jdk/pull/16481 From mli at openjdk.org Mon Nov 13 15:27:32 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 15:27:32 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v8] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 15:08:21 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine code 2 > > src/hotspot/cpu/riscv/riscv_v.ad line 2884: > >> 2882: >> 2883: instruct compressBitsI(iRegINoSp dst, iRegIorL2I src, iRegIorL2I mask, vRegMask_V0 v0, >> 2884: vReg_V4 v4, vReg_V4 v5, vReg_V8 v8, vReg_V4 v9) %{ > > Just noticed that this should be `vReg_V4 v4, vReg_V5 v5, vReg_V8 v8, vReg_V9 v9`. My bad, thanks for catching these typos!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16481#discussion_r1391266387 From fyang at openjdk.org Mon Nov 13 15:28:59 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 13 Nov 2023 15:28:59 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 15:14:05 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 295: >> >>> 293: >>> 294: if (!UseRVV) { >>> 295: FLAG_SET_DEFAULT(SpecialEncodeISOArray, false); >> >> Could you please remove this line while you are on it? I don't think we need to reset `SpecialEncodeISOArray` here. > > I'm not sure, as there are some code in share/classfile/vmIntrinsics.cpp which depends on SpecialEncodeISOArray's value: > > case vmIntrinsics::_encodeISOArray: > case vmIntrinsics::_encodeAsciiArray: > case vmIntrinsics::_encodeByteISOArray: > if (!SpecialEncodeISOArray) return true; > break; The reason is that `SpecialEncodeISOArray` is a DIAGNOSTIC option which I think is for debugging / trouble-shooting purpose. The other four ones are `SpecialStringCompareTo`, `SpecialStringIndexOf`, `SpecialStringEquals` and `SpecialArraysEquals`. So this code snippet checking for `SpecialEncodeISOArray` is in function `vmIntrinsics::disabled_by_jvm_flags`. So I think we are safe to have this change. In fact, `SpecialEncodeISOArray` and the others are not checked or modified by other platforms either. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16580#discussion_r1391268606 From rkennke at openjdk.org Mon Nov 13 15:53:00 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 13 Nov 2023 15:53:00 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 10:45:10 GMT, Axel Boldt-Christmas wrote: >> Implements the x86 port of JDK-8319796. >> >> There are two major parts for the port implementation. 
The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first, and if it is a recursive lightweight [un]lock, simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. >> >> For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. >> >> The C2 part operates under a few more assumptions: that the locking is structured and balanced. This means that some checks can be elided. >> >> First, this means that in C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. Conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The x86 C2 port also has some extra oddities. >> >> The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain Intel hardware, and no noticeable downside on other tested x86 hardware. >> >> The fast path is written to avoid going through conditional branches. In combination with keeping the ZF output correct, this means the code does some actions eagerly: decrementing the held monitor count and popping from the lock stack. It jumps to a code stub if a slow path is required, which restores the thread-local state to a correct state before jumping to the runtime. >> >> The contended unlock was also moved to the code stub.
> > Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: > > - Fix type > - Move inflated check in fast_locked > - Move top load I see benefits in interleaving the various loads in the locking fast-paths. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 974: > 972: jcc(Assembler::notZero, inflated); > 973: > 974: // Load top. I have found it to be beneficial to move up the load of the top-offset to between the load/prefetch of the mark-word and the test for monitor. This way we do the test while the top-offset arrives and reduce the latency of the lock-stack-full-check. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1072: > 1070: > 1071: // Check if obj is top of lock-stack. > 1072: movl(top, Address(thread, JavaThread::lock_stack_top_offset())); Like above, moving the load of the top-offset up above the mark-load should be harmless and potentially reduces the time that the following instructions have to wait for the top-offset to arrive. ------------- PR Review: https://git.openjdk.org/jdk/pull/16607#pullrequestreview-1727614494 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1391298977 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1391301749 From mli at openjdk.org Mon Nov 13 16:42:11 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 16:42:11 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks [v3] In-Reply-To: References: Message-ID: > Hi, > Can you review the patch to refactor the code related to UseRVV checks? > Thanks! > > There is some code (flag setting/checking) that depends on UseRVV's value; this code should be refactored, especially after the change in https://bugs.openjdk.org/browse/JDK-8319408: > 1. some code needs to get the final UseRVV value rather than the initial value, e.g. the ChaCha20 intrinsic. > 2. refactored to be more readable. > 3.
also add a note to make sure that future code gets the final UseRVV value instead of the initial value. Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: remove code setting SpecialEncodeISOArray ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16580/files - new: https://git.openjdk.org/jdk/pull/16580/files/096ea83c..7d05b867 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16580&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16580&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16580.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16580/head:pull/16580 PR: https://git.openjdk.org/jdk/pull/16580 From mli at openjdk.org Mon Nov 13 16:42:12 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 13 Nov 2023 16:42:12 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks [v3] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 15:25:31 GMT, Fei Yang wrote: >> I'm not sure, as there are some code in share/classfile/vmIntrinsics.cpp which depends on SpecialEncodeISOArray's value: >> >> case vmIntrinsics::_encodeISOArray: >> case vmIntrinsics::_encodeAsciiArray: >> case vmIntrinsics::_encodeByteISOArray: >> if (!SpecialEncodeISOArray) return true; >> break; > > The reason is that `SpecialEncodeISOArray` is a DIAGNOSTIC option which I think is for debugging / trouble-shooting purpose. The other four ones are `SpecialStringCompareTo`, `SpecialStringIndexOf`, `SpecialStringEquals` and `SpecialArraysEquals`. So this code snippet checking for `SpecialEncodeISOArray` is in function `vmIntrinsics::disabled_by_jvm_flags`. So I think we are safe to have this change. In fact, `SpecialEncodeISOArray` and the others are not checked or modified by other platforms either. It makes sense, Thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16580#discussion_r1391368835 From cslucas at openjdk.org Mon Nov 13 17:02:28 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 13 Nov 2023 17:02:28 GMT Subject: RFR: JDK-8316991: Reduce nullable allocation merges [v5] In-Reply-To: References: Message-ID: > ### Description > > Many, if not most, allocation merges (Phis) are nullable because they join object allocations with "NULL", or objects returned from method calls, etc. Please review this Pull Request that improves Reduce Allocation Merge implementation so that it can reduce at least some of these allocation merges. > > Overall, the improvements are related to 1) making rematerialization of merges able to represent "NULL" objects, and 2) being able to reduce merges used by CmpP/N and CastPP. > > The approach to reducing CmpP/N and CastPP is pretty similar to that used in the `MemNode::split_through_phi` method: a clone of the node being split is added on each input of the Phi. I make use of `optimize_ptr_compare` and some type information to remove redundant CmpP and CastPP nodes. I added a bunch of ASCII diagrams illustrating what some of the more important methods are doing. > > ### Benchmarking > > **Note:** In some of these tests no reduction happens. I left them in to validate that no perf. regression happens in that case. > **Note 2:** Margin of error was negligible.
>
> | Benchmark | No RAM (ms/op) | Yes RAM (ms/op) |
> |--------------------------------------|------------------|-------------------|
> | TestTrapAfterMerge | 19.515 | 13.386 |
> | TestArgEscape | 33.165 | 33.254 |
> | TestCallTwoSide | 70.547 | 69.427 |
> | TestCmpAfterMerge | 16.400 | 2.984 |
> | TestCmpMergeWithNull_Second | 27.204 | 27.293 |
> | TestCmpMergeWithNull | 8.248 | 4.920 |
> | TestCondAfterMergeWithAllocate | 12.890 | 5.252 |
> | TestCondAfterMergeWithNull | 6.265 | 5.078 |
> | TestCondLoadAfterMerge | 12.713 | 5.163 |
> | TestConsecutiveSimpleMerge | 30.863 | 4.068 |
> | TestDoubleIfElseMerge | 16.069 | 2.444 |
> | TestEscapeInCallAfterMerge | 23.111 | 22.924 |
> | TestGlobalEscape | 14.459 | 14.425 |
> | TestIfElseInLoop | 246.061 | 42.786 |
> | TestLoadAfterLoopAlias | 45.808 | 45.812 |
> | TestLoadAfterTrap | 28.370 | ...

Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Update test/micro/org/openjdk/bench/vm/compiler/AllocationMerges.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15825/files - new: https://git.openjdk.org/jdk/pull/15825/files/ad6b9d1a..97b0fb71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15825&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15825/head:pull/15825 PR: https://git.openjdk.org/jdk/pull/15825 From redestad at openjdk.org Mon Nov 13 17:11:00 2023 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 13 Nov 2023 17:11:00 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 20:22:13 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug.
>> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
>> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Rework paddings Ok, I'll submit some runs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1808603193 From duke at openjdk.org Mon Nov 13 17:34:10 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 13 Nov 2023 17:34:10 GMT Subject: RFR: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: Message-ID: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> > Hello All, > > Please review these changes to support _vectorizedHashCode intrinsic on > RISC-V platform. The patch adds the "scalar" code for the intrinsic without > usage of any RVV instruction but provides manual unrolling of the appropriate > loop. The code with usage of RVV instruction could be added as follow-up of > the patch or independently. > > Thanks, > -Yuri Gaevsky > > P.S. My OCA has been accepted recently (ygaevsky). > > ### Correctness checks > > Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux. 
>
> ### Performance results (the numbers for non-ints are similar)
>
> #### StarFive JH7110 board:
>
> ArraysHashCode:                without intrinsic       with intrinsic
> -------------------------------------------------------------------------------
> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
> -------------------------------------------------------------------------------
> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
> -------------------------------------------------------------------------------
>
> #### T-Head RVB-ICE board:
>
> ArraysHashCode: ...

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: Minor cosmetic fixes.
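As a rough illustration of the scalar, manually unrolled approach described in this pull request, the sketch below processes four elements per iteration of the `31 * h + a[i]` recurrence. It is plain C++ rather than the RISC-V assembler code in the patch; the function names and the unroll factor of four are assumptions made for the example, not details taken from the change itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference loop: h = 31 * h + a[i]. Unsigned arithmetic is used so that
// wrap-around is well defined; Java's int arithmetic wraps the same way.
inline std::uint32_t hash_simple(const std::int32_t* a, std::size_t len, std::uint32_t h) {
  for (std::size_t i = 0; i < len; i++) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}

// Unrolled variant: four steps of the recurrence are folded into one
// expression, h' = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3],
// which shortens the dependency chain on h.
inline std::uint32_t hash_unrolled(const std::int32_t* a, std::size_t len, std::uint32_t h) {
  assert(a != nullptr || len == 0);
  constexpr std::uint32_t P1 = 31u;
  constexpr std::uint32_t P2 = P1 * 31u;  // 31^2
  constexpr std::uint32_t P3 = P2 * 31u;  // 31^3
  constexpr std::uint32_t P4 = P3 * 31u;  // 31^4
  std::size_t i = 0;
  for (; i + 4 <= len; i += 4) {
    h = P4 * h
      + P3 * static_cast<std::uint32_t>(a[i])
      + P2 * static_cast<std::uint32_t>(a[i + 1])
      + P1 * static_cast<std::uint32_t>(a[i + 2])
      +      static_cast<std::uint32_t>(a[i + 3]);
  }
  // Tail: handle the remaining 0..3 elements one at a time.
  for (; i < len; i++) {
    h = P1 * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}
```

Both functions are meant to return the same value for any input; starting from `h = 1` matches the initial value used by `java.util.Arrays.hashCode`.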
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16629/files - new: https://git.openjdk.org/jdk/pull/16629/files/c299fda9..daae9961 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629 PR: https://git.openjdk.org/jdk/pull/16629 From dcubed at openjdk.org Mon Nov 13 17:39:05 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 Nov 2023 17:39:05 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:00:31 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test What CI testing has been done with this PR? I saw that some was planned, but I don't see that the actual tiers executed have been mentioned... ------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1808652592 From ccheung at openjdk.org Mon Nov 13 18:01:57 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 13 Nov 2023 18:01:57 GMT Subject: RFR: 8319944: Remove DynamicDumpSharedSpaces [v2] In-Reply-To: References: Message-ID: <5UjwpMCJ56LiMJW8w1V-_aBJwZpUM9_ypsan8aoBe9M=.15af5572-21b8-4357-bbe2-483306837478@github.com> On Mon, 13 Nov 2023 05:53:14 GMT, Ioi Lam wrote: >> Please review this cleanup.
Most of the changes are the following patterns: >> >> - `if (DynamicDumpSharedSpaces)` => `if (CDSConfig::is_dumping_dynamic_archive())` >> - `DynamicDumpSharedSpaces = true` => `CDSConfig::enable_dumping_dynamic_archive()` >> - `DynamicDumpSharedSpaces = false` => `CDSConfig::disable_dumping_dynamic_archive()` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo Looks good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16626#pullrequestreview-1727906921 From matsaave at openjdk.org Mon Nov 13 18:01:58 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 13 Nov 2023 18:01:58 GMT Subject: RFR: 8319944: Remove DynamicDumpSharedSpaces [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 05:53:14 GMT, Ioi Lam wrote: >> Please review this cleanup. Most of the changes are the following patterns: >> >> - `if (DynamicDumpSharedSpaces)` => `if (CDSConfig::is_dumping_dynamic_archive())` >> - `DynamicDumpSharedSpaces = true` => `CDSConfig::enable_dumping_dynamic_archive()` >> - `DynamicDumpSharedSpaces = false` => `CDSConfig::disable_dumping_dynamic_archive()` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo LGTM! ------------- Marked as reviewed by matsaave (Committer). PR Review: https://git.openjdk.org/jdk/pull/16626#pullrequestreview-1727909730 From iklam at openjdk.org Mon Nov 13 18:12:14 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 13 Nov 2023 18:12:14 GMT Subject: RFR: 8319944: Remove DynamicDumpSharedSpaces [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 06:08:14 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed typo > > Looks good. > > Thanks Thanks @dholmes-ora @calvinccheung @matias9927 for the review!
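The patterns listed above replace direct reads and writes of a global flag with accessor calls on a configuration class. A minimal standalone sketch of that shape follows, with `ArchiveConfig` as a hypothetical stand-in; the real `CDSConfig` class in HotSpot is more involved and is not reproduced here.

```cpp
// Hypothetical miniature of the accessor pattern: the state is private to
// the class, and callers can only query or change it through named methods.
class ArchiveConfig {
  // Function-local static keeps the sketch header-only and C++11-friendly.
  static bool& dumping_dynamic_archive_state() {
    static bool state = false;
    return state;
  }

 public:
  // Replaces reads of the former global flag.
  static bool is_dumping_dynamic_archive() {
    return dumping_dynamic_archive_state();
  }
  // Replaces "flag = true".
  static void enable_dumping_dynamic_archive() {
    dumping_dynamic_archive_state() = true;
  }
  // Replaces "flag = false".
  static void disable_dumping_dynamic_archive() {
    dumping_dynamic_archive_state() = false;
  }
};
```

One benefit of this shape is that every place that changes the setting goes through a named method, which makes such places easier to find and to extend with checks later.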
------------- PR Comment: https://git.openjdk.org/jdk/pull/16626#issuecomment-1808729839 From iklam at openjdk.org Mon Nov 13 18:12:15 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 13 Nov 2023 18:12:15 GMT Subject: Integrated: 8319944: Remove DynamicDumpSharedSpaces In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 04:52:57 GMT, Ioi Lam wrote: > Please review this cleanup. Most of the changes are the following patterns: > > - `if (DumpSharedSpaces)` => `if (CDSConfig::is_dumping_dynamic_archive())` > - `DumpSharedSpaces = true` => `CDSConfig::enable_dumping_dynamic_archive()` > - `DumpSharedSpaces = false` => `CDSConfig::disable_dumping_dynamic_archive()` This pull request has now been integrated. Changeset: 115b0744 Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/115b0744c6ba8d990eef5a31d64d6a184182c754 Stats: 60 lines in 15 files changed: 10 ins; 8 del; 42 mod 8319944: Remove DynamicDumpSharedSpaces Reviewed-by: dholmes, ccheung, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/16626 From matsaave at openjdk.org Mon Nov 13 18:14:36 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 13 Nov 2023 18:14:36 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v11] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995.
> > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64, RISCV, PPC Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: RISCV update 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/ea067795..dded6de4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=09-10 Stats: 13 lines in 1 file changed: 4 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From shade at openjdk.org Mon Nov 13 19:02:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 13 Nov 2023 19:02:58 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 17:08:08 GMT, Claes Redestad wrote: > Ok, I'll submit some runs. Thanks! Note that in current patch Linux implementation is switched for generic one to simplify functional testing. I expect Linux benchmarks to regress, as it is probably hard to win against futex implementation. 
For the actual change, we are interested in what Windows and MacOS benchmarks would show. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1808827404 From stefank at openjdk.org Mon Nov 13 19:37:54 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 13 Nov 2023 19:37:54 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:00:31 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test So far I've been running tier1-7 (linux-x64 only) with LockingMode set to LM_LIGHTWEIGHT in globals.hpp. I'm currently running tier8. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1808895108 From redestad at openjdk.org Mon Nov 13 20:16:46 2023 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 13 Nov 2023 20:16:46 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 19:00:43 GMT, Aleksey Shipilev wrote: > > Ok, I'll submit some runs. > > Thanks! Note that in current patch Linux implementation is switched for generic one to simplify functional testing. I expect Linux benchmarks to regress, as it is probably hard to win against futex implementation. For the actual change, we are interested in what Windows and MacOS benchmarks would show. OK, I had already submitted a set across all platforms, but the results on linux will serve as a good check up on the generic vs futex impl. 
Do you intend to switch back before integration or is the intent to integrate and evaluate if it's on par then make a go/no-go decision later? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1808996691 From rriggs at openjdk.org Mon Nov 13 20:25:29 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 13 Nov 2023 20:25:29 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v2] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 9 Nov 2023 04:16:25 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. 
>> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with three additional commits since the last revision: > > - Refactored extractCodePoints to avoid multiple resizes if the array was modified > - Replaced isLatin1 implementation with `getChar(buf, ndx) <= 0xff` > It performs better than the single byte array access by avoiding the bounds check. > - Misc updates for review comments, javadoc cleanup > Extra checking on maximum string lengths when calling toBytes(). Contributed update to s390.ad. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1809008222 From kvn at openjdk.org Mon Nov 13 20:29:57 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Nov 2023 20:29:57 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v4] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:30:19 GMT, Aleksey Shipilev wrote: >> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. 
>> >> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. >> >> For example, sample branch profiling hunk from C1 tier3 on x86_64: >> >> >> Before: >> 0x00007f269065ed02: test %edx,%edx >> 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ? >> 0x00007f269065ed0e: movabs $0x138,%rsi >> ? 0x00007f269065ed18: je 0x00007f269065ed24 >> ? 0x00007f269065ed1a: movabs $0x148,%rsi >> ? 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi >> 0x00007f269065ed28: lea 0x1(%rdi),%rdi >> 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1) >> 0x00007f269065ed30: je 0x00007f269065ed4e >> >> After: >> 0x00007f1370dcd782: test %edx,%edx >> 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ? >> 0x00007f1370dcd78e: mov $0x138,%esi >> ? 0x00007f1370dcd793: je 0x00007f1370dcd79a >> ? 0x00007f1370dcd795: mov $0x148,%esi >> ? 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi >> 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi >> 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1) >> 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4 >> >> >> We can use a shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes. >> >> This is not limited to the profiling code. There is observable code space savings on larger tests in C2, e.g. on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`. >> >> >> # Before >> nmethod code size : 430328 bytes >> nmethod code size : 467032 bytes >> nmethod code size : 908936 bytes >> nmethod code size : 1267816 bytes >> >> # After >> nmethod code size : 429616 bytes (-0.1%) >> nmethod code size : 466344 bytes (-0.1%) >> nmethod code size : 897144 bytes (-1.3%) >> nmethod code size : 1256216 bytes (-0.9%) >> >> >> There are two wrinkles: >> 1. 
Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I would have used it in... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Remove the movslq declaration as well > - Merge branch 'master' into JDK-8319406-shorter-movptr-32 > - Enlighs > - Remove new imm64 method completely, inline at use > - Easy review feedback > - Merge branch 'master' into JDK-8319406-shorter-movptr-32 > - Fix Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16497#pullrequestreview-1728218275 From rriggs at openjdk.org Mon Nov 13 20:42:00 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 13 Nov 2023 20:42:00 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v3] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. 
> > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: - code and doc cleanup in StringRacyConstructor test - Update of string_compress for the s390 port to return the index of the non-latin1 char. Contributed by Amit Kumar. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/ad73a2a6..f6080595 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=01-02 Stats: 11 lines in 2 files changed: 0 ins; 1 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From macarte at openjdk.org Mon Nov 13 21:01:43 2023 From: macarte at openjdk.org (Mat Carter) Date: Mon, 13 Nov 2023 21:01:43 GMT Subject: Integrated: 8317562: [JFR] Compilation queue statistics In-Reply-To: References: Message-ID: On Tue, 17 Oct 2023 00:35:54 GMT, Mat Carter wrote: > Adding a new periodic jfr event to monitor and output statistics for the compiler queues. You will see one event per compiler queue (c1 and c2) > > Passes tier1 on linux (x86) and mac (aarch64) This pull request has now been integrated. Changeset: d9920334 Author: Mat Carter Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/d992033439073d35877a2c0296fbd01ad5cbcb07 Stats: 249 lines in 10 files changed: 249 ins; 0 del; 0 mod 8317562: [JFR] Compilation queue statistics Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/16211 From dholmes at openjdk.org Mon Nov 13 21:35:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 21:35:28 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg In-Reply-To: References: Message-ID: <04inV6abqW4etamQXMc3QSFAH77roMCCYQYP7dZ4b2s=.70053ee1-491e-44b1-af1c-e80cf0d2f3d3@github.com> On Mon, 13 Nov 2023 14:25:28 GMT, Aleksey Shipilev wrote: >> src/hotspot/cpu/zero/globalDefinitions_zero.hpp line 30: >> >>> 28: >>> 29: // Unconditionally supports 8-byte cmpxchg either with >>> 30: // compiler intrinsics or with library/kernel helpers. 
>> >> That's not what "native support for cx8" was meant to mean though - e.g. see the ARM header - it only sets this when building for ARMv7. >> >> You can just leave this file alone and simply set `_supports_cx8` below to achieve the same goal. And that will fit in cleaner with the changes I am making. > > Well, yes, we can just do `_supports_cx8 = true`. > > But I am confused by the meaning of `SUPPORTS_NATIVE_CX8`. What is it? I read it as "we know statically, at compile time, that the target platform supports CX8". Otherwise, we poll it at runtime and let the runtime code decide by checking `VMVersion::supports_cx8()`. Defining `SUPPORTS_NATIVE_CX8` compiles out access backend locking paths completely, for example, without resorting to runtime checks. > > What am I missing? Is the wording for the comment misleading?

Yeah it is something that can be read two ways and the current code is confused about it. I take it to mean there is actual native ISA support, versus there is some way of achieving the same effect. That is the way the ARM code uses it: if you build for ARMv7 then `SUPPORTS_NATIVE_CX8` is defined, otherwise runtime checks exist for `ldrex` or `kuser_helper` support. Other platforms confuse things somewhat. Here's the definition of `supports_cx8()`:

    static bool supports_cx8() {
    #ifdef SUPPORTS_NATIVE_CX8
      return true;
    #else
      return _supports_cx8;
    #endif
    }

So if you define `SUPPORTS_NATIVE_CX8` then `supports_cx8()` is always true - no runtime checks involved, no read of `_supports_cx8`. You've indicated that you have built a binary to only run where there is native CX8 support. Otherwise you should use runtime checks to set `_supports_cx8` as appropriate to control what `supports_cx8()` returns. Setting both is redundant/pointless and existing code is very confused about this. Take a look at x86 for example, it both defines `SUPPORTS_NATIVE_CX8` and has `_supports_cx8 = supports_cmpxchg8();` - but the latter is dead code as nothing will ever read it.
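A minimal sketch of the distinction described above, for illustration only. It is not the HotSpot source: the namespace `cx8_sketch` and the helper `initialize()` are hypothetical names, and the macro is deliberately left undefined so that the runtime path is the one that gets compiled.

```cpp
#include <cassert>

// Hypothetical model of the two ways supports_cx8() can be answered.
// Building with -DSUPPORTS_NATIVE_CX8 would select the compile-time path;
// it is left undefined here so the runtime flag is consulted.
namespace cx8_sketch {

// Runtime flag, meant to be set once during (assumed) VM initialization.
static bool _supports_cx8 = false;

// Stand-in for platform detection code; `detected` is an assumption of
// this sketch, not a real probe of the CPU or kernel.
static void initialize(bool detected) {
  _supports_cx8 = detected;
}

static bool supports_cx8() {
#ifdef SUPPORTS_NATIVE_CX8
  // Compile-time answer: the flag above is never read on this path.
  return true;
#else
  // Runtime answer: whatever initialize() recorded.
  return _supports_cx8;
#endif
}

}  // namespace cx8_sketch
```

With the macro undefined, `cx8_sketch::supports_cx8()` reports whatever `initialize()` stored. Defining the macro would make any assignment to the flag dead code, which is the redundancy pointed out for x86 above.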
So for me Zero was correct to only set `SUPPORTS_NATIVE_CX8` for 64-bit, but what it failed to do was set `_supports_cx8` on 32-bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16614#discussion_r1391728740 From redestad at openjdk.org Mon Nov 13 21:54:30 2023 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 13 Nov 2023 21:54:30 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 20:22:13 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
>> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Rework paddings Most quick benchmarks are done, but the bulk will take the night to complete. Most results so far are in the noise, though I see a few minor Windows-only regressions that raise suspicion in aggregate: Most of the `org.openjdk.bench.javax.crypto.small.SignatureBench.RSA` and `org.openjdk.bench.javax.crypto.small.RSABench` microbenchmarks regress (0.3-0.8%), then a few J2DBench sub-benchmarks that regress 0.8-1.5% across, similarly across all tested GC combinations (parallel, G1, ZGC). None of these look like integration blockers on their own, but if they are real it'd be nice to get on top of them early. I'll have to re-run and diagnose a bit further to say for sure that this isn't a red herring (or caused by something sneaking into my baseline build). Perhaps you can check if the RSA micro regressions are reproducible on your end? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1809187098 From jiangli at openjdk.org Mon Nov 13 22:01:10 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 13 Nov 2023 22:01:10 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread Message-ID: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. ------------- Commit messages: - 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread Changes: https://git.openjdk.org/jdk/pull/16642/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319935 Stats: 14 lines in 1 file changed: 9 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16642/head:pull/16642 PR: https://git.openjdk.org/jdk/pull/16642 From dcubed at openjdk.org Mon Nov 13 22:18:27 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 13 Nov 2023 22:18:27 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 13 Nov 2023 21:52:22 GMT, Jiangli Zhou wrote: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. @jianglizhou - I fixed a typo in the bug's synopsis line. 
Change this PR's title: s/is create/is created/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1809221878 From dholmes at openjdk.org Mon Nov 13 22:28:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 22:28:27 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: <_a3NEb-EFfx_NSuwCSv3EV_FvYcjCMnrguOK1JMNV4Q=.61575855-0c1f-4db1-9499-e9e176a3b8a7@github.com> On Mon, 13 Nov 2023 09:20:26 GMT, Axel Boldt-Christmas wrote:
>> Though rather than walk the lockstack twice can't we just change the check below to something like:
>>
>>     if (VM_Version::supports_recursive_lightweight_locking() && i != j - 1) {
>>       assert(_base[i] != _base[j], "entries must be unique: %s", msg);
>>     }
>
> Not sure I understand. > > The change here is that [for every object A, there exists no other object B such that A == B] is changed to [for every contiguous run of object A, there exists no other object B outside that run such that A == B]. Both of these checks are `O(n^2)` in the worst case, and the second is `O(n)` in the best case. > > I think you have to describe the algorithm you envision. Or maybe you want to change what property we are asserting.

I thought the issue was that without recursion support we must never find an A and B such that A == B. But with recursion support we can allow A == B if they are adjacent in the lock-stack. So that is what the check is performing, but apparently in two loops, where I was suggesting it can perhaps be done in a single loop.
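One possible reading of the single-loop idea, sketched under assumptions: `runs_are_contiguous` and the `std::vector` standing in for the lock stack are hypothetical, and this is not the check from the PR. It relies on the observation that if every entry equal to some earlier entry is also equal to its immediate predecessor, then equal entries can only form one contiguous run.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if all equal entries sit next to each other, i.e. each
// distinct object occupies exactly one contiguous run of slots.
// `entries` is a hypothetical stand-in for the lock stack contents.
static bool runs_are_contiguous(const std::vector<const void*>& entries) {
  for (std::size_t j = 1; j < entries.size(); j++) {
    for (std::size_t i = 0; i < j; i++) {
      // A duplicate of an earlier entry is only acceptable when the slot
      // directly before j holds the same object; by induction that forces
      // every slot between i and j to hold it as well.
      if (entries[i] == entries[j] && entries[j - 1] != entries[j]) {
        return false;
      }
    }
  }
  return true;
}
```

For example, the sequence A, A, B passes while A, B, A does not. The nested loop is still quadratic in the worst case, so this only folds the two passes into one rather than changing the cost.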
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1391770153 From dholmes at openjdk.org Mon Nov 13 22:46:26 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 13 Nov 2023 22:46:26 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: <-lwt39Gx_QJfxgzgSLHkysdtOrVxgP8dFh7gN4TDkmY=.86139caf-08c2-484f-999f-fa6cf121f9df@github.com> Message-ID: On Mon, 13 Nov 2023 08:52:41 GMT, Axel Boldt-Christmas wrote: >> I'm also unclear on the rationale, and again on checking for a full-stack upfront like this, when it should be a rare case. If recursion support means the lockStack is no longer big enough then we need to increase its size to accommodate that. > >> What is the rationale behind this block? Is it beneficial to inflate the top-most lock to make room for the new one, because that might be hotter? If so, then it may be even more useful to inflate the bottom-most entry instead? > > The current implementation inflates the bottom (least recently added) entry. > > The rationale is that because the emitted code always goes into the runtime for monitorenter if the lock stack is full, we need to inflate at least one object on the lock stack to not get into a scenario where we are constantly going into the runtime because we are in some deeply nested critical sections entering and exiting in a loop with the lock stack full. > > I also have versions of this which go through the lock stack and first inflate the already inflated objects, and only inflate a not inflated object if the lock stack is still full. > > As for inflating the bottom instead of the top. I am unsure what would be best. The idea behind the bottom is that it is furthest away from the current running code, and in case the top is in a loop with different objects every time it would cause a lot of inflation.
But it could obviously also be that the stack is in a loop and the bottom most object is different every time while the top is the same. > I can't say that I have seen programs with this either of this behaviour. Both can have equally bad worst case programs (with respect to number of inflations) but my gut feeling is that the worst case is less likely when inflating the bottom. > >> If recursion support means the lockStack is no longer big enough then we need to increase its size to accommodate that. > > I have not seen it being a problem, but it would be worth looking for programs where this could be an issue and evaluate increasing the lock stack size. Regardless of the capacity, if (and when) the lock stack gets full it needs to be handled in some way. > >> I'm also unclear on the rationale, and again on checking for a full-stack upfront like this, when it should be a rare case. > > The check for a full lock stack is always performed in every codepath, emitted C2, emitted shared and the runtime. > > This only adds an escape hatch for the degenerate behaviour we could arrive in. I would expect that we will encounter a full lockstack, of the current size 4, much more often with recursion support, and so we probably should increase it. But the handling of a full stack should still come last IMO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1391782328 From omikhaltcova at openjdk.org Mon Nov 13 22:57:10 2023 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Mon, 13 Nov 2023 22:57:10 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v3] In-Reply-To: References: Message-ID: <8w_N32xU_iH-wqThzDciiSUNfCDOd0hrtfUz4o_475k=.decd8294-9c52-4440-8956-5ee048bf02ca@github.com> > Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform. > > In the table below it is shown that NaN argument should be processed as a special case. 
> > RISC-V Java > (FCVT.W.S) (FCVT.L.D) (long round(double a)) (int round(float a)) > Minimum valid input (after rounding) ?2^31 ?2^63 Long.MIN_VALUE Integer.MIN_VALUE > Maximum valid input (after rounding) 2^31 ? 1 2^63 ? 1 Long.MAX_VALUE Integer.MAX_VALUE > Output for out-of-range negative input ?2^31 ?2^63 Long.MIN_VALUE Integer.MIN_VALUE > Output for ?? ?2^31 ?2^63 Long.MIN_VALUE Integer.MIN_VALUE > Output for out-of-range positive input 2^31 ? 1 2^63 - 1 Long.MAX_VALUE Integer.MAX_VALUE > Output for +? 2^31 ? 1 2^63 - 1 Long.MAX_VALUE Integer.MAX_VALUE > Output for NaN 2^31 ? 1 2^63 - 1 0 0 > > The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:: > > **Before** > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_round_double 2048 thrpt 15 59.555 0.179 ops/ms > FpRoundingBenchmark.test_round_float 2048 thrpt 15 49.760 0.103 ops/ms > > > **After** > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_round_double 2048 thrpt 15 110.956 0.186 ops/ms > FpRoundingBenchmark.test_round_float 2048 thrpt 15 115.947 0.122 ops/ms Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision: Set RoundingMode::rdn for fadd_s/fadd_d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16382/files - new: https://git.openjdk.org/jdk/pull/16382/files/13a65e48..ee6fd671 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382 PR: https://git.openjdk.org/jdk/pull/16382 From omikhaltcova at openjdk.org Mon Nov 13 23:04:27 2023 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Mon, 13 Nov 2023 23:04:27 GMT Subject: RFR: 8318158: 
RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: <1o0ZvsGehXm52tpvB1okWb0OKM1R7B-dJZpLXRC-oA0=.a4f3ad9e-26c8-4cfe-b167-4954230045dc@github.com> References: <5i-hk8Vm8mfogwTT8eQv9PV41MGRZ0P8JkoogXyzovY=.b305d21d-934c-4ce7-9206-6bd32e926b42@github.com> <1o0ZvsGehXm52tpvB1okWb0OKM1R7B-dJZpLXRC-oA0=.a4f3ad9e-26c8-4cfe-b167-4954230045dc@github.com> Message-ID: <0OLIgmsu48B6zYK0VC0XMY7pase4MMPvioHt_Pu-Y0U=.552de390-cfbc-45ad-ad00-c4e172b970b2@github.com> On Mon, 13 Nov 2023 09:33:02 GMT, Andrew Haley wrote: >> I have consulted with our h/w team and they told me next: >> multi-issue FP Unit can process few (data-independent) fp instructions at a time, even if they have different rounding mode. >> The only issue is when the rounding mode is set to dynamic rounding (aka get it from csr), but it's not our case here > >> I have consulted with our h/w team and they told me next: multi-issue FP Unit can process few (data-independent) fp instructions at a time, even if they have different rounding mode. The only issue is when the rounding mode is set to dynamic rounding (aka get it from csr), but it's not our case here > > OK. Do what you will, but you're coding for an architecture not an implementation, and this code may stand for many years. First I've tried to implement rounding on riscv similar to aarch64, separated negative and positive numbers but branching is too expensive and I got not much performance improvement against the current java implementation. I like Vladimir's idea, the algorithm gives significant performance improvement and why not to take this advantage. But the above mentioned fix, related to fadd_s(), should be made. Not all numbers are processed correctly without it. Having changed rounding mode of fadd_s() we've got two sequential instructions with the same rounding mode - rdn. 
The performance improvement with the last fix on the T-Head RVB-ICE board is as follows:

Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
FpRoundingBenchmark.test_round_double        2048  thrpt   15  111.278   0.349  ops/ms
FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.776   0.323  ops/ms

@theRealAph As you suggested, I've also tested this algorithm on the full 32-bit range [0; 0xFFFFFFFF] using Float.intBitsToFloat(x), and all the numbers were processed correctly. The output of this algorithm is equal to the output of the current Java Math.round() implementation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1391794054

From jiangli at openjdk.org  Mon Nov 13 23:12:27 2023
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Mon, 13 Nov 2023 23:12:27 GMT
Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread
In-Reply-To: 
References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com>
Message-ID: 

On Mon, 13 Nov 2023 22:15:18 GMT, Daniel D. Daugherty wrote:

> @jianglizhou - I fixed a typo in the bug's synopsis line. Change this PR's title: s/is create/is created/

Thanks, @dcubed-ojdk!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1809279446

From omikhaltcova at openjdk.org  Mon Nov 13 23:34:06 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Mon, 13 Nov 2023 23:34:06 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v4]
In-Reply-To: 
References: 
Message-ID: 

> Please review this implementation of the roundD/roundF intrinsics for the RISC-V platform.
>
> The table below shows that a NaN argument should be processed as a special case.
>
>                                          RISC-V                      Java
>                                          (FCVT.W.S)   (FCVT.L.D)     (long round(double a))   (int round(float a))
> Minimum valid input (after rounding)     -2^31        -2^63          Long.MIN_VALUE           Integer.MIN_VALUE
> Maximum valid input (after rounding)     2^31 - 1     2^63 - 1       Long.MAX_VALUE           Integer.MAX_VALUE
> Output for out-of-range negative input   -2^31        -2^63          Long.MIN_VALUE           Integer.MIN_VALUE
> Output for -∞                            -2^31        -2^63          Long.MIN_VALUE           Integer.MIN_VALUE
> Output for out-of-range positive input   2^31 - 1     2^63 - 1       Long.MAX_VALUE           Integer.MAX_VALUE
> Output for +∞                            2^31 - 1     2^63 - 1       Long.MAX_VALUE           Integer.MAX_VALUE
> Output for NaN                           2^31 - 1     2^63 - 1       0                        0
>
> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:
>
> **Before**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555   0.179  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760   0.103  ops/ms
>
>
> **After**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956   0.186  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947   0.122  ops/ms

Olga Mikhaltsova has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'master' into jdk-8318158
 - Set RoundingMode::rdn for fadd_s/fadd_d
 - Fixed intrinsics implementation. Reverted changes of FCVT_SAFE.
- 8318158: RISC-V: implement roundD/roundF intrinsics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16382/files - new: https://git.openjdk.org/jdk/pull/16382/files/ee6fd671..3e40b550 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=02-03 Stats: 665196 lines in 2673 files changed: 109010 ins; 482737 del; 73449 mod Patch: https://git.openjdk.org/jdk/pull/16382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382 PR: https://git.openjdk.org/jdk/pull/16382 From manc at openjdk.org Mon Nov 13 23:39:29 2023 From: manc at openjdk.org (Man Cao) Date: Mon, 13 Nov 2023 23:39:29 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 13 Nov 2023 21:52:22 GMT, Jiangli Zhou wrote: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. Changes requested by manc (Committer). src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 94: > 92: // The state->get_thread_oop() may be null if the state is created during > 93: // the allocation of the thread oop when a native thread is attaching. Make > 94: // sure we don't create a new state for the JavaThread. I think it is important to maintain `JvmtiThreadState::_thread_oop_h` correctly for the attached native thread. In the existing logic, with and without the proposed change, `JvmtiThreadState::_thread_oop_h` could stay null for an attached native thread, which seems wrong. It may be OK since `JvmtiThreadState::_thread_oop_h` is only used by support for virtual threads. 
It is unlikely that an attached native thread becomes a carrier for a virtual thread. However, it is probably still desirable to update `JvmtiThreadState::_thread_oop_h` to the correct java.lang.Thread oop. src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 95: > 93: // the allocation of the thread oop when a native thread is attaching. Make > 94: // sure we don't create a new state for the JavaThread. > 95: if (state == nullptr || (state->get_thread_oop() != nullptr && In my limited testing with and without virtual threads, when `state` is not null, `state->get_thread_oop()` is always either null or equal to `thread_oop`. So I suspect this whole if is equivalent to `if (state == nullptr)` in practice. I think it is better to split this if into two conditions. Here is my proposed change: https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22. It adds a `JvmtiThreadState::set_thread_oop()` method, and checks the above observation with `state->get_thread_oop()`, and addresses the problem of `JvmtiThreadState::_thread_oop_h` incorrectly staying null for attached native thread. I tested it with the 10000 virtual threads running Thread.sleep() example from https://openjdk.org/jeps/425, and `make test TEST=jdk_loom`. src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 98: > 96: state->get_thread_oop() != thread_oop)) { > 97: // Check if java_lang_Thread already has a link to the JvmtiThreadState. > 98: if (thread_oop != nullptr) { // thread_oop can be null during early VMStart. This comment is another case of `state->get_thread_oop()` being null. We should merge this comment with the new comment about attaching native thread. 
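The condition quoted above (lines 95-96 of jvmtiThreadState.inline.hpp) can be exercised in isolation. What follows is a minimal C++ sketch under stated assumptions: `ThreadOop`, `ThreadState` and `needs_new_state` are hypothetical stand-ins introduced for illustration, not HotSpot types, and the sketch covers only the decision of whether a new state is created, not the rest of `state_for_while_locked`.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; HotSpot's real types differ.
struct ThreadOop {
    int id;
};

struct ThreadState {
    const ThreadOop* thread_oop;  // models the value of state->get_thread_oop()
};

// Mirrors the quoted condition: a new state is needed when none exists, or
// when the existing state is bound to a different, non-null thread oop.
// A state whose thread oop is still null (created while the thread oop of an
// attaching native thread was being allocated) is kept as it is.
bool needs_new_state(const ThreadState* state, const ThreadOop* thread_oop) {
    return state == nullptr ||
           (state->thread_oop != nullptr && state->thread_oop != thread_oop);
}
```

In these terms, the observation from the limited testing described above is that the "bound to a different oop" branch was never seen to be taken, which is why the whole condition may behave like `state == nullptr` in practice.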
-------------

PR Review: https://git.openjdk.org/jdk/pull/16642#pullrequestreview-1728423451
PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1391795730
PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1391805673
PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1391813214

From omikhaltcova at openjdk.org  Tue Nov 14 00:57:47 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Tue, 14 Nov 2023 00:57:47 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v5]
In-Reply-To: 
References: 
Message-ID: 

> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
>
> In the table below it is shown that NaN argument should be processed as a special case.
>
>                                          RISC-V                      Java
>                                          (FCVT.W.S)   (FCVT.L.D)     (long round(double a))   (int round(float a))
> Minimum valid input (after rounding)     -2^31        -2^63          Long.MIN_VALUE           Integer.MIN_VALUE
> Maximum valid input (after rounding)     2^31 - 1     2^63 - 1       Long.MAX_VALUE           Integer.MAX_VALUE
> Output for out-of-range negative input   -2^31        -2^63          Long.MIN_VALUE           Integer.MIN_VALUE
> Output for -∞                            -2^31        -2^63          Long.MIN_VALUE           Integer.MIN_VALUE
> Output for out-of-range positive input   2^31 - 1     2^63 - 1       Long.MAX_VALUE           Integer.MAX_VALUE
> Output for +∞                            2^31 - 1     2^63 - 1       Long.MAX_VALUE           Integer.MAX_VALUE
> Output for NaN                           2^31 - 1     2^63 - 1       0                        0
>
> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:
>
> **Before**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555   0.179  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760   0.103  ops/ms
>
>
> **After**
>
> Benchmark                              (TESTSIZE)   Mode  Cnt    Score   Error   Units
> FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956   0.186  ops/ms
> FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947   0.122  ops/ms

Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:

  Used fclass_mask

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16382/files
  - new: https://git.openjdk.org/jdk/pull/16382/files/3e40b550..630a26b8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=03-04

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/16382.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382

PR: https://git.openjdk.org/jdk/pull/16382

From omikhaltcova at openjdk.org  Tue Nov 14 00:57:48 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Tue, 14 Nov 2023 00:57:48 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 9 Nov 2023 11:46:58 GMT, Vladimir Kempik wrote:

>> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fixed intrinsics implementation. Reverted changes of FCVT_SAFE.
> > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4254: > >> 4252: // dst = 0 >> 4253: // if +/-0, +/-subnormal numbers, signaling/quiet NaN >> 4254: andi(tmp, tmp, 0b1100111100); > > Please, update this line ( and same in doubles) for new scheme of working with fclass mask ( https://github.com/openjdk/jdk/pull/16362/files#diff-314214875276cd9a11ecdfd52b68403ded286710ba0820461b0b510506f61a33R1077 ) Fixed. Thx! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1391856816 From jiangli at openjdk.org Tue Nov 14 02:13:33 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 14 Nov 2023 02:13:33 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 13 Nov 2023 23:21:56 GMT, Man Cao wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 95: > >> 93: // the allocation of the thread oop when a native thread is attaching. Make >> 94: // sure we don't create a new state for the JavaThread. >> 95: if (state == nullptr || (state->get_thread_oop() != nullptr && > > In my limited testing with and without virtual threads, when `state` is not null, `state->get_thread_oop()` is always either null or equal to `thread_oop`. So I suspect this whole if is equivalent to `if (state == nullptr)` in practice. > > I think it is better to split this if into two conditions. Here is my proposed change: https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22. 
> It adds a `JvmtiThreadState::set_thread_oop()` method, and checks the above observation with `state->get_thread_oop()`, and addresses the problem of `JvmtiThreadState::_thread_oop_h` incorrectly staying null for attached native thread.
>
> I tested it with the 10000 virtual threads running Thread.sleep() example from https://openjdk.org/jeps/425, and `make test TEST=jdk_loom`.

If my understanding of [JavaThread::rebind_to_jvmti_thread_state_of](https://github.com/openjdk/jdk/blob/fe0ccdf5f8a5559a608d2e2cd2b6aecbe245c5ec/src/hotspot/share/runtime/javaThread.cpp#L1819) is correct, `state->get_thread_oop()` could be either `JavaThread::_threadObj` or `JavaThread::_jvmti_vthread`, ignoring the null case here. The `state` could be null if the virtual thread is unmounted and the thread is null, or if a JvmtiThreadState has not been created for the thread.

Currently the `JvmtiThreadState::_thread_oop_h` is 'sealed' once a JvmtiThreadState is created. It can never be altered before a JvmtiThreadState is destroyed. Adding a `JvmtiThreadState::set_thread_oop()` changes that assumption. Let's wait for others who are more involved in this area to comment as well.

I just took a look at your https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22 change. I think it does not address the specific issue that we were seeing in our tests. At the time when `JvmtiThreadState::state_for_while_locked` is called for a JavaThread associated with a native thread being attached, the thread oop is still being allocated. There is no non-null thread oop to be set into `JvmtiThreadState::_thread_oop_h`.

I'll address other comments later. Thanks, @caoman.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1391895645 From vlivanov at openjdk.org Tue Nov 14 02:53:35 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 14 Nov 2023 02:53:35 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:51:36 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. 
>> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits: > > - Merge branch 'master' into AllowHeapNoLock > - fix type and reformat doc in Linker > - Merge branch 'master' into AllowHeapNoLock > - tweak whitespace > - a -> an > - add note to downcallHandle about passing heap segments by-reference > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - ... and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 Marked as reviewed by vlivanov (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16201#pullrequestreview-1728635684 From dholmes at openjdk.org Tue Nov 14 03:13:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 03:13:30 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 13 Nov 2023 21:52:22 GMT, Jiangli Zhou wrote: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. Is this a case where the code should be checking for `is_attaching_via_jni()`? ------------- PR Review: https://git.openjdk.org/jdk/pull/16642#pullrequestreview-1728674210 From dlong at openjdk.org Tue Nov 14 03:24:27 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 14 Nov 2023 03:24:27 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 14:40:23 GMT, Ekaterina Vergizova wrote: >> src/hotspot/share/runtime/flags/jvmFlagConstraintsCompiler.cpp line 459: >> >>> 457: } >>> 458: >>> 459: if ((value % CodeEntryAlignment) != 0) { >> >> I don't understand why this is necessary. > > It is needed to avoid 'failed: _buffer_size not aligned' crashes on debug builds: > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L221 That sounds like a bug. We already align the code_begin(). I see no reason to align code_end() as well. It just wastes space. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1391940158

From dlong at openjdk.org  Tue Nov 14 03:29:28 2023
From: dlong at openjdk.org (Dean Long)
Date: Tue, 14 Nov 2023 03:29:28 GMT
Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2]
In-Reply-To: 
References: <5X84H-etGQ-RDan1RTnnZVmXujoo6aleWopu3Hl_J0k=.d82350d3-be39-4acb-accb-ddcb7a8a6fa4@github.com>
Message-ID: 

On Mon, 13 Nov 2023 14:42:56 GMT, Ekaterina Vergizova wrote:

>> I'd rather have the type as size_t and change StubQueue accordingly.
>
> Thanks @dean-long.
> I would like to keep this enhancement simple and minimal so that it can be backported to 17 and 11.
> So I'd like to avoid changes to StubQueue. I can change the type of InlineCacheBufferSize to size_t and add a checked_cast to the StubQueue constructor call in InlineCacheBuffer::initialize():
> _buffer = new StubQueue(new ICStubInterface, checked_cast<int>(InlineCacheBufferSize), InlineCacheBuffer_lock, "InlineCacheBuffer");
>
> Because in any case InlineCacheBufferSize can't be greater than INT_MAX:
> `InlineCacheBufferSize < NonNMethodCodeHeapSize < ReservedCodeCacheSize < CODE_CACHE_DEFAULT_LIMIT = 2G`:
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L191
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.cpp#L492
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L589
>
> Will that be OK?

OK.
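The narrowing discussed above, from a `size_t` flag value to the `int` that the StubQueue constructor takes, can be sketched in standalone C++. This is only an illustration of a range-checked conversion: `narrow_to_int` is a hypothetical helper, not HotSpot's `checked_cast`, and it reports failure through its return value rather than asserting.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical helper (not HotSpot's checked_cast): narrows a size_t to int
// and reports failure instead of silently truncating the value.
bool narrow_to_int(std::size_t value, int* out) {
    if (value > static_cast<std::size_t>(INT_MAX)) {
        return false;  // the value would not fit in an int
    }
    *out = static_cast<int>(value);
    return true;
}
```

Given the chain of inequalities quoted above, which bounds the buffer size by a 2G limit, a conversion like this would be expected to succeed for every accepted flag value; the check only documents and guards that assumption.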
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1391942486 From vlivanov at openjdk.org Tue Nov 14 03:50:31 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 14 Nov 2023 03:50:31 GMT Subject: RFR: 8267532: Try/catch block not optimized as expected [v5] In-Reply-To: References: Message-ID: On Tue, 7 Nov 2023 15:49:08 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. 
>> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - track catch block enters in deoptimization code too > - Add @requires vm.debug to test Thanks for taking care of it, Jorn. Overall, the patch looks good. What kind of performance testing have you done? The current bug summary is too vague. Please, reword it describing what the proposed enhancement does. I don't fully understand the issue with `has_monitor`. It does look like a pre-existing issue and it's better to handle it separately. It's interesting to note that the underlying issue for FFM is not that exception handlers aren't profiled, but that unreached calls are not pruned. It complicates the job for EA making arguments non-scalarizable. Pruning unreachable calls would fix the issue in a more disciplined manner, but it would also have more pervasive effects requiring deeper performance evaluation. 
Overall, it would be helpful to ensure there are no unreachable calls encountered during C2 compilation at all. `ciTypeFlow` may benefit from new profiling information as well. Speaking of code changes: * I don't see much value in 2 separate product flags to control profiling and optimization logic (`ProfileExceptionHandlers` and `PruneDeadExceptionHandlers`); having a single product flag should be enough; * product flags should be marked diagnostic ------------- PR Review: https://git.openjdk.org/jdk/pull/16416#pullrequestreview-1728745064 From iklam at openjdk.org Tue Nov 14 04:00:50 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 04:00:50 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() Message-ID: This is another step of moving CDS config management into cdsConfig.hpp: The function `MetaspaceShared::use_full_module_graph()` is split into two: - `CDSConfig::is_dumping_full_module_graph()` - `CDSConfig::is_loading_full_module_graph()` ------------- Commit messages: - 8319999: Refactor MetaspaceShared::use_full_module_graph() Changes: https://git.openjdk.org/jdk/pull/16646/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319999 Stats: 143 lines in 15 files changed: 69 ins; 41 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/16646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16646/head:pull/16646 PR: https://git.openjdk.org/jdk/pull/16646 From iklam at openjdk.org Tue Nov 14 04:19:37 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 04:19:37 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: Message-ID: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> > This is another step of moving CDS config management into cdsConfig.hpp: > > The function `MetaspaceShared::use_full_module_graph()` is split into two: > - 
`CDSConfig::is_dumping_full_module_graph()` > - `CDSConfig::is_loading_full_module_graph()` Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: rename FileMapHeader::_use_full_module_graph -> _has_full_module_graph ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16646/files - new: https://git.openjdk.org/jdk/pull/16646/files/ece0d2be..e66b2a5d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=00-01 Stats: 7 lines in 2 files changed: 2 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16646/head:pull/16646 PR: https://git.openjdk.org/jdk/pull/16646 From dholmes at openjdk.org Tue Nov 14 04:39:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 04:39:30 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: On Tue, 14 Nov 2023 04:19:37 GMT, Ioi Lam wrote: >> This is another step of moving CDS config management into cdsConfig.hpp: >> >> The function `MetaspaceShared::use_full_module_graph()` is split into two: >> - `CDSConfig::is_dumping_full_module_graph()` >> - `CDSConfig::is_loading_full_module_graph()` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > rename FileMapHeader::_use_full_module_graph -> _has_full_module_graph Seems reasonable but some of the naming seems a little off to me. Thanks. 
src/hotspot/share/cds/cdsConfig.cpp line 34: > 32: bool CDSConfig::_is_dumping_dynamic_archive = false; > 33: bool CDSConfig::_enable_dumping_full_module_graph = true; > 34: bool CDSConfig::_enable_loading_full_module_graph = true; The "enable" naming isn't quite right - these fields hold the enabled state e.g. `_dumping_full_module_graph_enabled`. src/hotspot/share/cds/cdsConfig.hpp line 33: > 31: class CDSConfig : public AllStatic { > 32: #if INCLUDE_CDS > 33: static bool _is_dumping_dynamic_archive; Nit: Extra spaces src/hotspot/share/cds/filemap.cpp line 216: > 214: _max_heap_size = MaxHeapSize; > 215: _use_optimized_module_handling = MetaspaceShared::use_optimized_module_handling(); > 216: _has_full_module_graph = CDSConfig::is_dumping_full_module_graph(); `_uses_full_module_graph` would seem more appropriate than `use` or `has` ------------- PR Review: https://git.openjdk.org/jdk/pull/16646#pullrequestreview-1728817091 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391969300 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391969858 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391971212 From iklam at openjdk.org Tue Nov 14 04:59:31 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 04:59:31 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: On Tue, 14 Nov 2023 04:30:37 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> rename FileMapHeader::_use_full_module_graph -> _has_full_module_graph > > src/hotspot/share/cds/cdsConfig.hpp line 33: > >> 31: class CDSConfig : public AllStatic { >> 32: #if INCLUDE_CDS >> 33: static bool _is_dumping_dynamic_archive; > > Nit: Extra spaces The alignment is the same as the 
is_dumping_xxx()/enable_dumping_xxx() functions below, so it's easier to read the "dumping" and "loading" part of the names. > src/hotspot/share/cds/filemap.cpp line 216: > >> 214: _max_heap_size = MaxHeapSize; >> 215: _use_optimized_module_handling = MetaspaceShared::use_optimized_module_handling(); >> 216: _has_full_module_graph = CDSConfig::is_dumping_full_module_graph(); > > `_uses_full_module_graph` would seem more appropriate than `use` or `has` The name is consistent with other fields in filemap.hpp: bool _has_non_jar_in_classpath; // non-jar file entry exists in classpath bool _has_platform_or_app_classes; // Archive contains app classes bool _has_full_module_graph; // Does this CDS archive contain the full archived module graph? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391984645 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391983234 From iklam at openjdk.org Tue Nov 14 05:18:27 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 05:18:27 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: <30Zk2Eg83M2C3IvR9PmrBdCf-EEWIpQi__NzIRGu9zg=.2a2f8eb2-ab6a-4edc-9324-9aeaf0b053d2@github.com> On Tue, 14 Nov 2023 04:29:06 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> rename FileMapHeader::_use_full_module_graph -> _has_full_module_graph > > src/hotspot/share/cds/cdsConfig.cpp line 34: > >> 32: bool CDSConfig::_is_dumping_dynamic_archive = false; >> 33: bool CDSConfig::_enable_dumping_full_module_graph = true; >> 34: bool CDSConfig::_enable_loading_full_module_graph = true; > > The "enable" naming isn't quite right - these fields hold the enabled state e.g. `_dumping_full_module_graph_enabled`. 
I like the variable name to be the same as the function name, so it's easier to search. We also have `bool _enable_preview;` in arguments.hpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391992956 From dholmes at openjdk.org Tue Nov 14 05:22:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 05:22:27 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: On Tue, 14 Nov 2023 04:57:14 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/cdsConfig.hpp line 33: >> >>> 31: class CDSConfig : public AllStatic { >>> 32: #if INCLUDE_CDS >>> 33: static bool _is_dumping_dynamic_archive; >> >> Nit: Extra spaces > > The alignment is the same as the is_dumping_xxx()/enable_dumping_xxx() functions below, so it's easier to read the "dumping" and "loading" part of the names. Sorry that is not a valid reason IMO. It's bad enough when people want the start of names to line up, never-mind the middle of them! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391995395 From dholmes at openjdk.org Tue Nov 14 05:26:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 05:26:27 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: <-EIv1KN56Jqe2FwxBff3hEs-xffdM_XhIjK_v2P8yv4=.e7798d91-0def-4bad-a478-d7773d16217c@github.com> On Tue, 14 Nov 2023 04:54:24 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/filemap.cpp line 216: >> >>> 214: _max_heap_size = MaxHeapSize; >>> 215: _use_optimized_module_handling = MetaspaceShared::use_optimized_module_handling(); >>> 216: _has_full_module_graph = CDSConfig::is_dumping_full_module_graph(); >> >> `_uses_full_module_graph` would seem more appropriate than `use` or `has` > > The name is consistent with other fields in filemap.hpp: > > > bool _has_non_jar_in_classpath; // non-jar file entry exists in classpath > bool _has_platform_or_app_classes; // Archive contains app classes > bool _has_full_module_graph; // Does this CDS archive contain the full archived module graph? Okay makes more sense when expressed like that - but it is not obvious that the MetaspaceShared API should be read as applying "to this CDS archive". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391996994 From dholmes at openjdk.org Tue Nov 14 05:31:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 05:31:27 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: <30Zk2Eg83M2C3IvR9PmrBdCf-EEWIpQi__NzIRGu9zg=.2a2f8eb2-ab6a-4edc-9324-9aeaf0b053d2@github.com> References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> <30Zk2Eg83M2C3IvR9PmrBdCf-EEWIpQi__NzIRGu9zg=.2a2f8eb2-ab6a-4edc-9324-9aeaf0b053d2@github.com> Message-ID: On Tue, 14 Nov 2023 05:15:24 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/cdsConfig.cpp line 34: >> >>> 32: bool CDSConfig::_is_dumping_dynamic_archive = false; >>> 33: bool CDSConfig::_enable_dumping_full_module_graph = true; >>> 34: bool CDSConfig::_enable_loading_full_module_graph = true; >> >> The "enable" naming isn't quite right - these fields hold the enabled state e.g. `_dumping_full_module_graph_enabled`. > > I like the variable name to be the same as the function name, so it's easier to search. We also have `bool _enable_preview;` in arguments.hpp But the context is different. `enable_xxx()` indicates that the function enables xxx. The variable that holds the enabled/disabled state of xxx should reflect in its name that it holds that state so `_xxx_enabled` Or `_has_xxx`, or `_is_xxx`, depending on what `xxx` actually is. 
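As a rough illustration of the naming split described above, here is a minimal sketch. `FeatureConfig` and `xxx` are hypothetical stand-ins, not the actual CDSConfig code: the function that performs the action gets the verb, and the field that holds the resulting state gets a name that reads as state.

```cpp
#include <cassert>

// Hypothetical stand-in for a config class; NOT the actual CDSConfig code.
class FeatureConfig {
  // The field holds state, so its name reads as state ("xxx is enabled").
  static bool _xxx_enabled;

 public:
  // Functions that change the state are named with verbs.
  static void enable_xxx()  { _xxx_enabled = true; }
  static void disable_xxx() { _xxx_enabled = false; }

  // The query reads as a predicate.
  static bool is_xxx_enabled() { return _xxx_enabled; }
};

bool FeatureConfig::_xxx_enabled = false;
```

With this split, searching for `enable_xxx` finds only the places that flip the state, while `_xxx_enabled` finds the places that store or read it.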
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1391999746 From iklam at openjdk.org Tue Nov 14 05:36:26 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 05:36:26 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: On Tue, 14 Nov 2023 05:20:15 GMT, David Holmes wrote: >> The alignment is the same as the is_dumping_xxx()/enable_dumping_xxx() functions below, so it's easier to read the "dumping" and "loading" part of the names. > > Sorry that is not a valid reason IMO. It's bad enough when people want the start of names to line up, never-mind the middle of them! That's my personal preference. Many people align the code differently. Are there any rules in the code guide about code alignments? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1392002360 From dholmes at openjdk.org Tue Nov 14 05:45:25 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 05:45:25 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> <30Zk2Eg83M2C3IvR9PmrBdCf-EEWIpQi__NzIRGu9zg=.2a2f8eb2-ab6a-4edc-9324-9aeaf0b053d2@github.com> Message-ID: <4bktQehVUaah0Fl2SipAWA6f16_5wE4KIczlWt7P5hE=.4b5dbd41-38c8-4bbb-9f94-6804b9277924@github.com> On Tue, 14 Nov 2023 05:29:01 GMT, David Holmes wrote: >> I like the variable name to be the same as the function name, so it's easier to search. We also have `bool _enable_preview;` in arguments.hpp > > But the context is different. `enable_xxx()` indicates that the function enables xxx. 
The variable that holds the enabled/disabled state of xxx should reflect in its name that it holds that state so `_xxx_enabled` Or `_has_xxx`, or `_is_xxx`, depending on what `xxx` actually is. Or `_use_xxx` as was originally used here. I guess I just find it particularly irksome that `enable` is often used where `enabled` would be the grammatically correct choice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1392008700 From iklam at openjdk.org Tue Nov 14 06:10:52 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 06:10:52 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v3] In-Reply-To: References: Message-ID: > This is another step of moving CDS config management into cdsConfig.hpp: > > The function `MetaspaceShared::use_full_module_graph()` is split into two: > - `CDSConfig::is_dumping_full_module_graph()` > - `CDSConfig::is_loading_full_module_graph()` Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Changed flag to CDSConfig::_dumping_full_module_graph_disabled ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16646/files - new: https://git.openjdk.org/jdk/pull/16646/files/e66b2a5d..f370aea4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=01-02 Stats: 21 lines in 2 files changed: 6 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/16646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16646/head:pull/16646 PR: https://git.openjdk.org/jdk/pull/16646 From dholmes at openjdk.org Tue Nov 14 06:10:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 06:10:53 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v3] In-Reply-To: References: Message-ID: <7wcZM0VIAvAPq2j5zg2sq6qHfRVdTxv9qyUcGm5TMWI=.07e2308a-f3ce-41de-a301-c81f94d57ade@github.com> On Tue, 14 Nov 2023 
06:06:55 GMT, Ioi Lam wrote: >> This is another step of moving CDS config management into cdsConfig.hpp: >> >> The function `MetaspaceShared::use_full_module_graph()` is split into two: >> - `CDSConfig::is_dumping_full_module_graph()` >> - `CDSConfig::is_loading_full_module_graph()` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Changed flag to CDSConfig::_dumping_full_module_graph_disabled Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16646#pullrequestreview-1728976315 From dholmes at openjdk.org Tue Nov 14 06:10:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 06:10:53 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> Message-ID: On Tue, 14 Nov 2023 05:34:05 GMT, Ioi Lam wrote: >> Sorry that is not a valid reason IMO. It's bad enough when people want the start of names to line up, never-mind the middle of them! > > That's my personal preference. Many people align the code differently. Are there any rules in the code guide about code alignments? Nope no rules as such. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1392026272 From iklam at openjdk.org Tue Nov 14 06:22:41 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 06:22:41 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v4] In-Reply-To: References: Message-ID: > This is another step of moving CDS config management into cdsConfig.hpp: > > The function `MetaspaceShared::use_full_module_graph()` is split into two: > - `CDSConfig::is_dumping_full_module_graph()` > - `CDSConfig::is_loading_full_module_graph()` Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed white spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16646/files - new: https://git.openjdk.org/jdk/pull/16646/files/f370aea4..da947425 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16646/head:pull/16646 PR: https://git.openjdk.org/jdk/pull/16646 From iklam at openjdk.org Tue Nov 14 06:22:43 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 06:22:43 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v2] In-Reply-To: <4bktQehVUaah0Fl2SipAWA6f16_5wE4KIczlWt7P5hE=.4b5dbd41-38c8-4bbb-9f94-6804b9277924@github.com> References: <-yUzyesc4fyJc5y1JefrQJcseeTRiKfwbeGv3-6si30=.4527a2d4-210c-4e3f-9c9c-e37b0383ae0b@github.com> <30Zk2Eg83M2C3IvR9PmrBdCf-EEWIpQi__NzIRGu9zg=.2a2f8eb2-ab6a-4edc-9324-9aeaf0b053d2@github.com> <4bktQehVUaah0Fl2SipAWA6f16_5wE4KIczlWt7P5hE=.4b5dbd41-38c8-4bbb-9f94-6804b9277924@github.com> Message-ID: On Tue, 14 Nov 2023 05:42:56 GMT, David Holmes wrote: >> But the context is different. `enable_xxx()` indicates that the function enables xxx. 
The variable that holds the enabled/disabled state of xxx should reflect in its name that it holds that state so `_xxx_enabled` Or `_has_xxx`, or `_is_xxx`, depending on what `xxx` actually is. > > Or `_use_xxx` as was originally used here. I guess I just find it particularly irksome that `enable` is often used where `enabled` would be the grammatically correct choice. I don't think enable_xxx is grammatically wrong, when it's read as "(we should) enable xxx". We have hundreds of global flags that start with Use/Enable/Disable, like EnableDynamicAgentLoading and UseTLAB. Anyway, the reason I didn't use `_is_loading_full_module_graph` is that I want a flag that unconditionally disables the feature, but this flag alone doesn't guarantee that the feature exists. I changed the flag to `_loading_full_module_graph_disabled` and added comments to make it clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1392036140 From stefank at openjdk.org Tue Nov 14 06:26:32 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 14 Nov 2023 06:26:32 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:00:31 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test The patch passed tier1-8 on Linux x64. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1809614500 From rmarchenko at openjdk.org Tue Nov 14 06:56:49 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 14 Nov 2023 06:56:49 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks Message-ID: Zero'ing memory of extension event callbacks ------------- Commit messages: - JDK-8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks Changes: https://git.openjdk.org/jdk/pull/16647/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16647&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319961 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16647/head:pull/16647 PR: https://git.openjdk.org/jdk/pull/16647 From aboldtch at openjdk.org Tue Nov 14 07:02:27 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 Nov 2023 07:02:27 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: <_a3NEb-EFfx_NSuwCSv3EV_FvYcjCMnrguOK1JMNV4Q=.61575855-0c1f-4db1-9499-e9e176a3b8a7@github.com> References: <_a3NEb-EFfx_NSuwCSv3EV_FvYcjCMnrguOK1JMNV4Q=.61575855-0c1f-4db1-9499-e9e176a3b8a7@github.com> Message-ID: On Mon, 13 Nov 2023 22:25:43 GMT, David Holmes wrote: >> Not sure I understand. >> >> The change here is that [for every object A, there exists no other object B such that A == B] is changed to [for every contiguous run of object A, there exists no other object B outside that run such that A == B]. Both of these checks are `O(n^2)` in the worst case, and the second is is `O(n)` in the best case. >> >> I think you have describe the algorithm you envision. Or maybe you want to change what property we are asserting. > > I thought the issue was that without recursion support we must never find an A and B such that A == B. 
But with recursion support we can allow A == B if they are adjacent in the lock-stack. So that is what the check is performing, but apparently in two loops, where I was suggesting it can perhaps be done in a single loop. The recursive locking is more strict. There can only be one consecutive range of the same object (e.g. `AABCC` is allowed while `AABAA` is not). The outer loop (the two for-loop constructs which work on the index `i`) iterates over each range of consecutive entries which are the same object. (It picks out the object at the range start index and bumps the index until the object is different, finding the range end index.) The inner loop (the for-loop construct which works on the index `j`) asserts that no other entry is the same as the object in this range. (It only walks higher indexes, as the lower entries have already been checked in previous iterations.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1392074056 From fyang at openjdk.org Tue Nov 14 07:13:33 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Nov 2023 07:13:33 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v9] In-Reply-To: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com> References: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com> Message-ID: On Mon, 13 Nov 2023 15:27:28 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos Marked as reviewed by fyang (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16481#pullrequestreview-1729044404 From fjiang at openjdk.org Tue Nov 14 07:17:29 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 14 Nov 2023 07:17:29 GMT Subject: RFR: 8318218: RISC-V: C2 CompressBits [v9] In-Reply-To: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com> References: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com> Message-ID: <5z_LYlwhbfVSG_7u97YGbGFzNLSWXqkqSfk3OVnNsRk=.fb6c95e9-a877-45b8-98fa-c7bc2d1b6cd4@github.com> On Mon, 13 Nov 2023 15:27:28 GMT, Hamlin Li wrote: >> Hi, >> Can you review the change to add intrinsic for CompressBits for Long & Integer? >> Thanks! >> >> ## Test >> pass jtreg test: >> test/jdk/java/lang/CompressExpand*.java > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16481#pullrequestreview-1729048970 From fyang at openjdk.org Tue Nov 14 07:26:28 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Nov 2023 07:26:28 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks [v3] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 16:42:11 GMT, Hamlin Li wrote: >> Hi, >> Can you review the patch to refactor the code related to UseRVV checks? >> Thanks! >> >> There is some code (flag setting/checking) depending on UseRVV's value; this code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: >> 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. >> 2. refactored to be more readable. >> 3. also add note to make sure the future code gets the final UseRVV value instead of the initial value. 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > remove code setting SpecialEncodeISOArray Updated change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16580#pullrequestreview-1729060632 From dholmes at openjdk.org Tue Nov 14 07:45:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 07:45:29 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks In-Reply-To: References: Message-ID: <5maZLsqVb1mNYUvUJLwSiSF9xAww0GryVFtdKMBOmHk=.c8ff2d36-7337-4965-acc9-4a08750a258b@github.com> On Tue, 14 Nov 2023 06:50:58 GMT, Roman Marchenko wrote: > Zero'ing memory of extension event callbacks Marked as reviewed by dholmes (Reviewer). src/hotspot/share/prims/jvmtiEnvBase.cpp line 217: > 215: // all callbacks initially null > 216: memset(&_event_callbacks,0,sizeof(jvmtiEventCallbacks)); > 217: memset(&_ext_event_callbacks, 0, sizeof(jvmtiExtEventCallbacks)); Looks good. While you are here could you adjust the existing memset line and add spaces after the commas please. Thanks. ------------- PR Review: https://git.openjdk.org/jdk/pull/16647#pullrequestreview-1729084003 PR Review Comment: https://git.openjdk.org/jdk/pull/16647#discussion_r1392118874 From dholmes at openjdk.org Tue Nov 14 07:52:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 07:52:32 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:07:59 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/lockStack.cpp line 50: >> >>> 48: STATIC_ASSERT(sizeof(_bad_oop_sentinel) == oopSize); >>> 49: STATIC_ASSERT(sizeof(_base[0]) == oopSize); >>> 50: STATIC_ASSERT(std::is_standard_layout::value); >> >> What is this? Are we allowed to use it? > > There is probably more nuance here w.r.t. `offsetof` than I know. 
> My belief was that reason we did not use `offsetof` is because we use it on non standard layout types, for which is invalid. But the lock stack is a standard layout. > > However, reading some of issues surrounding `offsetof` (mainly poor compiler support and becoming conditionally supported in C++17) there might be more reasons to avoid it. If that is the case this property would have to be asserted at runtime instead. > > Maybe @kimbarrett has some more insight. To be clear I was querying the use of `std::is_standard_layout` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1392124042 From dholmes at openjdk.org Tue Nov 14 07:52:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 07:52:35 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: <_a3NEb-EFfx_NSuwCSv3EV_FvYcjCMnrguOK1JMNV4Q=.61575855-0c1f-4db1-9499-e9e176a3b8a7@github.com> Message-ID: On Tue, 14 Nov 2023 07:00:10 GMT, Axel Boldt-Christmas wrote: >> I thought the issue was that without recursion support we must never find an A and B such that A == B. But with recursion support we can allow A == B if they are adjacent in the lock-stock. So that is what the check is performing, but apparently in two loops, where I was suggesting it can perhaps be done in a single loop. > > The recursive locking is more strict. There can only be one consecutive range of the same objects. (e.g. `AABCC` is allowed while `AABAA` is not.) > > The outer loop finds (the two for loop constructs which works on the index `i`) iterates over each range of consecutive objects which are the same. (Picks out the object at the range start index bumps the index until the object is different, finding the range end index). > And the inner loop (the for loop construct which works on the index `j`) asserts that no other entry is the same as the object in the this range. 
(Only walks higher indexes as the lower entries have already been checked in previous iterations). Got it. Sorry I totally misread the code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1392126836 From fyang at openjdk.org Tue Nov 14 08:16:29 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Nov 2023 08:16:29 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it. > > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some test. src/hotspot/cpu/riscv/vm_version_riscv.cpp line 160: > 158: } > 159: > 160: if (UseZvknha && UseZvkb) { A simple question here: Does the existence of `Zvknhb` means availability of `Zvknha`? Or should this be something like `if ((UseZvknha || UseZvknhb) && UseZvkb)`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1392153578 From rmarchenko at openjdk.org Tue Nov 14 08:18:41 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 14 Nov 2023 08:18:41 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2] In-Reply-To: References: Message-ID: > Zero'ing memory of extension event callbacks Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: Fixing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16647/files - new: https://git.openjdk.org/jdk/pull/16647/files/aa3bfaec..2ef1e341 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16647&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16647&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16647.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16647/head:pull/16647 PR: https://git.openjdk.org/jdk/pull/16647 From rmarchenko at openjdk.org Tue Nov 14 08:18:43 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 14 Nov 2023 08:18:43 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2] In-Reply-To: <5maZLsqVb1mNYUvUJLwSiSF9xAww0GryVFtdKMBOmHk=.c8ff2d36-7337-4965-acc9-4a08750a258b@github.com> References: <5maZLsqVb1mNYUvUJLwSiSF9xAww0GryVFtdKMBOmHk=.c8ff2d36-7337-4965-acc9-4a08750a258b@github.com> Message-ID: On Tue, 14 Nov 2023 07:42:08 GMT, David Holmes wrote: >> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing review comments > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 217: > >> 215: // all callbacks initially null >> 216: memset(&_event_callbacks,0,sizeof(jvmtiEventCallbacks)); >> 217: memset(&_ext_event_callbacks, 0, sizeof(jvmtiExtEventCallbacks)); > > Looks good. 
> > While you are here could you adjust the existing memset line and add spaces after the commas please. Thanks. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16647#discussion_r1392157095 From aboldtch at openjdk.org Tue Nov 14 08:19:29 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 Nov 2023 08:19:29 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v3] In-Reply-To: References: <-lwt39Gx_QJfxgzgSLHkysdtOrVxgP8dFh7gN4TDkmY=.86139caf-08c2-484f-999f-fa6cf121f9df@github.com> Message-ID: On Mon, 13 Nov 2023 22:43:48 GMT, David Holmes wrote: >>> What is the rationale behind this block? Is it beneficial to inflate the top-most lock to make room for the new one, because that might be hotter? If so, then it may be even more useful to inflate the bottom-most entry instead? >> >> The current implementation inflates the bottom (least recently added) entry. >> >> The rational is that because the emitted code always goes into the runtime for monitorenter if the lock stack is full, we need to inflate at least one object on the lock stack to not get into a scenario where we are constantly going into the runtime because we are in some deeply nested critical sections entering and exiting in a loop with the lock stack full. >> >> I've also have versions of this which goes through the lock stack, and first inflates the already inflated objects, and only inflate a not inflated object if the lock stack is still full. >> >> As for inflating the bottom instead of the top. I am unsure what would be best. The idea behind the bottom is that it is furthest away from the current running code, and in case the top is in a loop with different objects every time it would cause a lot of inflation. But it could obviously also be that the stack is in a loop and the bottom most object is different every time while the top is the same. >> I can't say that I have seen programs with this either of this behaviour. 
Both can have equally bad worst case programs (with respect to number of inflations) but my gut feeling is that the worst case is less likely when inflating the bottom. >> >>> If recursion support means the lockStack is no longer big enough then we need to increase its size to accommodate that. >> >> I have not seen it being a problem, but it would be worth looking for programs where this could be an issue and evaluate increasing the lock stack size. Regardless of the capacity, if (and when) the lock stack gets full it needs to be handled in some way. >> >>> I'm also unclear on the rationale, and again on checking for a full-stack upfront like this, when it should be a rare case. >> >> The check for a full lock stack is always performed in every codepath, emitted C2, emitted shared and the runtime. >> >> This only adds an escape hatch for the degenerate behaviour we could arrive in. > > I would expect that we will encounter a full lockstack, of the current size 4, much more often with recursion support, and so we probably should increase it. But the handling of a full stack should still come last IMO. The current lock stack capacity is 8. Worth noting is that when we add support for holding monitors in Loom we will probably transfer the lock stack while mounting and un-mounting. If this is done via indirection instead of copying, adding a dynamically resizable lock stack would not be that much of an effort. > But the handling of a full stack should still come last IMO. I am unsure what you mean by "last" here. The idea is to make room for the current object on the lock stack. We could make this conditional on the state of the object's locking. But the only cases where making room on the lock stack is arguably less useful are if this is a recursive enter on an inflated monitor or is a contended enter on a fast locked object. Then there is also what I already described where you can remove already inflated objects first. 
But having a full lock stack is not something I have encountered as a problem in the different benchmarks. So I did not want to add complex logic which tried to reason about when and how to make room on the lock stack, and simply said: if it is full, make room. So if I understand you correctly, you want to inflate the current object's monitor unconditionally if the lock stack is full. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1392158983 From shade at openjdk.org Tue Nov 14 08:46:28 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 08:46:28 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 20:14:25 GMT, Claes Redestad wrote: > OK, I had already submitted a set across all platforms, but the results on linux will serve as a good check up on the generic vs futex impl. Do you intend to switch back before integration or is the intent to integrate and evaluate if it's on par then make a go/no-go decision later? The intent is to leave the futex implementation enabled for Linux. Maybe we would do a series of touchups there. The generic implementation just tries to do what the futex implementation is already doing well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1809764047 From mdoerr at openjdk.org Tue Nov 14 08:58:32 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Nov 2023 08:58:32 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v3] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 13 Nov 2023 20:42:00 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. 
>> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
> > Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: > > - code and doc cleanup in StringRacyConstructor test > - Update of string_compress for the s390 port to return the index of the non-latin1 char. > Contributed by Amit Kumar. Please don't forget PPC64 (see my comment above). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1809784009 From shade at openjdk.org Tue Nov 14 09:02:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 09:02:42 GMT Subject: RFR: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates [v4] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:30:19 GMT, Aleksey Shipilev wrote: >> Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. >> >> Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. >> >> For example, sample branch profiling hunk from C1 tier3 on x86_64: >> >> >> Before: >> 0x00007f269065ed02: test %edx,%edx >> 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ? >> 0x00007f269065ed0e: movabs $0x138,%rsi >> ? 0x00007f269065ed18: je 0x00007f269065ed24 >> ? 0x00007f269065ed1a: movabs $0x148,%rsi >> ? 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi >> 0x00007f269065ed28: lea 0x1(%rdi),%rdi >> 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1) >> 0x00007f269065ed30: je 0x00007f269065ed4e >> >> After: >> 0x00007f1370dcd782: test %edx,%edx >> 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ? >> 0x00007f1370dcd78e: mov $0x138,%esi >> ? 
0x00007f1370dcd793: je 0x00007f1370dcd79a >> ? 0x00007f1370dcd795: mov $0x148,%esi >> ? 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi >> 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi >> 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1) >> 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4 >> >> >> We can use shorter 32-bit immediate moves. In the hunk above, this saves about 8 bytes, look around `movabs` -> `mov` changes. But this is not limited to the profiling code. There are nearly 1% code space savings on larger tests in C2. For example, on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`: >> >> >> # Before >> nmethod code size : 430328 bytes >> nmethod code size : 467032 bytes >> nmethod code size : 908936 bytes >> nmethod code size : 1267816 bytes >> >> # After >> nmethod code size : 429616 bytes (-0.1%) >> nmethod code size : 466344 bytes (-0.1%) >> nmethod code size : 897144 bytes (-1.3%) >> nmethod code size : 1256216 bytes (-0.9%) >> >> >> There are two wrinkles: >> 1. Current `movslq(Register, int32_t)` is broken and protected by `Sh... > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Remove the movslq declaration as well > - Merge branch 'master' into JDK-8319406-shorter-movptr-32 > - Enlighs > - Remove new imm64 method completely, inline at use > - Easy review feedback > - Merge branch 'master' into JDK-8319406-shorter-movptr-32 > - Fix Thanks! Testing passes, so I am integrating. 
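To illustrate the size argument in the description above, here is a minimal sketch of how an assembler could pick the shortest `mov` form for an immediate. It is not the code from this PR: `MovForm`, `pick_mov_form` and `encoded_length` are hypothetical names, and the thread does not show which forms the patch actually emits. The 5-byte and 10-byte lengths agree with the address deltas in the quoted disassembly (`...d78e` to `...d793`, and `...ed0e` to `...ed18`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration only; not HotSpot's MacroAssembler API.
enum class MovForm {
  Mov32ZeroExtend,  // mov $imm32, %r32   (hardware clears the upper 32 bits)
  Mov64SignExtend,  // mov $simm32, %r64  (REX.W C7 /0, immediate is sign-extended)
  Movabs64          // movabs $imm64, %r64 (REX.W B8+r, full 64-bit immediate)
};

// Choose the shortest form that still materializes exactly `imm` in a 64-bit register.
MovForm pick_mov_form(int64_t imm) {
  if (imm >= 0 && imm <= INT64_C(0xFFFFFFFF)) {
    return MovForm::Mov32ZeroExtend;
  }
  if (imm >= INT32_MIN && imm <= INT32_MAX) {
    return MovForm::Mov64SignExtend;
  }
  return MovForm::Movabs64;
}

// Encoded length in bytes for a legacy register such as %rsi/%esi.
// (For r8-r15 the 32-bit form needs a REX prefix, which adds one byte.)
int encoded_length(MovForm form) {
  switch (form) {
    case MovForm::Mov32ZeroExtend: return 5;   // opcode B8+r, imm32
    case MovForm::Mov64SignExtend: return 7;   // REX.W, C7, ModRM, imm32
    case MovForm::Movabs64:        return 10;  // REX.W, opcode B8+r, imm64
  }
  return 0;
}
```

Under these assumptions, each `movabs $0x138,%rsi` that becomes `mov $0x138,%esi` shrinks from 10 to 5 bytes, while a real 64-bit metadata address still needs the full `movabs`.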
------------- PR Comment: https://git.openjdk.org/jdk/pull/16497#issuecomment-1809790048 From shade at openjdk.org Tue Nov 14 09:02:44 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 09:02:44 GMT Subject: Integrated: 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates In-Reply-To: References: Message-ID: On Fri, 3 Nov 2023 16:00:08 GMT, Aleksey Shipilev wrote: > Noticed this while doing C1 work, but the issue is more generic. If you look into x86 code, then sometimes you'll notice `movabs` with small immediates on x86. That `movabs` actually carries the full-blown 64-bit immediate. > > Similar to [JDK-8255838](https://bugs.openjdk.org/browse/JDK-8255838), it would be useful to shorten movptr(reg, imm) when immediate fits in 32 bits. This would compact some code, notably the code in C1 profiling ([JDK-8315843](https://bugs.openjdk.org/browse/JDK-8315843)), but also other code, generically. > > For example, sample branch profiling hunk from C1 tier3 on x86_64: > > > Before: > 0x00007f269065ed02: test %edx,%edx > 0x00007f269065ed04: movabs $0x7f260a4ddd68,%rax ; {metadata(method data for {method} ? > 0x00007f269065ed0e: movabs $0x138,%rsi > ? 0x00007f269065ed18: je 0x00007f269065ed24 > ? 0x00007f269065ed1a: movabs $0x148,%rsi > ? 0x00007f269065ed24: mov (%rax,%rsi,1),%rdi > 0x00007f269065ed28: lea 0x1(%rdi),%rdi > 0x00007f269065ed2c: mov %rdi,(%rax,%rsi,1) > 0x00007f269065ed30: je 0x00007f269065ed4e > > After: > 0x00007f1370dcd782: test %edx,%edx > 0x00007f1370dcd784: movabs $0x7f12f64ddd68,%rax ; {metadata(method data for {method} ? > 0x00007f1370dcd78e: mov $0x138,%esi > ? 0x00007f1370dcd793: je 0x00007f1370dcd79a > ? 0x00007f1370dcd795: mov $0x148,%esi > ? 0x00007f1370dcd79a: mov (%rax,%rsi,1),%rdi > 0x00007f1370dcd79e: lea 0x1(%rdi),%rdi > 0x00007f1370dcd7a2: mov %rdi,(%rax,%rsi,1) > 0x00007f1370dcd7a6: je 0x00007f1370dcd7c4 > > > We can use shorter 32-bit immediate moves. 
In the hunk above, this saves about 8 bytes, look around `movabs` -> `mov` changes. But this is not limited to the profiling code. There are nearly 1% code space savings on larger tests in C2. For example, on `-Xcomp -XX:TieredStopAtLevel=... HelloWorld`: > > > # Before > nmethod code size : 430328 bytes > nmethod code size : 467032 bytes > nmethod code size : 908936 bytes > nmethod code size : 1267816 bytes > > # After > nmethod code size : 429616 bytes (-0.1%) > nmethod code size : 466344 bytes (-0.1%) > nmethod code size : 897144 bytes (-1.3%) > nmethod code size : 1256216 bytes (-0.9%) > > > There are two wrinkles: > 1. Current `movslq(Register, int32_t)` is broken and protected by `ShouldNotReachHere()`. I would have used it in this patch, but x86_64 does not actually define `mo... This pull request has now been integrated. Changeset: b120a05b Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/b120a05b22426567b33bbe491f791179e377bd78 Stats: 26 lines in 3 files changed: 13 ins; 12 del; 1 mod 8319406: x86: Shorter movptr(reg, imm) for 32-bit immediates Reviewed-by: qamai, kvn ------------- PR: https://git.openjdk.org/jdk/pull/16497 From stefank at openjdk.org Tue Nov 14 09:24:35 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 14 Nov 2023 09:24:35 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:00:31 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test This also passes the full tier1-3 testing. 
Some reviewers looked at the earlier version of this patch, would you mind reviewing the latest changes as well?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1809831792

From aph at openjdk.org Tue Nov 14 09:53:40 2023
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 14 Nov 2023 09:53:40 GMT
Subject: RFR: 8319973: AArch64: Save and restore FPCR in the call stub
Message-ID:

On AArch64 we don't save and restore the default floating-point control state when we enter and leave Java code. We really should, because if we're called via the JNI invocation interface with a weird FP control state we'll not be Java compatible.

-------------

Commit messages:
 - Correct mistaken change
 - 8319973: AArch64: Save and restore FPCR in the call stub

Changes: https://git.openjdk.org/jdk/pull/16637/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16637&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319973
  Stats: 34 lines in 4 files changed: 28 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/16637.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16637/head:pull/16637

PR: https://git.openjdk.org/jdk/pull/16637

From mli at openjdk.org Tue Nov 14 10:01:42 2023
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 Nov 2023 10:01:42 GMT
Subject: RFR: 8318218: RISC-V: C2 CompressBits [v9]
In-Reply-To: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com>
References: <7nzXUcZTMe6tmV7VPZmIYTWA7z7aakB_oL-jQuYLI-8=.5ae719d6-6d6d-41fb-8128-09f634c45538@github.com>
Message-ID:

On Mon, 13 Nov 2023 15:27:28 GMT, Hamlin Li wrote:

>> Hi,
>> Can you review the change to add intrinsic for CompressBits for Long & Integer?
>> Thanks!
>>
>> ## Test
>> pass jtreg test:
>> test/jdk/java/lang/CompressExpand*.java
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix typos

Thanks everyone for your review and discussion!
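For readers unfamiliar with the operation being intrinsified, the following is a portable reference sketch of "compress bits", following the semantics documented for `java.lang.Integer.compress` and `Long.compress`. It is an assumption-labeled illustration only: `compress_bits` is a hypothetical helper, and the RISC-V code generated by the PR is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Reference (non-intrinsic) compress: gather the bits of `value` selected by
// `mask` and pack them contiguously into the low-order bits of the result.
// Hypothetical helper for illustration; not the code added by the PR.
uint64_t compress_bits(uint64_t value, uint64_t mask) {
  uint64_t result = 0;
  unsigned out = 0;  // position of the next result bit to fill
  while (mask != 0) {
    uint64_t lowest = mask & (~mask + 1);  // isolate the lowest set mask bit
    if (value & lowest) {
      result |= (UINT64_C(1) << out);
    }
    ++out;
    mask &= mask - 1;  // clear the mask bit just processed
  }
  return result;
}
```

With this definition, `compress_bits(0xCAFEBABE, 0xFF00FFF0)` yields `0xCABAB`: the top byte `0xCA` and bits 4 to 15 (`0xBAB`) are packed next to each other.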

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16481#issuecomment-1809888169

From mli at openjdk.org Tue Nov 14 10:01:45 2023
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 Nov 2023 10:01:45 GMT
Subject: Integrated: 8318218: RISC-V: C2 CompressBits
In-Reply-To:
References:
Message-ID:

On Thu, 2 Nov 2023 16:50:50 GMT, Hamlin Li wrote:

> Hi,
> Can you review the change to add intrinsic for CompressBits for Long & Integer?
> Thanks!
>
> ## Test
> pass jtreg test:
> test/jdk/java/lang/CompressExpand*.java

This pull request has now been integrated.

Changeset: cb7875d5
Author: Hamlin Li
URL: https://git.openjdk.org/jdk/commit/cb7875d57db652cd49cdc09a92d2c1be2b5ec66a
Stats: 105 lines in 4 files changed: 105 ins; 0 del; 0 mod

8318218: RISC-V: C2 CompressBits

Reviewed-by: fyang, fjiang

-------------

PR: https://git.openjdk.org/jdk/pull/16481

From rehn at openjdk.org Tue Nov 14 10:44:30 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 14 Nov 2023 10:44:30 GMT
Subject: RFR: 8319716: RISC-V: Add SHA-2
In-Reply-To:
References:
Message-ID:

On Tue, 14 Nov 2023 08:12:57 GMT, Fei Yang wrote:

>> Hi, please consider.
>>
>> Main author is @luhenry, I only fixed some minor things and tested it.
>>
>> Such as:
>> test/hotspot/jtreg/compiler/intrinsics/sha/
>> test/jdk/java/security/MessageDigest/
>> test/jdk/jdk/security/
>> tier1
>>
>> And still running some test.
>
> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 160:
>
>> 158: }
>> 159:
>> 160: if (UseZvknha && UseZvkb) {
>
> A simple question here: Does the existence of `Zvknhb` also mean availability of `Zvknha`? Or should this be something like `if ((UseZvknha || UseZvknhb) && UseZvkb)`?

It seems like the correct answer is:
`Zvknhb supports SHA-256 and SHA-512.`

I suggest we start with supporting Zvkn, which is:
Zvkned, Zvknhb, Zvkb, Zvkt
They require: Zve64x

If someone has a CPU lacking something we can revisit it.

(I *think* Zvknha is mainly for 32-bits, as it only require sew 32) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1392369153 From rmarchenko at openjdk.org Tue Nov 14 11:04:31 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 14 Nov 2023 11:04:31 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2] In-Reply-To: References: Message-ID: <_6upH3UTrhu5j3PkSfF3na6dRbYKMCf-_8YQAWbctG8=.527eceac-8009-4b8a-92df-773d2c5717bd@github.com> On Tue, 14 Nov 2023 08:18:41 GMT, Roman Marchenko wrote: >> Zero'ing memory of extension event callbacks > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Fixing review comments The previous test run was OK https://github.com/wkia/jdk/actions/runs/6852317395 Now it fails on MacOS: "hotspot/jtreg/gc/TestAllocHumongousFragment.java: java.lang.OutOfMemoryError: Java heap space", so I guess it is caused by infrastructure issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16647#issuecomment-1809990842 From redestad at openjdk.org Tue Nov 14 11:06:37 2023 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 14 Nov 2023 11:06:37 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: Message-ID: <4cwWaIVcV-PwqeqHwi6lL6r-X5XIxMqXim5TbKbdVuo=.7220cb77-2508-4d9d-84b2-e4e261dee676@github.com> On Mon, 13 Nov 2023 21:52:09 GMT, Claes Redestad wrote: > Most quick benchmarks are done, but the bulk will take the night to complete. All done, nothing else that stands out across a medium-size set (400+) of benchmark runs. One caveat is that we don't configure all that many server-style and intentionally large GC-heavy benchmarks on Windows and MacOS hosts, so we might have a bit of a blind spot here that I'll see if we can improve. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1809994382 From jvernee at openjdk.org Tue Nov 14 11:22:45 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 14 Nov 2023 11:22:45 GMT Subject: Integrated: 8254693: Add Panama feature to pass heap segments to native code In-Reply-To: References: Message-ID: On Mon, 16 Oct 2023 10:19:17 GMT, Jorn Vernee wrote: > Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. > > The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. > > Components of this patch: > > - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. > - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. > - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. > - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. > - The object/oop + offset is exposed as temporary address to native code. > - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). > - Only x64 and AArch64 for now. 
> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 > - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. > - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` > > Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. > > Numbers for the included benchmark on my machine are: > > > Benchmark (size) Mode Cnt ... This pull request has now been integrated. Changeset: 9c982707 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/9c98270737cd2019f230e9359bb9298f8df2ca35 Stats: 2711 lines in 74 files changed: 1722 ins; 692 del; 297 mod 8254693: Add Panama feature to pass heap segments to native code Reviewed-by: mcimadamore, lucy, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/16201 From mdoerr at openjdk.org Tue Nov 14 11:41:46 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Nov 2023 11:41:46 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:51:36 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. 
It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. >> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. 
See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits: > > - Merge branch 'master' into AllowHeapNoLock > - fix type and reformat doc in Linker > - Merge branch 'master' into AllowHeapNoLock > - tweak whitespace > - a -> an > - add note to downcallHandle about passing heap segments by-reference > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - ... and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 One additional comment on the pinning topic: We may even want to pin objects across several downcalls. One downcall could be used to initiate async I/O and other downcalls check the result. The buffer must be stable in the time between them: "The buffer area being written out must not be accessed during the operation or undefined results may occur. The memory areas involved must remain valid." https://man7.org/linux/man-pages/man3/aio_write.3.html (Not sure if we would use on heap memory for that.) 

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1810042909

From mli at openjdk.org Tue Nov 14 12:15:43 2023
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 Nov 2023 12:15:43 GMT
Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks [v4]
In-Reply-To:
References:
Message-ID: <3M4mto15zgn33EPWfssIXr8EV2psJ0tetSMH_fxpuco=.c30d847b-adc5-4469-a49d-900ed6edb045@github.com>

> Hi,
> Can you review the patch to refactor the code related to UseRVV checks?
> Thanks!
>
> Some code (flag setting/checking) depends on UseRVV's value. This code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408:
> 1. some code needs to get the final UseRVV value rather than the initial value, e.g. the ChaCha20 intrinsic.
> 2. refactored to be more readable.
> 3. also add a note to make sure that future code gets the final UseRVV value instead of the initial value.

Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits:

 - merge master
 - remove code setting SpecialEncodeISOArray
 - refine comments
 - Initial commit

-------------

Changes: https://git.openjdk.org/jdk/pull/16580/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16580&range=03
  Stats: 53 lines in 2 files changed: 21 ins; 29 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/16580.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16580/head:pull/16580

PR: https://git.openjdk.org/jdk/pull/16580

From shade at openjdk.org Tue Nov 14 13:28:09 2023
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 14 Nov 2023 13:28:09 GMT
Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg [v2]
In-Reply-To:
References:
Message-ID: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com>

> See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally.
> > I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. > > I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. > > Additional testing: > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Only do _supports_cx8 = true - Merge branch 'master' into JDK-8319777-zero-64cas - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16614/files - new: https://git.openjdk.org/jdk/pull/16614/files/83c93a3c..783330fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16614&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16614&range=00-01 Stats: 602737 lines in 742 files changed: 78192 ins; 465259 del; 59286 mod Patch: https://git.openjdk.org/jdk/pull/16614.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16614/head:pull/16614 PR: https://git.openjdk.org/jdk/pull/16614 From shade at openjdk.org Tue Nov 14 13:28:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 13:28:09 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg [v2] In-Reply-To: <04inV6abqW4etamQXMc3QSFAH77roMCCYQYP7dZ4b2s=.70053ee1-491e-44b1-af1c-e80cf0d2f3d3@github.com> References: <04inV6abqW4etamQXMc3QSFAH77roMCCYQYP7dZ4b2s=.70053ee1-491e-44b1-af1c-e80cf0d2f3d3@github.com> Message-ID: On Mon, 13 Nov 2023 21:32:33 GMT, David Holmes wrote: >> Well, yes, we can just do `_supports_cx8 = true`. 
>>
>> But I am confused by the meaning of `SUPPORTS_NATIVE_CX8`. What is it? I read it as "we know statically, at compile time, that the target platform supports CX8". Otherwise, we poll it at runtime and let the runtime code decide by checking `VMVersion::supports_cx8()`. Defining `SUPPORTS_NATIVE_CX8` compiles out access backend locking paths completely, for example, without resorting to runtime checks.
>>
>> What am I missing? Is the wording for the comment misleading?
>
> Yeah it is something that can be read two ways and the current code is confused about it. I take it to mean there is actual native ISA support, versus there is some way of achieving the same effect. That is the way the ARM code uses it: if you build for ARMv7 then `SUPPORTS_NATIVE_CX8` is defined, otherwise runtime checks exist for `ldrex` or `kuser_helper` support.
>
> Other platforms confuse things somewhat. Here's the definition of `supports_cx8()`:
>
>     static bool supports_cx8() {
>     #ifdef SUPPORTS_NATIVE_CX8
>       return true;
>     #else
>       return _supports_cx8;
>     #endif
>     }
>
> So if you define `SUPPORTS_NATIVE_CX8` then `supports_cx8()` is always true - no runtime checks involved, no read of `_supports_cx8`. You've indicated that you have built a binary to only run where there is native CX8 support. Otherwise you should use runtime checks to set `_supports_cx8` as appropriate to control what `supports_cx8()` returns. Setting both is redundant/pointless and existing code is very confused about this. Take a look at x86 for example, it both defines `SUPPORTS_NATIVE_CX8` and has `_supports_cx8 = supports_cmpxchg8();` - but the latter is dead code as nothing will ever read it.
>
> So for me Zero was correct to only set `SUPPORTS_NATIVE_CX8` for 64-bit, but what it failed to do was set `_supports_cx8` on 32-bit.

All right, I guess that would explain what `NATIVE` means.
Although for Zero case, the "native ISA" is basically compiler built-ins, which are arguably "natively" supported :) Anyway, if that makes your follow-ups easier, there is no need to spend more time dwelling on this. I pushed the update now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16614#discussion_r1392576963 From aboldtch at openjdk.org Tue Nov 14 13:34:49 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 Nov 2023 13:34:49 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v3] In-Reply-To: References: Message-ID: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Simplify test. 
 - 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16603/files
  - new: https://git.openjdk.org/jdk/pull/16603/files/f10571e8..e6055689

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=01-02

  Stats: 599626 lines in 656 files changed: 76230 ins; 464489 del; 58907 mod
  Patch: https://git.openjdk.org/jdk/pull/16603.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603

PR: https://git.openjdk.org/jdk/pull/16603

From aboldtch at openjdk.org Tue Nov 14 13:42:13 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Tue, 14 Nov 2023 13:42:13 GMT
Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v4]
In-Reply-To:
References:
Message-ID:

> Implements the runtime part of JDK-8319796.
> The different CPU implementations are/will be created as dependent pull requests.
>
> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
>
> A high level overview:
> * Locking is still performed on the mark word
>   * Unlocked (0b01) <=> Locked (0b00)
> * Monitor enter on Obj with mark word Unlocked (0b01) is the same
>   * Transition Obj's mark word Unlocked (0b01) => Locked (0b00)
>   * Push Obj onto the lock stack
>   * Success
> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack
>   * If top entry is Obj
>     * Push Obj on the lock stack
>     * Success
>   * If top entry is not Obj
>     * Inflate and call ObjectMonitor::enter
> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack
>   * If just the top entry is Obj
>     * Transition Obj's mark word Locked (0b00) => Unlocked (0b01)
>     * Pop the entry
>     * Success
>   * If both entries are Obj
>     * Pop the top entry
>     * Success
>   * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit
> * If the monitor has been inflated for object Obj which is owned by the current thread
>   * All corresponding entries for Obj are removed from the lock stack
>   * The monitor recursions is set to the number of removed entries - 1
>   * The owner is changed from anonymous to the thread
>   * The regular ObjectMonitor::action is called.

Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
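
The high level overview quoted above can be modelled in a few lines. The following is a single-threaded toy sketch, not HotSpot code: `Mark`, `Obj` and `LockStack` are hypothetical types, the real implementation changes the mark word with CAS, and "inflate" is reduced here to bookkeeping on the model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model types for illustration; not HotSpot's data structures.
enum class Mark { Unlocked, Locked, Monitor };  // 0b01, 0b00, 0b10

struct Obj {
  Mark mark = Mark::Unlocked;
  int recursions = 0;  // only meaningful once mark == Mark::Monitor
};

struct LockStack {
  std::vector<Obj*> entries;

  // Returns true if the fast path handled the enter,
  // false if the caller has to inflate and use the monitor.
  bool enter(Obj* o) {
    if (o->mark == Mark::Unlocked) {
      o->mark = Mark::Locked;  // Unlocked => Locked (a CAS in real code)
      entries.push_back(o);
      return true;
    }
    if (o->mark == Mark::Locked && !entries.empty() && entries.back() == o) {
      entries.push_back(o);    // consecutive recursive enter
      return true;
    }
    return false;
  }

  // Returns true if the fast path handled the exit, false if the caller must inflate.
  bool exit(Obj* o) {
    std::size_t n = entries.size();
    if (o->mark != Mark::Locked || n == 0 || entries[n - 1] != o) {
      return false;            // unstructured locking or inflated monitor
    }
    bool recursive = n >= 2 && entries[n - 2] == o;
    entries.pop_back();
    if (!recursive) {
      o->mark = Mark::Unlocked;  // Locked => Unlocked
    }
    return true;
  }

  // Inflation by the owning thread: remove every entry for o from the lock
  // stack and set the recursion count to (removed entries - 1).
  void inflate(Obj* o) {
    int removed = 0;
    std::vector<Obj*> kept;
    for (Obj* e : entries) {
      if (e == o) {
        ++removed;
      } else {
        kept.push_back(e);
      }
    }
    entries.swap(kept);
    o->mark = Mark::Monitor;
    o->recursions = removed > 0 ? removed - 1 : 0;
  }
};
```

In this model, entering the same object twice in a row and exiting twice never leaves the fast path, while an enter of `a`, then `b`, then `a` again falls back to inflation because the top entry is not `a`.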
The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 - Fix nit - Fix comment typos - 8319797: Recursive lightweight locking: Runtime implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16606/files - new: https://git.openjdk.org/jdk/pull/16606/files/52b38136..95f5e5aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=02-03 Stats: 599637 lines in 657 files changed: 76230 ins; 464498 del; 58909 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From fyang at openjdk.org Tue Nov 14 13:51:32 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Nov 2023 13:51:32 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 10:37:31 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/vm_version_riscv.cpp line 160: >> >>> 158: } >>> 159: >>> 160: if (UseZvknha && UseZvkb) { >> >> A simple question here: Does the existence of `Zvknhb` also means availability of `Zvknha`? Or should this be something like `if ((UseZvknha || UseZvknhb) && UseZvkb)`? > > It seems like the correct answer is: > `? Zvknhb supports SHA-256 and SHA-512.` > > I suggest we start with supporting Zvkn, which is: > Zvkned, Zvknhb, Zvkb, Zvkt > They require: Zve64x > > If someone have a CPU lacking something we can revisit it. > > (I *think* Zvknha is mainly for 32-bits, as it only require sew 32) Yeah, I agree it's more reasonable to check for `Zvkn` here which stands for NIST Algorithm Suite. I see the vector cryptography spec says: The Zvknhb and Zvbc Vector Crypto Extensions --and accordingly the composite extensions Zvkn and Zvks-- require a Zve64x base, or application ("V") base Vector Extension. 
My understanding is that either `Zve64x` or RVV for our case will do. So we might want to do this check: `if (UseRVV && UseZvkn)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1392608753 From shade at openjdk.org Tue Nov 14 14:37:58 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 14:37:58 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: Message-ID: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
> > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Drop the Linux check in preparation for integration - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Rework paddings - Encode barrier tag into state, resolving another race condition - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Touchups - More comments work - Tight up the comments - ... 
and 3 more: https://git.openjdk.org/jdk/compare/258078cf...191c0dbb ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/bca446d9..191c0dbb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=04-05 Stats: 614970 lines in 877 files changed: 81324 ins; 473303 del; 60343 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From shade at openjdk.org Tue Nov 14 14:37:59 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 14:37:59 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: <4cwWaIVcV-PwqeqHwi6lL6r-X5XIxMqXim5TbKbdVuo=.7220cb77-2508-4d9d-84b2-e4e261dee676@github.com> References: <4cwWaIVcV-PwqeqHwi6lL6r-X5XIxMqXim5TbKbdVuo=.7220cb77-2508-4d9d-84b2-e4e261dee676@github.com> Message-ID: On Tue, 14 Nov 2023 11:03:43 GMT, Claes Redestad wrote: > > Most quick benchmarks are done, but the bulk will take the night to complete. > > All done, nothing else that stands out across a medium-size set (400+) of benchmark runs. One caveat is that we don't configure all that many server-style and intentionally large GC-heavy benchmarks on Windows and MacOS hosts, so we might have a bit of a blind spot here that I'll see if we can improve. Thank you! I tend to suggest that we do not treat current performance data as the blocker for integration. The realistic cases -- when systems have more runnable threads than CPUs -- do benefit from this optimization substantially. I wouldn't expect well-tuned benchmarking environments to suffer from this problem. FWIW, I ran `RSABench` on Mac and I cannot see the regression, so at least on Mac we look safe. I have no quiet Windows machine readily available to verify it there.
I merged from master and removed the Linux switch, in case you want to run this again.

Benchmark          Mode  Cnt      Score    Error  Units

# Baseline
RSABench.decrypt  thrpt   40   1043,037 ±  1,137  ops/s
RSABench.encrypt  thrpt   40  35054,674 ± 19,084  ops/s

# Patched
RSABench.decrypt  thrpt   40   1042,407 ±  1,838  ops/s
RSABench.encrypt  thrpt   40  35056,418 ± 46,026  ops/s

------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1810337115 From rmarchenko at openjdk.org Tue Nov 14 14:39:43 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Tue, 14 Nov 2023 14:39:43 GMT Subject: Integrated: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 06:50:58 GMT, Roman Marchenko wrote: > Zero'ing memory of extension event callbacks This pull request has now been integrated. Changeset: 97ea5bf0 Author: Roman Marchenko Committer: Yuri Nesterenko URL: https://git.openjdk.org/jdk/commit/97ea5bf0ffafaf8009c19483b9a9b1c30401cf9a Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/16647 From pchilanomate at openjdk.org Tue Nov 14 14:48:41 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 14 Nov 2023 14:48:41 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:00:31 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test Still good.
test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 79:

> 77: monitorCount++;
> 78: }
> 79:

Nit: extra line

------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1729910800 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1392704732 From duke at openjdk.org Tue Nov 14 15:09:42 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 14 Nov 2023 15:09:42 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub Message-ID: Please review the change for RISC-V similar to #13792(AARCH64) and #13460(X86).

From #13792: The change replaces two separate iterations over the itable with new algorithm consisting of two loops. First, we look for a match with resolved_klass, checking for a match with holder_klass along the way. Then we continue iterating (not starting over) the itable using the second loop, checking only for a match with holder_klass.

### Correctness checks

Testing: tier1 tests successfully passed on HiFive Unmatched board.

#### Performance results on RISC-V StarFive JH7110 board:

InterfaceCalls:                      before fix          after fix
-------------------------------------------------------------------
Benchmark         Mode  Cnt    Score   Error      Score   Error  Units
-------------------------------------------------------------------
test1stInt2Types  avgt  100   14.380 ± 0.017  |  14.370 ± 0.014  ns/op
test1stInt3Types  avgt  100   72.724 ± 0.552  |  66.290 ± 0.080  ns/op
test1stInt5Types  avgt  100   73.948 ± 0.524  |  68.781 ± 0.377  ns/op
test2ndInt2Types  avgt  100   15.705 ± 0.016  |  15.707 ± 0.018  ns/op
test2ndInt3Types  avgt  100   82.370 ± 0.453  |  75.363 ± 0.156  ns/op
test2ndInt5Types  avgt  100   85.266 ± 0.466  |  80.969 ± 0.752  ns/op
testIfaceCall     avgt  100   75.684 ± 0.648  |  72.603 ± 0.460  ns/op
testIfaceExtCall  avgt  100   86.293 ± 0.567  |  77.939 ± 0.340  ns/op
testMonomorphic   avgt  100   11.357 ± 0.007  |  11.359 ± 0.009  ns/op
-------------------------------------------------------------------

#### Performance results on RISC-V HiFive Unmatched board:

InterfaceCalls:                      before fix           after fix
---------------------------------------------------------------------
Benchmark         Mode  Cnt    Score   Error       Score   Error  Units
---------------------------------------------------------------------
test1stInt2Types  avgt  100   24.432 ± 1.811  |   23.205 ± 1.512  ns/op
test1stInt3Types  avgt  100  135.800 ± 3.991  |  127.112 ± 2.299  ns/op
test1stInt5Types  avgt  100  141.746 ± 4.272  |  136.069 ± 4.919  ns/op
test2ndInt2Types  avgt  100   31.474 ± 2.468  |   26.978 ± 1.951  ns/op
test2ndInt3Types  avgt  100  146.410 ± 3.575  |  139.443 ± 3.677  ns/op
test2ndInt5Types  avgt  100  156.083 ± 3.617  |  150.583 ± 2.909  ns/op
testIfaceCall     avgt  100  136.392 ± 2.546  |  129.632 ± 1.662  ns/op
testIfaceExtCall  avgt  100  155.602 ± 3.836  |  138.058 ± 2.147  ns/op
testMonomorphic   avgt  100   24.018 ± 1.888  |   21.522 ± 1.662  ns/op
---------------------------------------------------------------------

#### Performance results on RISC-V THead board:

InterfaceCalls:                      before fix           after fix
---------------------------------------------------------------------
Benchmark         Mode  Cnt    Score   Error       Score   Error  Units
---------------------------------------------------------------------
test1stInt2Types  avgt  100   33.326 ± 0.184  |   33.501 ± 0.237  ns/op
test1stInt3Types  avgt  100  111.881 ± 0.707  |  109.882 ± 0.664  ns/op
test1stInt5Types  avgt  100  115.059 ± 0.728  |  113.751 ± 0.532  ns/op
test2ndInt2Types  avgt  100   34.784 ± 0.437  |   34.725 ± 0.412  ns/op
test2ndInt3Types  avgt  100  115.042 ± 0.518  |  113.346 ± 0.543  ns/op
test2ndInt5Types  avgt  100  114.700 ± 1.214  |  112.871 ± 1.203  ns/op
testIfaceCall     avgt  100  114.211 ± 1.023  |  113.747 ± 1.323  ns/op
testIfaceExtCall  avgt  100  116.485 ± 0.864  |  113.872 ± 1.275  ns/op
testMonomorphic   avgt  100   31.404 ± 0.236  |   32.133 ± 0.802  ns/op
---------------------------------------------------------------------

------------- Commit messages: - 8318159: RISC-V: Improve itable_stub Changes: https://git.openjdk.org/jdk/pull/16657/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16657&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318159 Stats: 134 lines in 3 files changed: 115 ins; 15 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16657/head:pull/16657 PR: https://git.openjdk.org/jdk/pull/16657 From redestad at openjdk.org Tue Nov 14 15:10:34 2023 From: redestad at openjdk.org (Claes Redestad) Date: Tue, 14 Nov 2023 15:10:34 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: <4cwWaIVcV-PwqeqHwi6lL6r-X5XIxMqXim5TbKbdVuo=.7220cb77-2508-4d9d-84b2-e4e261dee676@github.com> Message-ID: On Tue, 14 Nov 2023 14:32:39 GMT, Aleksey Shipilev wrote: > FWIW, I ran `RSABench` on Mac and I cannot see the regression, so at least on Mac we look safe. No issue on Mac (neither aarch64 nor x64) in our testing either. The small set of minor regressions on Windows might be a red herring: definitely not a blocker given current, preliminary testing, but if it reproduces after integration and a few automated retriage runs it might warrant a low-priority bug to investigate.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1810415706 From duke at openjdk.org Tue Nov 14 15:26:46 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Tue, 14 Nov 2023 15:26:46 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v4] In-Reply-To: References: Message-ID: > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: Completely removed 'GenCollectedHeap' ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/883dd7b1..8f164430 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=02-03 Stats: 74 lines in 3 files changed: 4 ins; 69 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From aboldtch at openjdk.org Tue Nov 14 15:39:49 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 14 Nov 2023 15:39:49 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v3] In-Reply-To: References: Message-ID: > Implements the x86 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. 
This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The x86 C2 port also has some extra oddities. > > The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware. > > The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime. > > The contended unlock was also moved to the code stub. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319799 - Fix type - Move inflated check in fast_locked - Move top load - 8319799: Recursive lightweight locking: x86 implementation - Cleanup: C2 fast_lock/fast_unlock x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16607/files - new: https://git.openjdk.org/jdk/pull/16607/files/39b421c1..e37d1bd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=01-02 Stats: 599640 lines in 659 files changed: 76230 ins; 464498 del; 58912 mod Patch: https://git.openjdk.org/jdk/pull/16607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16607/head:pull/16607 PR: https://git.openjdk.org/jdk/pull/16607 From iklam at openjdk.org Tue Nov 14 15:43:46 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 14 Nov 2023 15:43:46 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v5] In-Reply-To: References: Message-ID: <0dtawSG7Oa7sgwNKX_NzpeasA_7fZThrP9oArHaIUDA=.0b826549-811d-4fcd-91e6-8b3bafc00bc0@github.com> > This is another step of moving CDS config management into cdsConfig.hpp: > > The function `MetaspaceShared::use_full_module_graph()` is split into two: > - `CDSConfig::is_dumping_full_module_graph()` > - `CDSConfig::is_loading_full_module_graph()` Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into 8319999-refactor-metaspaceshared-use-full-module-graph - fixed white spaces - Changed flag to CDSConfig::_dumping_full_module_graph_disabled - rename FileMapHeader::_use_full_module_graph -> _has_full_module_graph - 8319999: Refactor MetaspaceShared::use_full_module_graph() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16646/files - new: https://git.openjdk.org/jdk/pull/16646/files/da947425..33c9e39b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=03-04 Stats: 4209 lines in 116 files changed: 2715 ins; 881 del; 613 mod Patch: https://git.openjdk.org/jdk/pull/16646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16646/head:pull/16646 PR: https://git.openjdk.org/jdk/pull/16646 From shade at openjdk.org Tue Nov 14 15:44:33 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 14 Nov 2023 15:44:33 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: <4cwWaIVcV-PwqeqHwi6lL6r-X5XIxMqXim5TbKbdVuo=.7220cb77-2508-4d9d-84b2-e4e261dee676@github.com> Message-ID: On Tue, 14 Nov 2023 15:07:30 GMT, Claes Redestad wrote: > > FWIW, I ran `RSABench` on Mac and I cannot see the regression, so at least on Mac we look safe. > > No issue on Mac (neither aarch64 nor x64) in our testing either. The small set of minor regressions on Windows might be a red herring: definitely not a blocker given current, preliminary testing, but if it reproduces after integration and a few automated retriage runs it might warrant a low-priority bug to investigate. Agreed. Would you like to formally approve this PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1810483307 From duke at openjdk.org Tue Nov 14 15:53:11 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Tue, 14 Nov 2023 15:53:11 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v5] In-Reply-To: References: Message-ID: <_YhRVRgg-b-GnysoG7zHG1lVODxbofxAi4VipF-6jO0=.47e30bef-31e8-4bbe-9c4b-a3cf464d7a58@github.com> > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: add some headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/8f164430..bf022a87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=03-04 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From mli at openjdk.org Tue Nov 14 15:57:41 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 14 Nov 2023 15:57:41 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> Message-ID: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> On Mon, 13 Nov 2023 17:34:10 GMT, Yuri Gaevsky wrote: >> Hello All, >> >> Please review these changes to support _vectorizedHashCode intrinsic on >> RISC-V platform. The patch adds the "scalar" code for the intrinsic without >> usage of any RVV instruction but provides manual unrolling of the appropriate >> loop. 
The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>>
>> Thanks,
>> -Yuri Gaevsky
>>
>> P.S. My OCA has been accepted recently (ygaevsky).
>>
>> ### Correctness checks
>>
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>>
>> ### Performance results (the numbers for non-ints are similar)
>>
>> #### StarFive JH7110 board:
>>
>> ArraysHashCode:                       without intrinsic        with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error        Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001        2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011        4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041       10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068       11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078       13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136       14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166       15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126       18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116       18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303       20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215       20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588       69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525      137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363      296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713      752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424     1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789    14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003   152587.649 ± 381.964  ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor cosmetic fixes.
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1477: > 1475: case T_SHORT: BLOCK_COMMENT("arrays_hashcode(short) {"); break; > 1476: case T_INT: BLOCK_COMMENT("arrays_hashcode(int) {"); break; > 1477: default: BLOCK_COMMENT("arrays_hashcode {"); break; In `C2_MacroAssembler::arrays_hashcode_elsize`, default action is `ShouldNotReachHere();`, should it be consistent here? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1491: > 1489: beqz(cnt, DONE); > 1490: > 1491: ld(pow31_1_2, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31() Does `mv` of the power values to pow31_1_2 do the same effect as the `ld` here? If it does, mv might be better than ld. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1508: > 1506: } \ > 1507: > 1508: ld(pow31_3_4, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31() same comment as the above `ld` of pow31_1_2. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1512: > 1510: > 1511: bind(WIDE_LOOP); > 1512: DO_ELEMENT_LOAD(tmp1, 0) Can you add `;` at the end of the statement? similar comments for other DO_ELEMENT_LOAD's src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1513: > 1511: bind(WIDE_LOOP); > 1512: DO_ELEMENT_LOAD(tmp1, 0) > 1513: DO_ELEMENT_LOAD(tmp3, 1) Would it help to optimize the perf by moving `DO_ELEMENT_LOAD(tmp3, 1)` after `srli(tmp2, pow31_3_4, 32);`? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1523: > 1521: DO_ELEMENT_LOAD(tmp3, 3) > 1522: srli(tmp2, pow31_1_2, 32); > 1523: mulw(tmp1, tmp1, tmp2); // 31^^1 * ary[i+2] Could this line be optimized as `x<<5-x`? Just as TAIL_LOOP below. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1527: > 1525: addw(result, result, tmp3); // 31^^4 * h + 31^^3 * ary[i+0] + 31^^2 * ary[i+1] > 1526: // + 31^^1 * ary[i+2] + 31^^0 * ary[i+3] > 1527: subw(chunk, chunk, stride); Could chunk and ary be merged into one variable? so we don't need one sub and one add, but only one add here. 
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1540: > 1538: addw(result, result, tmp1); // result = result + ary[i] > 1539: subw(cnt, cnt, 1); > 1540: add(ary, ary, elsize); Similar comment for cnt and ary as chunk and ary above. src/hotspot/cpu/riscv/riscv.ad line 10306: > 10304: > 10305: > 10306: instruct arrays_hashcode(iRegP_R11 ary, iRegI_R12 cnt, iRegI_R10 result, immI basic_type, Is it necessary to specify the regs(r11/12/10) here? src/hotspot/cpu/riscv/riscv.ad line 10312: > 10310: match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type))); > 10311: effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, TEMP tmp5, TEMP tmp6, > 10312: USE_KILL ary, USE_KILL cnt, USE basic_type, KILL cr); should `TEMP_DEF result` be added here? src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 58: > 56: address StubRoutines::riscv::_method_entry_barrier = nullptr; > 57: > 58: ATTRIBUTE_ALIGNED(64) const jint StubRoutines::riscv::_arrays_hashcode_powers_of_31[] = If we use `mv` to replace the `ld` of power values above, these related code could be removed here. 
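For readers following the review comments on the wide loop and the tail loop, here is a minimal Java sketch of the arithmetic involved. It assumes int elements and a stride of 4, folds four elements per iteration as `31^4 * h + 31^3 * a[i] + 31^2 * a[i+1] + 31 * a[i+2] + a[i+3]`, and uses `(h << 5) - h` for `31 * h` in the tail. The class and method names are hypothetical; this illustrates the identity only and is not the stub code from the patch.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the HotSpot intrinsic.
final class UnrolledHashSketch {
    // Powers of 31 as 32-bit ints; int arithmetic wraps modulo 2^32.
    static final int POW31_2 = 31 * 31;
    static final int POW31_3 = POW31_2 * 31;
    static final int POW31_4 = POW31_3 * 31;

    static int hash(int[] ary) {
        int result = 1; // same initial value Arrays.hashCode uses for non-null arrays
        int i = 0;
        // Wide loop: fold four elements per iteration.
        for (; i + 4 <= ary.length; i += 4) {
            result = POW31_4 * result
                   + POW31_3 * ary[i]
                   + POW31_2 * ary[i + 1]
                   + 31 * ary[i + 2]
                   + ary[i + 3];
        }
        // Tail loop: one element at a time, with 31 * h computed as (h << 5) - h.
        for (; i < ary.length; i++) {
            result = (result << 5) - result + ary[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 42, 0, 11, 5, 9};
        // Compares the unrolled form against the library implementation.
        System.out.println(hash(data) == Arrays.hashCode(data));
    }
}
```

The unrolled form is equivalent to four applications of `h = 31 * h + a[i]` because the arithmetic is done modulo 2^32, so the comparison in `main` should hold for any input array.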
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392818497 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805158 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805457 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805546 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805782 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805847 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392805967 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392806033 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392806124 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392809666 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1392820954 From rriggs at openjdk.org Tue Nov 14 16:05:51 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 14 Nov 2023 16:05:51 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. 
In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
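The scan, copy, verify, and re-scan sequence described in the list above can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not the JDK code: `RobustCoderSketch`, `Encoded`, `compress`, and `encode` are hypothetical names, and the real constructors and intrinsics differ in detail. The point it illustrates is that the final choice of representation is made from a private copy of the input, so a concurrent writer cannot produce a non-latin1 string that contains only latin1 characters.

```java
import java.util.Arrays;

// Hypothetical model of the scan / copy / verify / re-scan sequence; not the JDK code.
final class RobustCoderSketch {
    /** Result holder: either latin1 bytes or a private UTF-16 char copy. */
    static final class Encoded {
        final byte[] latin1; // non-null when every char fits in one byte
        final char[] utf16;  // non-null otherwise
        Encoded(byte[] latin1, char[] utf16) { this.latin1 = latin1; this.utf16 = utf16; }
        boolean isLatin1() { return latin1 != null; }
    }

    /** Copies chars into dst while scanning; returns the index of the first char above 0xFF, or src.length. */
    static int compress(char[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            char c = src[i]; // each element is read exactly once
            if (c > 0xFF) return i;
            dst[i] = (byte) c;
        }
        return src.length;
    }

    static Encoded encode(char[] input) {
        int len = input.length;
        byte[] bytes = new byte[len];
        int idx = compress(input, bytes);
        if (idx == len) return new Encoded(bytes, null); // all latin1
        char[] copy = Arrays.copyOf(input, len);         // private snapshot
        if (copy[idx] > 0xFF) return new Encoded(null, copy);
        // The input changed between the scan and the copy: decide from the snapshot alone.
        idx = compress(copy, bytes);
        return idx == len ? new Encoded(bytes, null) : new Encoded(null, copy);
    }
}
```

For example, `RobustCoderSketch.encode("abc".toCharArray()).isLatin1()` is true, while an input containing `'\u0100'` yields the UTF-16 form. The re-scan branch is only reached when another thread mutates the array during the call, which this sketch does not attempt to demonstrate.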
Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Update PPC implementation of string_compress to return the index of the non-latin1 char Patch supplied by TheRealMDoerr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/f6080595..08f365f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=02-03 Stats: 10 lines in 1 file changed: 0 ins; 8 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From rriggs at openjdk.org Tue Nov 14 16:09:33 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 14 Nov 2023 16:09:33 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: <4Er4Zl1-Xp8krEPHJG4ds-b0xHplbrsHiBWK9sBmHpo=.c4e1cd30-aea2-4660-b632-0f7154aa1300@github.com> On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. 
>> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
> > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Update PPC implementation of string_compress to return the index of the non-latin1 char > Patch supplied by TheRealMDoerr Thanks for the PPC update to compress_strings ------------- PR Comment: https://git.openjdk.org/jdk/pull/16425#issuecomment-1810568010 From mli at openjdk.org Tue Nov 14 16:22:29 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 14 Nov 2023 16:22:29 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> Message-ID: On Mon, 13 Nov 2023 17:34:10 GMT, Yuri Gaevsky wrote:

>> Hello All,
>>
>> Please review these changes to support _vectorizedHashCode intrinsic on
>> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
>> usage of any RVV instruction but provides manual unrolling of the appropriate
>> loop. The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>>
>> Thanks,
>> -Yuri Gaevsky
>>
>> P.S. My OCA has been accepted recently (ygaevsky).
>>
>> ### Correctness checks
>>
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>>
>> ### Performance results (the numbers for non-ints are similar)
>>
>> #### StarFive JH7110 board:
>>
>> ArraysHashCode:                       without intrinsic        with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error        Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001        2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011        4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041       10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068       11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078       13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136       14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166       15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126       18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116       18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303       20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215       20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588       69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525      137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363      296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713      752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424     1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789    14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003   152587.649 ± 381.964  ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor cosmetic fixes.

> The code with usage of RVV instruction could be added as follow-up of the patch or independently.

Hey @ygaevsky, I can work on this real vectorized intrinsic implementation, please let me know what you think about it. Thanks.

------------- PR Comment: https://git.openjdk.org/jdk/pull/16629#issuecomment-1810594088 From jvernee at openjdk.org Tue Nov 14 17:03:32 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 14 Nov 2023 17:03:32 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 03:47:50 GMT, Vladimir Ivanov wrote:

> What kind of performance testing have you done?

I tested with the benchmarks attached to the PR. All of those improve.
I also did some general benchmark runs like DaCapo, specjbb2008, and other internal startup and compiler benchmarks we have (note that Renaissance is broken on the latest JDK). I didn't spot any big differences, but I'm currently doing more followup testing to eliminate false positives/negatives.

> The current bug summary is too vague. Please, reword it describing what the proposed enhancement does.

Done.

> I don't fully understand the issue with `has_monitor`. It does look like a pre-existing issue and it's better to handle it separately.

I don't mind moving it to a separate patch, but I don't think it's possible to trigger a failure of the current code without the changes in this patch. So, I don't think it would be possible to add a test in that case.

> It's interesting to note that the underlying issue for FFM is not that exception handlers aren't profiled, but that unreached calls are not pruned. It complicates the job for EA making arguments non-scalarizable. Pruning unreachable calls would fix the issue in a more disciplined manner, but it would also have more pervasive effects requiring deeper performance evaluation. Overall, it would be helpful to ensure there are no unreachable calls encountered during C2 compilation at all.

My sense when working on this was that C2 relies on dead branches being pruned to eliminate unreached calls (and other code) within those branches. That potentially allows multiple unreached calls to be eliminated using a single uncommon trap for the whole branch. It also allows eliminating other dead code for which we don't profile. So, in other words: pruning at the branch level seems more efficient. Exception dispatch/handlers are just one type of 'branch' that we don't handle at the moment. I agree it would be useful to do a followup search for other cases in which we encounter unreached calls, as an indicator that we failed to eliminate a dead branch.
One case I can think of where a block as a whole might appear 'alive', while a call somewhere in that block appears 'dead', is when an instruction before the call always throws an exception, meaning the branch is taken, but the call never executes. Are there other cases you had in mind? > `ciTypeFlow` may benefit from new profiling information as well. Do you mean that successors of an untaken exception handler do not need to care about the type updates in the exception handler block? I can file an RFE if you want. > * I don't see much value in 2 separate product flags to control profiling and optimization logic (`ProfileExceptionHandlers` and `PruneDeadExceptionHandlers`); having a single product flag should be enough; Okay, I can use `PruneDeadExceptionHandlers` everywhere instead. Or do you want to just always turn on the profiling? > * product flags should be marked diagnostic Ok. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1810698528 From jbachorik at openjdk.org Tue Nov 14 17:59:41 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Tue, 14 Nov 2023 17:59:41 GMT Subject: Withdrawn: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: <9OQfGXwDLioWWVoJpNbaJ_seGXH_XB-m8-zx462dIag=.a0641514-a759-4a38-bcd8-c808c7c1aa2e@github.com> References: <9OQfGXwDLioWWVoJpNbaJ_seGXH_XB-m8-zx462dIag=.a0641514-a759-4a38-bcd8-c808c7c1aa2e@github.com> Message-ID: On Mon, 7 Aug 2023 00:50:30 GMT, Jaroslav Bachorik wrote: > This is a best effort attempt to harmonize the handling of jmethodIDs when the instanceKlass internal structures are being deallocated (eg. due to `ClassLoaderData::free_deallocate_list`). > In all other instances, the jmethodID is NULLed (or set to `_free_method`) before the references Method structures are deallocated. This actually makes possible to query such jmethodIDs without crashing JVM even if the corresponding classes/methods were unloaded. 
>
> While it is understandable why JVM can not keep the method metadata around forever it really should be possible to at least assert the validity of a jmethodID without crashing JVM. If nothing else, this makes using the JVMTI functions `GetAllStackTraces` or `GetStackTrace` a risk to use in JVMTI agents - the jmethodids can become invalid the very next moment after the stacktrace is obtained.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/15171

From mli at openjdk.org  Tue Nov 14 19:37:45 2023
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 14 Nov 2023 19:37:45 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 14 Nov 2023 00:57:47 GMT, Olga Mikhaltsova wrote:

>> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
>>
>> In the table below it is shown that NaN argument should be processed as a special case.
>>
>>                                         RISC-V                  Java
>>                                         (FCVT.W.S)  (FCVT.L.D)  (long round(double a))  (int round(float a))
>> Minimum valid input (after rounding)    −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
>> Maximum valid input (after rounding)    2^31 − 1    2^63 − 1    Long.MAX_VALUE          Integer.MAX_VALUE
>> Output for out-of-range negative input  −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
>> Output for −∞                           −2^31       −2^63       Long.MIN_VALUE          Integer.MIN_VALUE
>> Output for out-of-range positive input  2^31 − 1    2^63 − 1    Long.MAX_VALUE          Integer.MAX_VALUE
>> Output for +∞                           2^31 − 1    2^63 − 1    Long.MAX_VALUE          Integer.MAX_VALUE
>> Output for NaN                          2^31 − 1    2^63 − 1    0                       0
>>
>> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:
>>
>> **Before**
>>
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score  Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555  0.179  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760  0.103  ops/ms
>>
>>
>> **After**
>>
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score  Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956  0.186  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947  0.122  ops/ms
>
> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:
>
>   Used fclass_mask

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4263:

> 4261:
> 4262: void MacroAssembler::java_round_float(Register dst, FloatRegister src,
> 4263:                                       FloatRegister ftmp, Register tmp) {

Can we remove the `tmp` parameter here, and use `t0` directly in java_round_float/double?
As it's more clear, and in fact in round_float/double_reg it does not allocate a register indeed, and `assert_different_registers` can be removed too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1393166705

From jiangli at openjdk.org  Tue Nov 14 20:24:27 2023
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Tue, 14 Nov 2023 20:24:27 GMT
Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread
In-Reply-To:
References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com>
Message-ID:

On Tue, 14 Nov 2023 03:10:20 GMT, David Holmes wrote:

> Is this a case where the code should be checking for `is_attaching_via_jni()`?

That's a good question.
I think maybe we should try to completely avoid the situation where a 'partial' `JvmtiThreadState` is created when a native thread is attaching and is in the middle of allocating the thread oop. Looking at `JvmtiSampledObjectAllocEventCollector::start()`, I think we can check if the current JavaThread `is_attaching_via_jni()` and the `threadObj()` is null. If that's the case, don't try `setup_jvmti_thread_state()` as things are not ready. In `JvmtiThreadState::state_for_while_locked` we probably want to assert that thread->threadObj() is not null if thread->jvmti_thread_state() is not null, to make sure that we don't see an incomplete `JvmtiThreadState`.

@caoman, I think this can also address your input on keeping `JvmtiThreadState::_thread_oop_h` always properly initialized for the attaching native thread.

I tested it and it seems to work well. I'll update this PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1811187153

From dcubed at openjdk.org  Tue Nov 14 20:54:40 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Tue, 14 Nov 2023 20:54:40 GMT
Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 14 Nov 2023 08:18:41 GMT, Roman Marchenko wrote:

>> Zero'ing memory of extension event callbacks
>
> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fixing review comments

Doing a post integration review. This is a trivial fix and does not need a second review nor wait 24 hours. Just a heads up that HotSpot code normally requires two reviews (1 from a (R)eviewer) and 24 hours unless it is called trivial AND agreed to be trivial by your reviewers.
------------- PR Review: https://git.openjdk.org/jdk/pull/16647#pullrequestreview-1730761590 PR Comment: https://git.openjdk.org/jdk/pull/16647#issuecomment-1811258657 From jiangli at openjdk.org Tue Nov 14 20:57:09 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 14 Nov 2023 20:57:09 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16642/files - new: https://git.openjdk.org/jdk/pull/16642/files/959305be..c2f83e8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=00-01 Stats: 24 lines in 2 files changed: 19 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16642/head:pull/16642 PR: https://git.openjdk.org/jdk/pull/16642 From matsaave at openjdk.org Tue Nov 14 21:10:32 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 14 Nov 2023 21:10:32 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v12] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. 
cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64, RISCV, PPC Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: - Merge branch 'master' into method_entry_8301997 - Merge branch 'master' into method_entry_8301997 - RISCV update 2 - PPC port - Improved load_resolved_method_entry_handle on x86 and aarch64 - RISCV port update - Prepare_invoke args and hard coded registers - RISCV Port - Merge branch 'master' into method_entry_8301997 - Merge branch 'master' into method_entry_8301997 - ... and 9 more: https://git.openjdk.org/jdk/compare/12fce4b7...b653baa6 ------------- Changes: https://git.openjdk.org/jdk/pull/15455/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=11 Stats: 3656 lines in 74 files changed: 1179 ins; 1906 del; 571 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From jjoo at openjdk.org Tue Nov 14 21:33:37 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 14 Nov 2023 21:33:37 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v42] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Update parallel workers time after Remark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/533af850..189d1852 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=40-41 Stats: 10 lines in 4 files changed: 5 ins; 2 del; 3 mod Patch: 
https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Tue Nov 14 21:33:38 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 14 Nov 2023 21:33:38 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v41] In-Reply-To: <3kqtWcZA2lY3fPjUgyo5aO_-4SicTOPzF6AnKGyRCBA=.53d5b175-7e0f-4f6b-91a7-e4cfea62cb7a@github.com> References: <3kqtWcZA2lY3fPjUgyo5aO_-4SicTOPzF6AnKGyRCBA=.53d5b175-7e0f-4f6b-91a7-e4cfea62cb7a@github.com> Message-ID: On Sat, 11 Nov 2023 00:23:28 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - Refactor ConcurrentRefine logic > - Make CPUTimeCounters a singleton class I believe all comments have been addressed and this PR is once again RFR! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1811353654 From jiangli at openjdk.org Tue Nov 14 21:45:33 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 14 Nov 2023 21:45:33 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 13 Nov 2023 23:04:19 GMT, Man Cao wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. 
> > src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 94: > >> 92: // The state->get_thread_oop() may be null if the state is created during >> 93: // the allocation of the thread oop when a native thread is attaching. Make >> 94: // sure we don't create a new state for the JavaThread. > > I think it is important to maintain `JvmtiThreadState::_thread_oop_h` correctly for the attached native thread. In the existing logic, with and without the proposed change, `JvmtiThreadState::_thread_oop_h` could stay null for an attached native thread, which seems wrong. > > It may be OK since `JvmtiThreadState::_thread_oop_h` is only used by support for virtual threads. It is unlikely that an attached native thread becomes a carrier for a virtual thread. However, it is probably still desirable to update `JvmtiThreadState::_thread_oop_h` to the correct java.lang.Thread oop. Thanks for the input @caoman. I updated the PR to avoid creating a JvmtiThreadState during attaching and allocating thread oop. I think that avoids a incomplete JvmtiThreadState being created, which is seems to be cleaner. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1393334558 From dholmes at openjdk.org Tue Nov 14 22:28:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 22:28:29 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v4] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 09:59:03 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 609: >> >>> 607: return; >>> 608: } else if (mark.is_fast_locked() && lock_stack.is_recursive(object)) { >>> 609: // This lock is recursive but unstructured exit. Just inflate the lock. >> >> Again this seems in the wrong place - this should be a very rare case so we should not be checking it explicitly before the expected cases! > > In exit we must always check for recursions first. Unsure what you are proposing here. 
>
> Maybe you want to call remove first, and have a branch on if the number removed is greater than 1. And in that case inflate and update the recursions field before falling through. Something like this:
> ```c++
>   // Fast-locking does not use the 'lock' argument.
>   LockStack& lock_stack = current->lock_stack();
>   if (mark.is_fast_locked()) {
>     if (lock_stack.try_recursive_exit(object)) {
>       // Recursively unlocked.
>       return;
>     }
>
>     size_t recursions = lock_stack.remove(object) - 1;
>
>     if (recursions == 0) {
>       while (mark.is_fast_locked()) {
>         const markWord new_mark = mark.set_unlocked();
>         const markWord old_mark = mark;
>         mark = object->cas_set_mark(new_mark, old_mark);
>         if (old_mark == mark) {
>           return;
>         }
>       }
>     }
>
>     lock_stack.push(object);
>     ObjectMonitor* mon = inflate(current, object, inflate_cause_vm_internal);
>     if (mon->is_owner_anonymous()) {
>       mon->set_owner_from_anonymous(current);
>     }
>     mon->set_recursions(recursions);
>   }
> ```
>
> This makes the code a little more like the emitted code. Except it is conditioned on the mark word lock bits.
> Hard to believe this will have a measurable difference. But at least to me it is more noisy.

It wasn't the recursions I was querying but the unstructured locking aspect.
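
To make the exit paths in the quoted proposal easier to compare, here is a toy C++ model of the lock-stack bookkeeping only. It is a sketch under stated assumptions, not HotSpot code: `ToyLockStack`, `ExitPath` and `classify_exit` are hypothetical names, objects are plain `int` ids, and the mark-word CAS and the monitor inflation are reduced to a returned label.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// Toy per-thread lock stack (NOT HotSpot's LockStack). A recursive
// lock shows up as adjacent duplicate entries.
class ToyLockStack {
  std::vector<int> _entries;  // bottom .. top
public:
  void push(int obj) { _entries.push_back(obj); }

  // Structured recursive exit: succeeds only if the top two entries
  // are both obj, and then pops one of them.
  bool try_recursive_exit(int obj) {
    std::size_t n = _entries.size();
    if (n >= 2 && _entries[n - 1] == obj && _entries[n - 2] == obj) {
      _entries.pop_back();
      return true;
    }
    return false;
  }

  // Removes every entry for obj, wherever it sits, and returns how
  // many entries were removed.
  std::size_t remove(int obj) {
    std::size_t removed = 0;
    std::vector<int> kept;
    for (int e : _entries) {
      if (e == obj) {
        ++removed;
      } else {
        kept.push_back(e);
      }
    }
    _entries.swap(kept);
    return removed;
  }

  std::size_t size() const { return _entries.size(); }
};

enum class ExitPath { RecursiveExit, DirectExit, InflateWithRecursions };

// Mirrors the branch order of the quoted snippet.
// Precondition (assumption): obj is on the stack at least once.
ExitPath classify_exit(ToyLockStack& ls, int obj) {
  if (ls.try_recursive_exit(obj)) {
    return ExitPath::RecursiveExit;
  }
  std::size_t recursions = ls.remove(obj) - 1;
  if (recursions == 0) {
    return ExitPath::DirectExit;  // the real code would CAS the mark word here
  }
  ls.push(obj);  // the real code would inflate and record the recursions
  return ExitPath::InflateWithRecursions;
}
```

With the stack `[1, 1, 2]` (object 2 locked on top of a recursively locked object 1), exiting object 1 takes the third path, which is the recursive unstructured case under discussion.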
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1393385025 From ccheung at openjdk.org Tue Nov 14 22:29:34 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 14 Nov 2023 22:29:34 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v5] In-Reply-To: <0dtawSG7Oa7sgwNKX_NzpeasA_7fZThrP9oArHaIUDA=.0b826549-811d-4fcd-91e6-8b3bafc00bc0@github.com> References: <0dtawSG7Oa7sgwNKX_NzpeasA_7fZThrP9oArHaIUDA=.0b826549-811d-4fcd-91e6-8b3bafc00bc0@github.com> Message-ID: On Tue, 14 Nov 2023 15:43:46 GMT, Ioi Lam wrote: >> This is another step of moving CDS config management into cdsConfig.hpp: >> >> The function `MetaspaceShared::use_full_module_graph()` is split into two: >> - `CDSConfig::is_dumping_full_module_graph()` >> - `CDSConfig::is_loading_full_module_graph()` > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into 8319999-refactor-metaspaceshared-use-full-module-graph > - fixed white spaces > - Changed flag to CDSConfig::_dumping_full_module_graph_disabled > - rename FileMapHeader::_use_full_module_graph -> _has_full_module_graph > - 8319999: Refactor MetaspaceShared::use_full_module_graph() Spotted a few minor items. src/hotspot/share/cds/filemap.hpp line 283: > 281: void set_heap_roots_offset(size_t n) { _heap_roots_offset = n; } > 282: void copy_base_archive_name(const char* name); > 283: Line removed by accident? src/hotspot/share/classfile/classLoaderDataShared.cpp line 185: > 183: > 184: void ClassLoaderDataShared::clear_archived_oops() { > 185: assert(UseSharedSpaces && !CDSConfig::is_loading_full_module_graph(), "must be"); Is the `UseSharedSpaces `needed? 
src/hotspot/share/classfile/modules.cpp line 608: > 606: > 607: void Modules::define_archived_modules(Handle h_platform_loader, Handle h_system_loader, TRAPS) { > 608: assert(UseSharedSpaces && CDSConfig::is_loading_full_module_graph(), "must be"); Is the `UseSharedSpaces` needed? ------------- PR Review: https://git.openjdk.org/jdk/pull/16646#pullrequestreview-1730911156 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1393372200 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1393379538 PR Review Comment: https://git.openjdk.org/jdk/pull/16646#discussion_r1393381654 From mdoerr at openjdk.org Tue Nov 14 22:39:48 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 14 Nov 2023 22:39:48 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v12] In-Reply-To: References: Message-ID: <9GX5TXSznmXLGz3me-ulLNsXXKXyuq5Ikep2EqB0MYg=.4401a373-e6d6-4fea-ad72-a2b993810303@github.com> On Tue, 14 Nov 2023 21:10:32 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. 
The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Merge branch 'master' into method_entry_8301997 > - Merge branch 'master' into method_entry_8301997 > - RISCV update 2 > - PPC port > - Improved load_resolved_method_entry_handle on x86 and aarch64 > - RISCV port update > - Prepare_invoke args and hard coded registers > - RISCV Port > - Merge branch 'master' into method_entry_8301997 > - Merge branch 'master' into method_entry_8301997 > - ... 
and 9 more: https://git.openjdk.org/jdk/compare/12fce4b7...b653baa6 src/hotspot/share/oops/resolvedMethodEntry.hpp line 79: > 77: u1 _tos_state; // TOS state > 78: u1 _flags; // Flags: [00|has_resolved_ref_index|has_local_signature|has_appendix|forced_virtual|final|virtual_final] > 79: u1 _bytecode1, _bytecode2; // Resovled invoke codes Typo: Resovled ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1393395024 From dholmes at openjdk.org Tue Nov 14 22:42:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Nov 2023 22:42:28 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v4] In-Reply-To: References: Message-ID: <2dVPtwS-M9xk4yHIZcFr3y_d1xSgGFqkfW3ABZvvb8M=.529435cb-d62d-4a5d-a545-5ee446457e5d@github.com> On Tue, 14 Nov 2023 22:25:34 GMT, David Holmes wrote: >> In exit we must always check for recursions first. Unsure what you are proposing here. >> >> Maybe you want to call remove first, and have a branch on if the number removed is greater than 1. And in that case inflate an update the recessions field before falling through. Something like this: >> ```c++ >> // Fast-locking does not use the 'lock' argument. >> LockStack& lock_stack = current->lock_stack(); >> if (mark.is_fast_locked()) { >> if (lock_stack.try_recursive_exit(object)) { >> // Recursively unlocked. >> return; >> } >> >> size_t recursions = lock_stack.remove(object) - 1; >> >> if (recursions == 0) { >> while (mark.is_fast_locked()) { >> const markWord new_mark = mark.set_unlocked(); >> const markWord old_mark = mark; >> mark = object->cas_set_mark(new_mark, old_mark); >> if (old_mark == mark) { >> return; >> } >> } >> } >> >> lock_stack.push(object); >> ObjectMonitor* mon = inflate(current, object, inflate_cause_vm_internal); >> if (mon->is_owner_anonymous()) { >> mon->set_owner_from_anonymous(current); >> } >> mon->set_recursions(recursions); >> } >> >> >> This make the code a little more like the emitted code. 
Except it is conditioned on the mark word lock bits. >> Hard to believe this will have a measurable difference. But at least to me it is more noisy. > > It wasn't the recursions I was querying but the unstructured locking aspect. To be clear, my concern is that for a simple exit we not only have to first check for a recursive exit (fine) but also this unexpected rare unstructured locking recursive case. Thinking it through part of the problem is that a simple-exit does itself allow for unstructured locking. Is it worth adding an additional case to peek at the top of the lock-stack and then do an exit with a pop for the most common non-recursive case? That way we in effect handle things as follows: - recursive exit - direct exit - recursive unstructured exit - direct unstructured exit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1393397999 From jvernee at openjdk.org Tue Nov 14 22:57:54 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 14 Nov 2023 22:57:54 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v6] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. 
But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: drop ProfileExceptionHandlers flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/007664ad..3586404f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=04-05 Stats: 12 lines in 8 files changed: 0 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From serb at openjdk.org Tue Nov 14 23:06:45 2023 From: serb at openjdk.org (Sergey Bylokhov) Date: Tue, 14 Nov 2023 23:06:45 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: References: Message-ID: <0jecv-L_TCdE6CeVsT1rbA-jnuGrDjAyGka1Zxq-87s=.5a708c2a-867f-48d5-91fa-34dd4df48691@github.com> On Mon, 13 Nov 2023 12:51:36 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 52 commits: > > - Merge branch 'master' into AllowHeapNoLock > - fix type and reformat doc in Linker > - Merge branch 'master' into AllowHeapNoLock > - tweak whitespace > - a -> an > - add note to downcallHandle about passing heap segments by-reference > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - ... and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 Can the usage of this new API be checked by something like -Xcheck:jni? Especially the case where the app is still doing upcalls to the JVM from native code while the pin is "active". ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1811531474 From jvernee at openjdk.org Tue Nov 14 23:43:47 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 14 Nov 2023 23:43:47 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: <0jecv-L_TCdE6CeVsT1rbA-jnuGrDjAyGka1Zxq-87s=.5a708c2a-867f-48d5-91fa-34dd4df48691@github.com> References: <0jecv-L_TCdE6CeVsT1rbA-jnuGrDjAyGka1Zxq-87s=.5a708c2a-867f-48d5-91fa-34dd4df48691@github.com> Message-ID: On Tue, 14 Nov 2023 23:04:16 GMT, Sergey Bylokhov wrote: >> Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 52 commits: >> >> - Merge branch 'master' into AllowHeapNoLock >> - fix type and reformat doc in Linker >> - Merge branch 'master' into AllowHeapNoLock >> - tweak whitespace >> - a -> an >> - add note to downcallHandle about passing heap segments by-reference >> - Merge branch 'master' into AllowHeapNoLock >> - bump up argument counts in TestLargeStub to their maximum >> - s390 updates >> - add stub size stress test for allowHeap >> - ...
and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 > > Can the usage of this new API be checked by something like -Xcheck:jni? Especially the case where the app is still doing upcalls to the JVM from native code while the pin is "active". @mrserb Upcalls are blocked unless a thread is in the `native` thread state [1]. So, if an upcall happens from a critical function (which doesn't transition to the `native` state), the VM will terminate with a fatal error. [1]: https://github.com/openjdk/jdk/blob/d5abe49670d81b9c4749ce777ed6bf2886228f2f/src/hotspot/share/prims/upcallLinker.cpp#L79 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1811566526 From serb at openjdk.org Wed Nov 15 00:06:48 2023 From: serb at openjdk.org (Sergey Bylokhov) Date: Wed, 15 Nov 2023 00:06:48 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:51:36 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed.
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 52 commits: > > - Merge branch 'master' into AllowHeapNoLock > - fix type and reformat doc in Linker > - Merge branch 'master' into AllowHeapNoLock > - tweak whitespace > - a -> an > - add note to downcallHandle about passing heap segments by-reference > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - ... and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 Can I assume it will always cause the fatal error, or it is not specified/UB? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1811585932 From jvernee at openjdk.org Wed Nov 15 00:10:50 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Nov 2023 00:10:50 GMT Subject: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:51:36 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using `Linker.Option.critical(true)` as a linker option. It has the same limitations as normal critical calls, namely: upcalls into Java are not allowed, and the native function should return relatively quickly. Heap segments are exposed to native code through temporary native addresses that are valid for the duration of the native call. >> >> The motivation for this is supporting existing Java array-based APIs that might have to pass multi-megabyte size arrays to native code, and are current relying on Get-/ReleasePrimitiveArrayCritical from JNI. Where making a copy of the array would be overly prohibitive. >> >> Components of this patch: >> >> - New binding operator `SegmentBase`, which gets the base object of a `MemorySegment`. >> - Rename `UnboxAddress` to `SegmentOffset`. Add flag to specify whether processing heap segments should be allowed. 
>> - `CallArranger` impls use new binding operators when `Linker.Option.critical(/* allowHeap= */ true)` is specified. >> - `NativeMethodHandle`/`NativeEntryPoint` allow `Object` in their signatures. >> - The object/oop + offset is exposed as temporary address to native code. >> - Since we stay in the `_thread_in_Java` state, we can safely expose the oops passed to the downcall stub to native code, without needing GCLocker. These oops are valid until we poll for safepoint, which we never do (invoking pure native code). >> - Only x64 and AArch64 for now. >> - I've refactored `ArgumentShuffle` in the C++ code to no longer rely on callbacks to get the set of source and destination registers (using `CallingConventionClosure`), but instead just rely on 2 equal size arrays with source and destination registers. This allows filtering the input java registers before passing them to `ArgumentShuffle`, which is required to filter out registers holding segment offsets. Replacing placeholder registers is also done as a separate pre-processing step now. See changes in: https://github.com/openjdk/jdk/pull/16201/commits/d2b40f1117d63cc6d74e377bf88cdcf6d15ff866 >> - I've factored out `DowncallStubGenerator` in the x64 and AArch64 code to use a common `DowncallLinker::StubGenerator`. >> - Fallback linker is also supported using JNI's `GetPrimitiveArrayCritical`/`ReleasePrimitiveArrayCritical` >> >> Aside: fixed existing issue with `DowncallLinker` not properly acquiring segments in interpreted mode. >> >> Numbers for the included benchmark on my machine are: >> >> >> Benchmar... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 52 commits: > > - Merge branch 'master' into AllowHeapNoLock > - fix type and reformat doc in Linker > - Merge branch 'master' into AllowHeapNoLock > - tweak whitespace > - a -> an > - add note to downcallHandle about passing heap segments by-reference > - Merge branch 'master' into AllowHeapNoLock > - bump up argument counts in TestLargeStub to their maximum > - s390 updates > - add stub size stress test for allowHeap > - ... and 42 more: https://git.openjdk.org/jdk/compare/03db8281...36da79d1 It is not specified, but the current implementation will always cause a fatal error. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16201#issuecomment-1811589537 From redestad at openjdk.org Wed Nov 15 00:33:32 2023 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 15 Nov 2023 00:33:32 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v5] In-Reply-To: References: <4cwWaIVcV-PwqeqHwi6lL6r-X5XIxMqXim5TbKbdVuo=.7220cb77-2508-4d9d-84b2-e4e261dee676@github.com> Message-ID: <8n2dP28lfeI0S7_FkEkh2zbrYduQ5HqRk8pPorY5JTo=.6be84e77-0e28-4679-b953-50253d528aff@github.com> On Tue, 14 Nov 2023 15:41:59 GMT, Aleksey Shipilev wrote: > Agreed. Would you like to formally approve this PR? Even with Robbin's prior approval I feel it would be presumptuous of me to approve a PR in a piece of code I'm completely unfamiliar with. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1811610089 From xgong at openjdk.org Wed Nov 15 01:32:00 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 15 Nov 2023 01:32:00 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. 
Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. > > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. > > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. > > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Add a bundled native lib in jdk as a bridge to libsleef - Merge 'jdk:master' into JDK-8312425 - Disable sleef by default - Merge 'jdk:master' into JDK-8312425 - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16234/files - new: https://git.openjdk.org/jdk/pull/16234/files/f2098d4e..b29df846 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=01-02 Stats: 61715 lines in 1820 files changed: 28750 ins; 17255 del; 15710 mod Patch: https://git.openjdk.org/jdk/pull/16234.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16234/head:pull/16234 PR: https://git.openjdk.org/jdk/pull/16234 From jvernee at openjdk.org Wed Nov 15 01:37:32 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Nov 2023 01:37:32 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:00:56 GMT, Jorn Vernee wrote: > > ``` > > I don't see much value in 2 separate product flags to control profiling and optimization logic (`ProfileExceptionHandlers` and `PruneDeadExceptionHandlers`); having a single product flag should be enough; > > ``` > > > Okay, I can use `PruneDeadExceptionHandlers` everywhere instead. I gave this a go. However, it looks like the build is failing on some platforms because the profiling code is also included in builds excluding C2, but `PruneDeadExceptionHandlers` is a C2 only flag. The two options I see to work around that are to: 1. make the `PruneDeadExceptionHandlers` flag a global flag, which is a bit strange since it is only used by C2. Or: 2. put all the profiling code in `#ifdef COMPILER2` blocks, which makes the code harder to read. 
But, if other parts of the VM want to use the profiling data as well (besides C2), we'd also need 2 flags again, since theoretically C2 might be excluded, but profiling would still be needed. I think after all, it's better to have 2 separate flags to avoid these issues (now and in the future). Unless we want to turn on profiling always (without a flag). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811657788 From xgong at openjdk.org Wed Nov 15 01:38:30 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 15 Nov 2023 01:38:30 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 09:45:01 GMT, Andrew Haley wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Add a bundled native lib in jdk as a bridge to libsleef >> - Merge 'jdk:master' into JDK-8312425 >> - Disable sleef by default >> - Merge 'jdk:master' into JDK-8312425 >> - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF > > This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec. > I'm very nervous about binding ourselves to a specific version of the SLEEF ABI, because Java releases are maintained for decades, and we don't want to be dependent on other projects. > > We'll have to make a plan for version evolution. Hi @theRealAph , The latest commit created a native library as a bridge to the third-party sleef library. Could you please help check whether it's a better solution to fix the hard-coding sleef ABI version issue and the further evolution? It re-defines all the vector math functions and implements them by calling the relative functions in libsleef. 
The library is bundled into the jdk image. This way, we don't need to hard-code the libsleef ABI version into the jdk, and potential issues caused by future ABI updates can be caught earlier. Meanwhile, the originally added VM option (i.e. `-XX:UseSleefLib`) is no longer needed, so we removed it. Thanks, Xiaohong Gong ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1811659809 From xgong at openjdk.org Wed Nov 15 01:38:33 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 15 Nov 2023 01:38:33 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: <94xEhtV9YjxUS5QN2oHOWCzwhFaKi05PO9o3Y5tieDI=.ecd425b8-a7c3-4c4a-9e7b-1ae099b92b52@github.com> Message-ID: On Wed, 25 Oct 2023 01:25:09 GMT, Xiaohong Gong wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 8496: >> >>> 8494: // Get sleef stub routine addresses >>> 8495: char ebuf[1024]; >>> 8496: void* libsleef = os::dll_load(UseSleefLib, ebuf, sizeof ebuf); >> >> Shouldn't this check that UseSleefLib has been set to something other than "" ? (To save the failing `dll_load` call.) > > Yeah, it's better to do that. Currently it returns "nullptr" without any errors. But I agree that having a pre-check is better. Thanks! `UseSleefLib` is removed in the latest commit, so this change is no longer needed. Thanks again for the suggestion!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1393523299 From iklam at openjdk.org Wed Nov 15 01:46:02 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 15 Nov 2023 01:46:02 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v6] In-Reply-To: References: Message-ID: > This is another step of moving CDS config management into cdsConfig.hpp: > > The function `MetaspaceShared::use_full_module_graph()` is split into two: > - `CDSConfig::is_dumping_full_module_graph()` > - `CDSConfig::is_loading_full_module_graph()` Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @calvinccheung review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16646/files - new: https://git.openjdk.org/jdk/pull/16646/files/33c9e39b..1c781799 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16646&range=04-05 Stats: 3 lines in 3 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16646/head:pull/16646 PR: https://git.openjdk.org/jdk/pull/16646 From jvernee at openjdk.org Wed Nov 15 02:43:12 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Nov 2023 02:43:12 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v7] In-Reply-To: References: Message-ID: <4viUZ8xgGyoVMs8nwClL38FZPfa1P1jw_eVwZfAXbfI=.4473faad-5a60-4cc6-857b-253ac8c70648@github.com> > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. 
C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. 
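The offset-based bookkeeping described in the paragraph above can be illustrated with a small sketch. This is a hypothetical Java model only: the real code is C++ in `MethodData` and `ciMethodData`, and the class name, field names, section order, and sizes below are all assumptions made for illustration. It shows why "total size minus parameter size" stops giving the parameter offset once another section follows the parameter data, and how deriving sizes from offsets avoids that.

```java
// Hypothetical model of a profile buffer with three back-to-back sections.
// Not HotSpot code; names and section order are assumptions for illustration.
public class ProfileLayoutSketch {
    public final int parametersOffset;        // where the parameter section starts
    public final int exceptionHandlersOffset; // where the handler section starts
    public final int totalSize;               // end of the last section

    public ProfileLayoutSketch(int regularSize, int parametersSize, int handlersSize) {
        this.parametersOffset = regularSize;
        this.exceptionHandlersOffset = regularSize + parametersSize;
        this.totalSize = regularSize + parametersSize + handlersSize;
    }

    // Section sizes are derived from the stored offsets, never stored separately.
    public int parametersSize() {
        return exceptionHandlersOffset - parametersOffset;
    }

    public int exceptionHandlersSize() {
        return totalSize - exceptionHandlersOffset;
    }

    // The older style of computation: only correct while the parameter
    // section is the last section in the buffer.
    public int legacyParametersOffset() {
        return totalSize - parametersSize();
    }

    public static void main(String[] args) {
        ProfileLayoutSketch layout = new ProfileLayoutSketch(64, 16, 8);
        System.out.println("tracked offset: " + layout.parametersOffset);
        System.out.println("legacy offset:  " + layout.legacyParametersOffset());
    }
}
```

With a non-empty handler section the two offsets disagree, which is the kind of mismatch the description attributes the `ciMethodData` crashes to.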
> > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Only use ProfileExceptionHandlers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/3586404f..86700da6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=05-06 Stats: 14 lines in 8 files changed: 3 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From vlivanov at openjdk.org Wed Nov 15 03:17:34 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 Nov 2023 03:17:34 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:31:29 GMT, Jorn Vernee wrote: > P.S. for now I've changed it to only have the ProfileExceptionHandlers flag, which then also implicitly turns on the optimization. The only purpose I see is to simplify diagnosis of possible issues in production. The logic (both profiling and the optimization) will be unconditionally enabled irrespective of configuration. Since profiling-related code is scattered across runtime and compiler areas, the corresponding flag belongs to `globals.hpp`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811742684 From vlivanov at openjdk.org Wed Nov 15 03:40:30 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 Nov 2023 03:40:30 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:00:56 GMT, Jorn Vernee wrote: > My sense when working on this was that C2 relies on dead branches being pruned to eliminate unreached calls (and other code) within those branches. That potentially allows multiple unreached calls to be eliminated using a single uncommon trap for the whole branch. Irrespective of where it is placed (beginning of a block, invoke bytecode, or any other bytecode instruction), an uncommon trap effectively prunes everything it dominates. So, the earlier it is placed, the better. In that respect, having an uncommon trap placed right at the entry of the block is preferred. But pruning unreached calls would achieve the same result (and without any additional profiling logic) when it comes to try-with-resource construct. I'd like to point out that proposed solution doesn't completely eliminate the problem reported in the bug. (And it also suffers from profile pollution - a single exception thrown defeats the whole optimization.) The original problem boils down to EA failure in presence of escape points in cold regions. A proper fix would be to teach EA to move such allocations closer to escape point, but it would require much more complex code changes. There's already work happening to improve C2 EA in that respect. Still, IMO enhancing profiling machinery to track unreached exception handlers is a good thing. 
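For readers following the try-with-resources discussion, here is a minimal sketch of the code shape involved. The `desugared` method is a hand-written approximation of what javac emits, not its exact output: the real translation differs between javac versions and includes a null check on the resource, and the `Resource` class and method names are hypothetical. The point is that `close()` has two call sites, and the one inside the hidden handler is normally never executed, so it looks cold to the inliner.

```java
// Hand-written approximation of try-with-resources desugaring.
// Resource and the method names are hypothetical, for illustration only.
public class TryWithResourcesSketch {
    public static final class Resource implements AutoCloseable {
        public int closed; // counts close() calls

        public int value() {
            return 42;
        }

        @Override
        public void close() {
            closed++;
        }
    }

    // What the programmer writes.
    public static int withSugar(Resource resource) {
        try (Resource r = resource) {
            return r.value();
        }
    }

    // Roughly what the compiler generates (null check omitted).
    public static int desugared(Resource r) {
        int result;
        try {
            result = r.value();
        } catch (Throwable t) {
            // Hidden exception handler: second close() call site, rarely reached.
            try {
                r.close();
            } catch (Throwable suppressed) {
                t.addSuppressed(suppressed);
            }
            throw t;
        }
        r.close(); // normal-path close() call site
        return result;
    }

    public static void main(String[] args) {
        Resource a = new Resource();
        Resource b = new Resource();
        System.out.println(withSugar(a) + " " + a.closed);
        System.out.println(desugared(b) + " " + b.closed);
    }
}
```

If the handler's `close()` call is not inlined, `r` is passed to an out-of-line call there, which is the escape point in a cold region mentioned in the message above.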
------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811758433 From ccheung at openjdk.org Wed Nov 15 03:46:34 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 15 Nov 2023 03:46:34 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v6] In-Reply-To: References: Message-ID: <3vg0RhBoEYkInRGaS-iS_Pthp04Tk2wl7Hy2yHTI-c4=.d0a96e8c-40ff-4df8-8bd3-54d41d73cc82@github.com> On Wed, 15 Nov 2023 01:46:02 GMT, Ioi Lam wrote: >> This is another step of moving CDS config management into cdsConfig.hpp: >> >> The function `MetaspaceShared::use_full_module_graph()` is split into two: >> - `CDSConfig::is_dumping_full_module_graph()` >> - `CDSConfig::is_loading_full_module_graph()` > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @calvinccheung review comments Updates look good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16646#pullrequestreview-1731188605 From vlivanov at openjdk.org Wed Nov 15 03:48:30 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 Nov 2023 03:48:30 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:00:56 GMT, Jorn Vernee wrote: > One case I can think of where a block as a whole might appear 'alive', while a call somewhere in that block appears 'dead', is when an instruction before the call always throws an exception, meaning the branch is taken, but the call never executes. Are there other cases you had in mind? Yes, that's a legitimate case when branch profiling is not enough to catch unreachable code. I'm more concerned about other places where we may omit profiling and hence miss optimization opportunities. (Also, keep in mind that both interpreter and C1 have their own profiling implementation which can easily diverge. 
Not sure how to automatically filter such cases from legitimate ones though.) In general, pruning effectively unreachable calls looks like the more in the right direction. But it looks riskier and, hence, requires more performance validation. >> I don't fully understand the issue with has_monitor. It does look like a pre-existing issue and it's better to handle it separately. > I don't mind moving it to a separate patch, but I don't think it's possible to trigger a failure of the current code without the changes in this patch. So, I don't think it would be possible to add a test in that case. As of now, C2 should prune exception handlers when the corresponding exception class is unloaded. How is it different from what you implemented? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811762391 PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811763540 From jvernee at openjdk.org Wed Nov 15 04:11:31 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Nov 2023 04:11:31 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 03:45:48 GMT, Vladimir Ivanov wrote: > > > I don't fully understand the issue with has_monitor. It does look like a pre-existing issue and it's better to handle it separately. > > > I don't mind moving it to a separate patch, but I don't think it's possible to trigger a failure of the current code without the changes in this patch. So, I don't think it would be possible to add a test in that case. > > As of now, C2 should prune exception handlers when the corresponding exception class is unloaded. How is it different from what you implemented? The issue occurs for OSR compilations where the monitorenter is before the loop (outside of the compiled code), and all the monitorexits are pruned. In that case the existing logic does not detect that the method has monitors during parsing. 
javac generates 2 monitorexits for synchronized blocks/methods: one for the normal exit, and one in a synthetic exception handler. For OSR compilations, the regular monitorexit gets pruned because it is in a dead branch (after the loop). Theoretically if the handler used an unloaded exception type, it would be pruned as well in the existing code. But, javac currently only generates an exception handler that catches `any` exception type, as far as I can tell. Though, thinking about it, I can try generating some bytecode that uses a different handler with a custom (unloaded) exception type instead, and see if I can get both `monitorexits` to be pruned. > Irrespective of where it is placed (beginning of a block, invoke bytecode, or any other bytecode instruction), an uncommon trap effectively prunes everything it dominates. Okay, I see what you mean. Then my argument is not valid. > Still, IMO enhancing profiling machinery to track unreached exception handlers is a good thing. > In general, pruning effectively unreachable calls looks like the more in the right direction. But it looks riskier and, hence, requires more performance validation. So, how do you want to move forward? Should I attempt to re-implement the current patch to prune infrequent calls instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811776416 PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811777862 From jvernee at openjdk.org Wed Nov 15 04:24:32 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 15 Nov 2023 04:24:32 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: <11GMv1Clo20OntJAX2WJlcEEkDfcTskQJ3Ik9Ml3V5I=.2328b75b-6ac2-47b4-a24b-1bb2841cd7d5@github.com> On Wed, 15 Nov 2023 03:43:50 GMT, Vladimir Ivanov wrote: > I'm more concerned about other places where we may omit profiling and hence miss optimization opportunities.
Missing profiling would be bad, as in that case we'd always try to prune the exception handler. i.e. it's not just a missed optimization. The code in the PR tries to avoid that by checking if the profile is mature before pruning, but if there are cases where the profile is mature, and we somehow did not detect the exception handler being entered, we might get into a situation where the compiled code is immediately thrown away, because we actually do regularly throw exceptions that then hit the pruned handler and cause a deoptimization. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1811786117 From amitkumar at openjdk.org Wed Nov 15 04:57:45 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 15 Nov 2023 04:57:45 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. >> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. 
I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. Hi @matias9927, Please add s390-Port from here: https://github.com/offamitkumar/jdk/commit/3f7017467b1a4ae8fe70530c7183c2667cf2c7f2 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1811807516 From iklam at openjdk.org Wed Nov 15 05:11:41 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 15 Nov 2023 05:11:41 GMT Subject: RFR: 8319999: Refactor MetaspaceShared::use_full_module_graph() [v3] In-Reply-To: <7wcZM0VIAvAPq2j5zg2sq6qHfRVdTxv9qyUcGm5TMWI=.07e2308a-f3ce-41de-a301-c81f94d57ade@github.com> References: <7wcZM0VIAvAPq2j5zg2sq6qHfRVdTxv9qyUcGm5TMWI=.07e2308a-f3ce-41de-a301-c81f94d57ade@github.com> Message-ID: On Tue, 14 Nov 2023 06:05:49 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Changed flag to CDSConfig::_dumping_full_module_graph_disabled > > Marked as reviewed by dholmes (Reviewer). Thanks @dholmes-ora and @calvinccheung for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16646#issuecomment-1811816286 From iklam at openjdk.org Wed Nov 15 05:11:43 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 15 Nov 2023 05:11:43 GMT Subject: Integrated: 8319999: Refactor MetaspaceShared::use_full_module_graph() In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 03:53:32 GMT, Ioi Lam wrote: > This is another step of moving CDS config management into cdsConfig.hpp: > > The function `MetaspaceShared::use_full_module_graph()` is split into two: > - `CDSConfig::is_dumping_full_module_graph()` > - `CDSConfig::is_loading_full_module_graph()` This pull request has now been integrated. Changeset: a6343c0b Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/a6343c0b7b14563f9d219506ba431f96befd5401 Stats: 151 lines in 16 files changed: 75 ins; 40 del; 36 mod 8319999: Refactor MetaspaceShared::use_full_module_graph() Reviewed-by: dholmes, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/16646 From aboldtch at openjdk.org Wed Nov 15 07:25:30 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 07:25:30 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v4] In-Reply-To: <2dVPtwS-M9xk4yHIZcFr3y_d1xSgGFqkfW3ABZvvb8M=.529435cb-d62d-4a5d-a545-5ee446457e5d@github.com> References: <2dVPtwS-M9xk4yHIZcFr3y_d1xSgGFqkfW3ABZvvb8M=.529435cb-d62d-4a5d-a545-5ee446457e5d@github.com> Message-ID: On Tue, 14 Nov 2023 22:39:56 GMT, David Holmes wrote: >> It wasn't the recursions I was querying but the unstructured locking aspect. > > To be clear, my concern is that for a simple exit we not only have to first check for a recursive exit (fine) but also this unexpected rare unstructured locking recursive case. Thinking it through part of the problem is that a simple-exit does itself allow for unstructured locking. 
Is it worth adding an additional case to peek at the top of the lock-stack and then do an exit with a pop for the most common non-recursive case? That way we in effect handle things as follows: > - recursive exit > - direct exit > - recursive unstructured exit > - direct unstructured exit First off, let us note that when reaching this code the unstructured exit is the common case. The normal exit and recursive exit are usually handled in the emitted code (this includes the interpreter). We reach this because either a CAS failed somewhere due to a concurrent hashCode instalment, or the exit was unstructured. An inflated monitor exit just jumps past this code (everything is conditioned on `mark.is_fast_locked()`). Is this motivated by the structure/layout of the C++ code, or is it an optimisation? If it is motivated by the structure/layout, then we can lay it out as you described. It would add some code duplication. If it is motivated as an optimisation, then after the recursive exit fails, we should just call remove and act based on the return value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1393755482 From aboldtch at openjdk.org Wed Nov 15 07:33:45 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 07:33:45 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v4] In-Reply-To: References: Message-ID: <9NzLbU-9jOXCjxrOH5oq21ZfsdqHECgUpSSxh5touYI=.fdbcfe2d-3591-4374-a7e9-c6c6d84d351b@github.com> > Implements the x86 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word.
> > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The x86 C2 port also has some extra oddities. > > The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware. > > The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime. > > The contended unlock was also moved to the code stub. 
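The "check the lock stack first" idea from the description above can be illustrated with a small toy model. This is a sketch in Java under stated assumptions, not the HotSpot code: the class `LockStackModel` and its methods are hypothetical names, the mark word transition is left out entirely, and the real implementation lives in C++ and generated assembly. It only shows why a recursive enter is a plain push when the object is already on top, why a recursive exit needs the two topmost entries to match, and how an unstructured exit has to search the whole stack.

```java
public class LockStackModel {
    private final Object[] stack;
    private int top; // number of entries currently in use

    public LockStackModel(int capacity) {
        stack = new Object[capacity];
    }

    /** First (non-recursive) lock of o; a real VM would also transition the mark word. */
    public boolean push(Object o) {
        if (top == stack.length) {
            return false; // stack full: the real VM would fall back to an inflated monitor
        }
        stack[top++] = o;
        return true;
    }

    /** Recursive enter: succeeds only if o is already the topmost entry and there is room. */
    public boolean tryRecursiveEnter(Object o) {
        if (top == 0 || top == stack.length || stack[top - 1] != o) {
            return false;
        }
        stack[top++] = o;
        return true;
    }

    /** Recursive exit: succeeds only if the two topmost entries are both o. */
    public boolean tryRecursiveExit(Object o) {
        if (top < 2 || stack[top - 1] != o || stack[top - 2] != o) {
            return false;
        }
        stack[--top] = null;
        return true;
    }

    /** Unstructured exit: removes every entry for o, wherever it sits; returns how many. */
    public int remove(Object o) {
        int kept = 0;
        for (int i = 0; i < top; i++) {
            if (stack[i] != o) {
                stack[kept++] = stack[i];
            }
        }
        int removed = top - kept;
        java.util.Arrays.fill(stack, kept, top, null);
        top = kept;
        return removed;
    }

    public int size() {
        return top;
    }
}
```

In this model the structured fast paths never scan the stack; only `remove` does, which matches the intuition that unstructured exits are the expensive case handled in the runtime.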
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: top load adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16607/files - new: https://git.openjdk.org/jdk/pull/16607/files/e37d1bd2..37d1a0d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=02-03 Stats: 10 lines in 1 file changed: 6 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16607/head:pull/16607 PR: https://git.openjdk.org/jdk/pull/16607 From aboldtch at openjdk.org Wed Nov 15 07:33:48 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 07:33:48 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v2] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 15:48:00 GMT, Roman Kennke wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix type >> - Move inflated check in fast_locked >> - Move top load > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 974: > >> 972: jcc(Assembler::notZero, inflated); >> 973: >> 974: // Load top. > > I have found it to be beneficial to move up the load of the top-offset to between the load/prefetch of the mark-word and the test for monitor. This way we do the test while the top-offset arrives and reduce the latency of the lock-stack-full-check. Done. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1072: > >> 1070: >> 1071: // Check if obj is top of lock-stack. >> 1072: movl(top, Address(thread, JavaThread::lock_stack_top_offset())); > > Like above, moving the load of the top-offset up above ent mark-load should be harmless and potentially reduces the time that the following instructions have to wait for the top-offset to arrive. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1393763423 PR Review Comment: https://git.openjdk.org/jdk/pull/16607#discussion_r1393763501 From rmarchenko at openjdk.org Wed Nov 15 07:38:39 2023 From: rmarchenko at openjdk.org (Roman Marchenko) Date: Wed, 15 Nov 2023 07:38:39 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 20:51:22 GMT, Daniel D. Daugherty wrote: >> Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixing review comments > > Just a heads up that HotSpot code normally requires two reviews (1 from a (R)eviewer) > and 24 hours unless it is called trivial AND agreed to be trivial by your reviewers. @dcubed-ojdk Sorry, I didn't know that. Could you point me to the discussion about the 24-hour waiting period, please? BTW, it seems like both requirements could be automated in GitHub. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16647#issuecomment-1811943616 From aboldtch at openjdk.org Wed Nov 15 07:41:29 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 07:41:29 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:23:37 GMT, Andrew Haley wrote: >> Implements the aarch64 port of JDK-8319796. >> >> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. >> >> For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime.
>> >> The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. >> >> First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6344: > >> 6342: >> 6343: // Try to lock. Transition lock bits 0b01 => 0b00 >> 6344: assert(oopDesc::mark_offset_in_bytes() == 0, "required to avoid lea"); > > It might be cleaner just to put in the `lea`. I believe that nothing will be emitted if the addend is zero. It's up to you. It is only a nop if we load it into obj. And the current contract is that we do not change the value in obj. So we would still have to assert that the mark offset is 0 or we break the contract. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1393771857 From fyang at openjdk.org Wed Nov 15 07:45:37 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 15 Nov 2023 07:45:37 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it.
> > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some tests. Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3700: > 3698: Register ofs = c_rarg2; > 3699: Register limit = c_rarg3; > 3700: Register consts = t0; I would suggest choosing a different temporary register for `consts`, maybe `t2`. Using x5 (t0) / x6 (t1) to keep some long-lived values like `consts` can be error prone. Those two are reserved scratch registers which could be explicitly / implicitly clobbered by various assembler functions. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3769: > 3767: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f} > 3768: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16 > 3769: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f} I see the openssl version makes use of index-load to get {f,e,b,a},{h,g,d,c} pre-loop and index-store to put {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h} post-loop, which is much simpler than this code. Please consider. [1] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl#L124-L142 src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3926: > 3924: //-------------------------------------------------------------------------------- > 3925: // Quad-round 1 (+1, v11->v12->v13->v10) > 3926: __ vl1re32_v(v15, consts); I am still worried about the load latency if we do one `vl1re32_v` to get the consts for each round even for single pass. Preloading the constants into vectors is less likely to have this issue, right? We should have enough vector registers for that purpose. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4146: > 4144: Register ofs = c_rarg2; > 4145: Register limit = c_rarg3; > 4146: Register consts = t0; Similar here. Please consider using `t1` instead for `consts`.
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4215: > 4213: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f} > 4214: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16 > 4215: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f} Similar here. Can we make use of index-load and index-store to simplify the code for the 512 case too? ------------- PR Review: https://git.openjdk.org/jdk/pull/16562#pullrequestreview-1731361891 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393737654 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393767555 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393774633 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393769966 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1393769327 From aboldtch at openjdk.org Wed Nov 15 08:00:51 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 08:00:51 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 13:36:51 GMT, Andrew Haley wrote: > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. > > It's probably not a good idea to use load/store exclusive, because recent AArch64 implementations scale very badly under contention. Better to use atomic update instructions. We have found on some hardware the LSE instructions have terrible performance in the un-contended case. It is worth remembering that currently the lightweight mode is only for un-contended locking. As soon as we encounter contention we inflate to a monitor. (Caveat, if the lock is unlocked during the runtime transition we may still stay in the lightweight mode).
Until we add some sort of spinning in the runtime which attempts to stay in the lightweight mode for small critical sections with contention, I am unsure if this is a concern. The same applies to contention on the non-locking bits. Currently this only happens for hashCode instalment. The GC bits are currently only used at safepoints (maybe an issue for Shenandoah, can't recall off the top of my head if they use the age bits outside safepoints). When running this on hardware where LSE was not a concern we saw no detrimental effects, and on hardware where LSE had issues we saw significant gains. It would be very good to know if there are hardware and program combinations for which this approach shows regressions. We have not seen this so far. The inflated case still uses LSE (CAS on the owner field). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1811967889 From aboldtch at openjdk.org Wed Nov 15 08:00:50 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 08:00:50 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v2] In-Reply-To: References: Message-ID: > Implements the aarch64 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated.
And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 - 8319801: Recursive lightweight locking: aarch64 implementation - Cleanup: C2 fast_lock/fast_unlock aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16608/files - new: https://git.openjdk.org/jdk/pull/16608/files/dc25aff5..1e7a586c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=00-01 Stats: 599640 lines in 659 files changed: 76230 ins; 464498 del; 58912 mod Patch: https://git.openjdk.org/jdk/pull/16608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16608/head:pull/16608 PR: https://git.openjdk.org/jdk/pull/16608 From mbaesken at openjdk.org Wed Nov 15 08:45:35 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 15 Nov 2023 08:45:35 GMT Subject: RFR: JDK-8313764: Offer JVM HS functionality to shared lib load operations done by the JDK codebase [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 15:18:03 GMT, Matthias Baesken wrote: >> Currently there is a number of functionality that would be 
interesting to have for shared lib load operations in the JDK C code. >> Some examples : >> Events::log_dll_message for hs-err files reporting >> JFR event NativeLibraryLoad >> There is the need to update the shared lib Cache on AIX ( see LoadedLibraries::reload() , see also https://bugs.openjdk.org/browse/JDK-8314152 ), >> this is currently not fully in sync with libs loaded form jdk c-libs and sometimes reports outdated information >> >> Offer an interface (e.g. jvm.cpp) to support this. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > windows aarch64 build issues In the meantime, https://bugs.openjdk.org/browse/JDK-8295159 was integrated. There we restore on x86_64 and aarch the floating point environment (fenv_t) after loading libs in HS os::dll_load / dlopen_helper to avoid 'silent' manipulation of the fenv by 'bad' shared libs. But unfortunately this does not work for libs not going through this coding (like the ones loaded from JDK c code). (And even if restoring in these additional cases is not wanted, we could at least *warn* about the change by a trace .) Might be worth considering in context of this change . ------------- PR Comment: https://git.openjdk.org/jdk/pull/15264#issuecomment-1812023112 From rehn at openjdk.org Wed Nov 15 08:46:35 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 15 Nov 2023 08:46:35 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Tue, 14 Nov 2023 14:37:58 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. 
Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Touchups > - More comments work > - Tight up the comments > - ... and 3 more: https://git.openjdk.org/jdk/compare/f38c4ddc...191c0dbb @pchilano can you have a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1812026975 From stefank at openjdk.org Wed Nov 15 09:22:40 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 15 Nov 2023 09:22:40 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 14:43:16 GMT, Patricio Chilano Mateo wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak test > > test/hotspot/jtreg/runtime/Monitor/ConcurrentDeflation.java line 79: > >> 77: monitorCount++; >> 78: } >> 79: > > Nit: extra line Fixed. While fixing this I noticed that the test uses `System.currentTimeMillis()`, which can cause problems when the clock is changed. I'm changing it to use `System.nanoTime()` instead.
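As an illustration of the point about `System.currentTimeMillis()` versus `System.nanoTime()`, here is a minimal sketch of a time-bounded loop written against the monotonic clock. The class and method names are hypothetical and this is not the code from `ConcurrentDeflation.java`. The key detail is that elapsed time is computed as a difference of two `nanoTime()` readings, which stays correct if the wall clock is adjusted while the loop runs.

```java
public class MonotonicDeadline {
    /**
     * Spins until at least durationNanos have elapsed and returns the
     * number of loop iterations performed.
     */
    public static long spinFor(long durationNanos) {
        final long start = System.nanoTime();
        long iterations = 0;
        // Compare differences rather than absolute values: nanoTime() has an
        // arbitrary origin and is only meaningful as an elapsed-time source.
        do {
            iterations++;
        } while (System.nanoTime() - start < durationNanos);
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations: " + spinFor(5_000_000L));
    }
}
```

A loop bounded by `System.currentTimeMillis()` could end early or run far too long if the system clock jumps, for example after an NTP correction.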
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1393896924 From sjohanss at openjdk.org Wed Nov 15 09:39:46 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 15 Nov 2023 09:39:46 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v42] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 21:33:37 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Update parallel workers time after Remark Thanks for addressing my comments. I have a few more things: - I think all changes to `test_g1ServiceThread.cpp` can be reverted. Should not be needed now - Please fix all whitespace issues - Should we move the VMThread and StringDedup counters into `CPUTimeCounters` as well? Any problem with this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1812105608 From azafari at openjdk.org Wed Nov 15 09:41:38 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 15 Nov 2023 09:41:38 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 08:07:55 GMT, Kim Barrett wrote: >> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). > >> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). 
> > Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take > the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. @kimbarrett , @dholmes-ora , @merykitty Is there any comment on this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1812108032 From stefank at openjdk.org Wed Nov 15 09:46:21 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 15 Nov 2023 09:46:21 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v11] In-Reply-To: References: Message-ID: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: More tweaks to the test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/e90e81ff..df04ca04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=09-10 Stats: 7 lines in 1 file changed: 0 ins; 2 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From duke at openjdk.org Wed Nov 15 09:48:46 2023 From: duke at openjdk.org (Daniel Lundén) Date: Wed, 15 Nov 2023 09:48:46 GMT Subject: RFR: 8318480: UseCounterDecay and CounterDecayMinIntervalLength are unused and should be removed Message-ID: This changeset deprecates the leftover (i.e., no longer used for anything) product compiler
flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. Changes: - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). - Completely remove `CounterDecayMinIntervalLength`. ### Testing Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. - `tier1` - HotSpot parts of `tier2` and `tier3` ------------- Commit messages: - Fix issue Changes: https://git.openjdk.org/jdk/pull/16673/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318480 Stats: 38 lines in 21 files changed: 1 ins; 16 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16673/head:pull/16673 PR: https://git.openjdk.org/jdk/pull/16673 From mli at openjdk.org Wed Nov 15 09:54:43 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 15 Nov 2023 09:54:43 GMT Subject: RFR: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 12:36:31 GMT, Robbin Ehn wrote: >> Hi, >> Can you review the patch to refactor the code related UseRVV checks? >> Thanks! >> >> There are some code (flag setting/checking) depending on UseRVV's value, these code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: >> 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. >> 2. refactored to be more readable. >> 3. also add note to make sure the future code get the final UseRVV value instead of inital value. > >> > Hey, how is the changes to SpecialEncodeISOArray related ? 
>> >> This patch is also to respond the comment at [#16481 (comment)](https://github.com/openjdk/jdk/pull/16481#discussion_r1386040152) > > Ok, looks good! @robehn @luhenry @RealFYang Thanks for your reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16580#issuecomment-1812124726 From mli at openjdk.org Wed Nov 15 09:54:43 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 15 Nov 2023 09:54:43 GMT Subject: Integrated: 8319781: RISC-V: Refactor UseRVV related checks In-Reply-To: References: Message-ID: <_8T8sYTFxWdDvfLKnx8eHnKl5mRUpMBhD0aO7nf0wtE=.65d7eb53-8675-4b53-8f6c-eac181e827cc@github.com> On Thu, 9 Nov 2023 10:30:41 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to refactor the code related UseRVV checks? > Thanks! > > There are some code (flag setting/checking) depending on UseRVV's value, these code should be refactored, especially after the change of https://bugs.openjdk.org/browse/JDK-8319408: > 1. some code needs to get the final UseRVV's value rather than the initial value, e.g. ChaCha20 intrinsic. > 2. refactored to be more readable. > 3. also add note to make sure the future code get the final UseRVV value instead of inital value. This pull request has now been integrated. Changeset: fac6b516 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/fac6b51699d71440a38c24dfa1594476cb073873 Stats: 53 lines in 2 files changed: 21 ins; 29 del; 3 mod 8319781: RISC-V: Refactor UseRVV related checks Reviewed-by: rehn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/16580 From aph at openjdk.org Wed Nov 15 10:00:35 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Nov 2023 10:00:35 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 07:56:51 GMT, Axel Boldt-Christmas wrote: > > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. 
In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. > > > > > > It's probably not a good idea to use load/store exclusive, because recent AArch64 implementations scale very badly under contention. Better to use atomic update instructions. > > We have found on some hardware the LSE instructions have terrible performance in the un-contended case. Hmm. Which hardware is this? This is stuff I need to be aware of. Please contact me off-line if it's hard to say in public. > When running this on hardware where LSE was not a concern we saw no detrimental effects, and on hardware where LSE had issues we saw significant gains. This justification needs a comment in the code. Otherwise this use of non-LSE, as far as I recall the only use in the entire back end, is very surprising to the reader. At least, to this reader. > It would be very good to know if there are hardware and program combinations for which this approach shows regressions. We have not seen this so far. > > The inflated case still uses LSE. (CAS on the owner field). I guess the real question is how far we should go to accommodate a hardware manufacturer who has messed up their implementation. But that's a question for another day; if it is clearly explained in comments in the code why we're using ldx/stx I guess we can live with it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1812137090 From aph at openjdk.org Wed Nov 15 10:07:31 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Nov 2023 10:07:31 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 07:38:43 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6344: >> >>> 6342: >>> 6343: // Try to lock. 
Transition lock bits 0b01 => 0b00 >>> 6344: assert(oopDesc::mark_offset_in_bytes() == 0, "required to avoid lea"); >> >> It might be cleaner just to put in the `lea`. I believe that nothing will be emitted if the addend is zero. It's up to you. > > It is only a nop if we load it into obj. And the current contract is that we do not change the value in obj. So we would still have to assert that the mark offset is 0 or we break the contract. OK, so it's not just about avoiding an LEA. The mark word is never going to move, so it doesn't matter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1393961126 From dchuyko at openjdk.org Wed Nov 15 10:38:44 2023 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Wed, 15 Nov 2023 10:38:44 GMT Subject: RFR: 8309271: A way to align already compiled methods with compiler directives [v10] In-Reply-To: References: Message-ID: > Compiler Control (https://openjdk.org/jeps/165) provides method-context dependent control of the JVM compilers (C1 and C2). The active directive stack is built from the directive files passed with the `-XX:CompilerDirectivesFile` diagnostic command-line option and the Compiler.add_directives diagnostic command. It is also possible to clear all directives or remove the top from the stack. > > A matching directive will be applied at method compilation time when such compilation is started. If directives are added or changed, but compilation does not start, then the state of compiled methods doesn't correspond to the rules. This is not an error, and it happens in long running applications when directives are added or removed after compilation of methods that could be matched. For example, the user decides that C2 compilation needs to be disabled for some method due to a compiler bug, issues such a directive but this does not affect the application behavior. In such a case, the target application needs to be restarted, and such an operation can have high costs and risks.
Another goal is testing/debugging compilers. > > It would be convenient to optionally reconcile at least existing matching nmethods to the current stack of compiler directives (so bypass inlined methods). > > A natural way to eliminate the discrepancy between the result of compilation and the broken rule is to discard the compilation result, i.e. deoptimization. Prior to that we can try to re-compile the method, letting the compile broker perform it taking the new directives stack into account. Re-compilation helps to prevent hot methods from execution in the interpreter. > > A new flag `-r` has been introduced for some directives related to compile commands: `Compiler.add_directives`, `Compiler.remove_directives`, `Compiler.clear_directives`. The default behavior has not changed (no flag). If the new flag is present, the command scans already compiled methods and puts methods that have any active non-default matching compiler directives to re-compilation if possible, otherwise marks them for deoptimization. There is currently no distinction which directives are found. In particular, this means that if there are rules for inlining into some method, it will be refreshed. On the other hand, if there are rules for a method and it was inlined, top-level methods won't be refreshed, but this can be achieved by having rules for them. > > In addition, a new diagnostic command `Compiler.replace_directives`, has been added for ... Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 28 commits: - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - jcheck - Unnecessary import - force_update->refresh - Merge branch 'openjdk:master' into compiler-directives-force-update - Use only top directive for add/remove; better mutex rank definition; texts - ... and 18 more: https://git.openjdk.org/jdk/compare/2e34a2eb...37c50c74 ------------- Changes: https://git.openjdk.org/jdk/pull/14111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14111&range=09 Stats: 372 lines in 15 files changed: 339 ins; 3 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/14111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14111/head:pull/14111 PR: https://git.openjdk.org/jdk/pull/14111 From mli at openjdk.org Wed Nov 15 11:16:30 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 15 Nov 2023 11:16:30 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Tue, 14 Nov 2023 15:40:30 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cosmetic fixes. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1491: > >> 1489: beqz(cnt, DONE); >> 1490: >> 1491: ld(pow31_1_2, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31() > > Does `mv` of the power values to pow31_1_2 do the same effect as the `ld` here? 
If it does, mv might be better than ld. Seems not for a 64bit immediate by current cost model of riscv in jdk. Please ignore this suggestion, and below one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394058586 From stuefe at openjdk.org Wed Nov 15 12:39:12 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Nov 2023 12:39:12 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v10] In-Reply-To: References: Message-ID: <_BRw5Z5g-IMJItThLw_WKxewhL-AhqvoiqUmA12P0ek=.d84141cf-868b-4aa7-9f67-168536fefb7f@github.com> > Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. > > Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. > > > 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 > 8b7b69: 0f b6 00 movzbl (%rax),%eax > 8b7b6c: 84 c0 test %al,%al > 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> > 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE> > 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi > 8b7b7e: 8b 0a mov (%rdx),%ecx > 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE> > 8b7b87: 48 d3 e7 shl %cl,%rdi > 8b7b8a: 48 03 3a add (%rdx),%rdi > > > Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag. > > > 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE> > 8ba309: 48 8b 08 mov (%rax),%rcx > 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers? 
> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> > 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi > 8ba318: 48 d3 e7 shl %cl,%rdi # shift > 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base > 8ba31e: 48 01 cf add %rcx,%rdi # add base > 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx > > --- > > Performance measurements: > > G1, doing a full GC over a heap filled with 256 million live j.l.Object instances. > > I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up. > > --- > > Future extensions: > > This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 Mb). We only use 16 of those 24 bits of alignment shadow and could us... Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15389/files - new: https://git.openjdk.org/jdk/pull/15389/files/56cde2a9..18dad587 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From stuefe at openjdk.org Wed Nov 15 12:43:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Nov 2023 12:43:35 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9] In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 10:36:59 GMT, Aleksey Shipilev wrote: >> Thomas Stuefe has updated the pull request with a new target
base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Renamed _combo >> - Merge branch 'master' into optimize-narrow-klass-decoding-in-c++ >> - simplify assert >> - add comment >> - Update src/hotspot/share/oops/compressedKlass.hpp >> >> Co-authored-by: Aleksey Shipilëv >> - Update src/hotspot/share/oops/compressedKlass.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - Update src/hotspot/share/oops/compressedKlass.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - Update src/hotspot/share/oops/compressedKlass.cpp >> >> Co-authored-by: Aleksey Shipilëv >> - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ >> - Merge branch 'openjdk:master' into optimize-narrow-klass-decoding-in-c++ >> - ... and 6 more: https://git.openjdk.org/jdk/compare/9864951d...56cde2a9 > > src/hotspot/share/oops/compressedKlass.cpp line 45: > >> 43: assert(theshift == 0 || theshift == LogKlassAlignmentInBytes, "invalid shift for klass ptrs"); >> 44: _base = thebase; >> 45: _shift = theshift; > > Do we even need `_base` and `_shift` as separate fields after this change then? No, I don't think so. Let's remove it. > src/hotspot/share/oops/compressedKlass.hpp line 67: > >> 65: // - Bit [0-7] shift >> 66: // - Bit 8 UseCompressedClassPointers >> 67: // - Bits [16-64] the base. > > Suggestion: > > // - Bit [0-7] shift > // - Bit 8 UseCompressedClassPointers > // - Bits [16-64] heap base Let's call it encoding base and encoding shift. It is not the heap base either.
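A minimal sketch of the packed layout described in the comment above (shift in bits 0-7, the UseCompressedClassPointers flag in bit 8, the encoding base from bit 16 upward). All names here are hypothetical and chosen for illustration only; this is not the code from the patch, and it assumes the encoding base has its low 16 bits clear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names; this only illustrates the bit layout, not the HotSpot code.
constexpr uint64_t kShiftMask = 0xFF;               // bits [0-7]: encoding shift
constexpr uint64_t kFlagBit   = uint64_t(1) << 8;   // bit 8: compressed class pointers in use
constexpr uint64_t kBaseMask  = ~uint64_t(0xFFFF);  // bits 16 and up: encoding base

// Pack all three values into one word. Assumes (base & 0xFFFF) == 0.
constexpr uint64_t pack(uint64_t base, unsigned shift, bool use_compressed) {
  return base | (use_compressed ? kFlagBit : 0) | (shift & kShiftMask);
}

constexpr uint64_t encoding_base(uint64_t info)  { return info & kBaseMask; }
constexpr unsigned encoding_shift(uint64_t info) { return unsigned(info & kShiftMask); }
constexpr bool     use_compressed(uint64_t info) { return (info & kFlagBit) != 0; }

// Decode a narrow value from a single loaded word: base + (narrow << shift).
constexpr uint64_t decode(uint64_t info, uint32_t narrow) {
  return encoding_base(info) + (uint64_t(narrow) << encoding_shift(info));
}

// Compile-time check of the round trip: 0x10 << 3 == 0x80.
static_assert(decode(pack(0x800000000ULL, 3, true), 0x10) == 0x800000080ULL,
              "packed word decodes to base + (narrow << shift)");
```

The point of the single word is that one load yields the flag, the shift, and the base; the separate accessors above only mask that one value.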
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394144917 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394146454 From aboldtch at openjdk.org Wed Nov 15 12:54:34 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 15 Nov 2023 12:54:34 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v11] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:46:21 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > More tweaks to the test Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1731967394 From stuefe at openjdk.org Wed Nov 15 12:55:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Nov 2023 12:55:33 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 12:39:20 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 45: >> >>> 43: assert(theshift == 0 || theshift == LogKlassAlignmentInBytes, "invalid shift for klass ptrs"); >>> 44: _base = thebase; >>> 45: _shift = theshift; >> >> Do we even need `_base` and `_shift` as separate fields after this change then? > > No, I don't think so. Let's remove it. Wait, yes, we still need it for VMStructs. Would require duplicating the decoding logic on the VMStructs consumer side, which I would rather not.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394159945 From thartmann at openjdk.org Wed Nov 15 13:06:30 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 15 Nov 2023 13:06:30 GMT Subject: RFR: 8318480: UseCounterDecay and CounterDecayMinIntervalLength are unused and should be removed In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:40:46 GMT, Daniel Lundén wrote: > This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. > > Changes: > - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). > - Completely remove `CounterDecayMinIntervalLength`. > > ### Testing > Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. > - `tier1` > - HotSpot parts of `tier2` and `tier3` I think the description of `UseCounterDecay` in globals.hpp should be changed to `"Adjust recompilation counters (deprecated)"` ------------- PR Review: https://git.openjdk.org/jdk/pull/16673#pullrequestreview-1731989319 From stuefe at openjdk.org Wed Nov 15 13:11:46 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Nov 2023 13:11:46 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v11] In-Reply-To: References: Message-ID: > Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. > > Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
> > > 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 > 8b7b69: 0f b6 00 movzbl (%rax),%eax > 8b7b6c: 84 c0 test %al,%al > 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> > 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE> > 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi > 8b7b7e: 8b 0a mov (%rdx),%ecx > 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE> > 8b7b87: 48 d3 e7 shl %cl,%rdi > 8b7b8a: 48 03 3a add (%rdx),%rdi > > > Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag. > > > 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE> > 8ba309: 48 8b 08 mov (%rax),%rcx > 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers? > 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> > 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi > 8ba318: 48 d3 e7 shl %cl,%rdi # shift > 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base > 8ba31e: 48 01 cf add %rcx,%rdi # add base > 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx > > --- > > Performance measurements: > > G1, doing a full GC over a heap filled with 256 million live j.l.Object instances. > > I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up. > > --- > > Future extensions: > > This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 Mb). We only use 16 of those 24 bits of alignment shadow and could us...
Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - comment change - rename _compressionInfo->_compression_info ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15389/files - new: https://git.openjdk.org/jdk/pull/15389/files/18dad587..93b33cd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=09-10 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From goetz at openjdk.org Wed Nov 15 13:12:32 2023 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Wed, 15 Nov 2023 13:12:32 GMT Subject: RFR: JDK-8319927: Add some logging after 8295159 In-Reply-To: References: Message-ID: <1d6nPNHxOOJTHsfNi7rqgKCvSQ4SJrCXqmKZWdsdxa4=.f6af34e3-11eb-46ba-bd1e-8b06e9e818b6@github.com> On Fri, 10 Nov 2023 16:06:18 GMT, Matthias Baesken wrote: > [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD , however in case of issues no logging is done, this should be improved. LGTM ------------- Marked as reviewed by goetz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16618#pullrequestreview-1731998281 From duke at openjdk.org Wed Nov 15 13:39:51 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Wed, 15 Nov 2023 13:39:51 GMT Subject: RFR: 8319542: Fix boundaries of region to be tested with os::is_readable_range Message-ID: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> PR https://github.com/openjdk/jdk/pull/16381 was already closed when it became obvious that usage of os::is_readable_range() was slightly wrong: the " - 1" looks wrong here, because is_readable_range() checks for < to, not <= to. 
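The off-by-one described above can be illustrated with a small sketch, assuming only what the message states: the range check treats `to` as exclusive. The helper `is_visited` is a hypothetical stand-in written for this illustration and is not the HotSpot `os::is_readable_range()` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if address p falls inside the half-open range [from, to),
// i.e. the upper bound is compared with '<', not '<='.
constexpr bool is_visited(std::uintptr_t p, std::uintptr_t from, std::uintptr_t to) {
  return from <= p && p < to;
}

// A 16-byte region starting at 0x1000 has its last byte at 0x100F.
constexpr std::uintptr_t kStart = 0x1000;
constexpr std::uintptr_t kSize  = 16;
constexpr std::uintptr_t kLast  = kStart + kSize - 1;

// Passing start + size as the exclusive end covers the last byte ...
static_assert(is_visited(kLast, kStart, kStart + kSize), "last byte is covered");
// ... while passing start + size - 1 silently skips it.
static_assert(!is_visited(kLast, kStart, kStart + kSize - 1), "last byte is missed");
```

With an exclusive upper bound, the caller passes the address one past the end of the region, so subtracting 1 leaves the final byte unchecked.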
------------- Commit messages: - JDK-8319542: repair boundaries to be tested Changes: https://git.openjdk.org/jdk/pull/16676/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16676&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319542 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16676.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16676/head:pull/16676 PR: https://git.openjdk.org/jdk/pull/16676 From mli at openjdk.org Wed Nov 15 13:49:32 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 15 Nov 2023 13:49:32 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v2] In-Reply-To: <0OLIgmsu48B6zYK0VC0XMY7pase4MMPvioHt_Pu-Y0U=.552de390-cfbc-45ad-ad00-c4e172b970b2@github.com> References: <5i-hk8Vm8mfogwTT8eQv9PV41MGRZ0P8JkoogXyzovY=.b305d21d-934c-4ce7-9206-6bd32e926b42@github.com> <1o0ZvsGehXm52tpvB1okWb0OKM1R7B-dJZpLXRC-oA0=.a4f3ad9e-26c8-4cfe-b167-4954230045dc@github.com> <0OLIgmsu48B6zYK0VC0XMY7pase4MMPvioHt_Pu-Y0U=.552de390-cfbc-45ad-ad00-c4e172b970b2@github.com> Message-ID: On Mon, 13 Nov 2023 23:01:30 GMT, Olga Mikhaltsova wrote: >>> I have consulted with our h/w team and they told me next: multi-issue FP Unit can process few (data-independent) fp instructions at a time, even if they have different rounding mode. The only issue is when the rounding mode is set to dynamic rounding (aka get it from csr), but it's not our case here >> >> OK. Do what you will, but you're coding for an architecture not an implementation, and this code may stand for many years. > > First I've tried to implement rounding on riscv similar to aarch64, separated negative and positive numbers but branching is too expensive and I got not much performance improvement against the current java implementation. > > I like Vladimir's idea, the algorithm gives significant performance improvement and why not to take this advantage. But the above mentioned fix, related to fadd_s(), should be made. 
Not all numbers are processed correctly without it. Having changed the rounding mode of fadd_s() we've got two sequential instructions with the same rounding mode - rdn. > > The performance improvement with the last fix on the T-Head RVB-ICE board is as follows: > > Benchmark (TESTSIZE) Mode Cnt Score Error Units > FpRoundingBenchmark.test_round_double 2048 thrpt 15 111.278 ± 0.349 ops/ms > FpRoundingBenchmark.test_round_float 2048 thrpt 15 115.776 ± 0.323 ops/ms > > @theRealAph As you suggested I've also tested this algorithm on the full 32-bit range [0; 0xFFFFFFFF] using Float.intBitsToFloat(x) and all the numbers were processed correctly. The output of this algorithm is equal to the current java Math.round() implementation output. In typical implementations, writes to the dynamic rounding mode CSR state will serialize the pipeline. Static rounding modes are used to implement specialized arithmetic operations that often have to switch frequently between different rounding modes. -- from `"F" Standard Extension` Seems to me, the static rounding is fine in riscv arch? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1394222656 From duke at openjdk.org Wed Nov 15 13:58:41 2023 From: duke at openjdk.org (Luis Barreiro) Date: Wed, 15 Nov 2023 13:58:41 GMT Subject: RFR: 8316180: Thread-local backoff for secondary_super_cache updates [v13] In-Reply-To: References: Message-ID: On Thu, 2 Nov 2023 18:13:38 GMT, Aleksey Shipilev wrote: >> See more details in the bug and related issues. >> >> This is the attempt to mitigate [JDK-8180450](https://bugs.openjdk.org/browse/JDK-8180450), while the more complex fix that would obviate the need for `secondary_super_cache` is being worked out. The goal for this fix is to improve performance in pathological cases, while keeping non-pathological cases out of extra risk, *and* staying simple enough and reliable for backports to currently supported JDK releases.
>> >> This implements mitigation on most current architectures: >> - x86_64: implemented >> - x86_32: considered, abandoned; cannot be easily done without blowing up code size >> - AArch64: implemented >> - ARM32: considered, abandoned; needs cleanups and testing; see [JDK-8318414](https://bugs.openjdk.org/browse/JDK-8318414) >> - PPC64: implemented, thanks @TheRealMDoerr >> - S390: implemented, thanks @offamitkumar >> - RISC-V: implemented, thanks @RealFYang >> - Zero: does not need implementation >> >> Note that the code is supposed to be rather compact, because it is inlined in generated code. That is why, for example, we cannot easily do x86_32 version: we need a thread, so the easiest way would be to call into VM. But we cannot that easily: the code blowout would make some forward branches in external code non-short. I think if we cannot implement this mitigation on some architectures, so be it, it would be a sensible tradeoff for simplicity. >> >> Setting backoff at `0` effectively disables the mitigation, and gives us a safety hatch if something goes wrong. >> >> I believe we can go in with `1000` as the default, given the experimental results mentioned in this PR. >> >> Additional testing: >> - [x] Linux x86_64 fastdebug, `tier1 tier2 tier3` >> - [x] Linux AArch64 fastdebug, `tier1 tier2 tier3` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 29 additional commits since the last revision: > > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Improve benchmarks > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Editorial cleanups > - RISC-V implementation > - Mention ARM32 bug > - Make sure benchmark runs with C1 > - Merge branch 'master' into JDK-8316180-backoff-secondary-super > - Touchup benchmark metadata > - S390 implementation > - ...
and 19 more: https://git.openjdk.org/jdk/compare/fcf8687c...74921ea9 I have tested this patch in isolation, by running the techempower benchmark on a quarkus version that is heavily impacted by the type pollution issue (`JDK-8316180`). See [1] for the exact commit used for the benchmark. The procedure was to back-port the patch to the latest JDK 21 build (`jdk-21+35`) (that is the branch in [2]). Six values for the backoff (-XX:SecondarySuperMissBackoff property) were used: `0`, `1`, `10`, `100`, `1000`, `10000`. The results for the `plaintext` test can be summarized in the graphs below: ![Screenshot_patch](https://github.com/openjdk/jdk/assets/856614/c67e26b1-da9b-4cb9-8f30-6606a00d2d43) None of the tests in the benchmark showed any regression. When comparing with JDK 22 (`jdk-22+21`), this patch can still offer a significant improvement. ![Screenshot_compare](https://github.com/openjdk/jdk/assets/856614/7b3fc64c-12c0-496e-a0b7-29fa228120b0) The improved performance of JDK 22 can be attributed, but not limited, to `JDK-8308869` that improves C2 type awareness and that is effective for this use case. Other optimizations that have since been merged can impact this comparison between results as well. [1] - https://github.com/TechEmpower/FrameworkBenchmarks/tree/0db323061e4e258d1ce66a34ea2132f8beef5cc8 [2] - https://github.com/barreiro/jdk/tree/shipilev_patch ------------- PR Comment: https://git.openjdk.org/jdk/pull/15718#issuecomment-1812570392 From shade at openjdk.org Wed Nov 15 14:09:35 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 15 Nov 2023 14:09:35 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v9] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 12:52:43 GMT, Thomas Stuefe wrote: >> No, I don't think so. Lets remove it. > > Wait, yes, we still need it for VMStructs. Would require duplicating the decoding logic on the VMStructs consumer side, which I would rather not. Oh, okay.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394250261 From shade at openjdk.org Wed Nov 15 14:23:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 15 Nov 2023 14:23:39 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v11] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 13:11:46 GMT, Thomas Stuefe wrote: >> Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`. >> >> Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift. >> >> >> 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 >> 8b7b69: 0f b6 00 movzbl (%rax),%eax >> 8b7b6c: 84 c0 test %al,%al >> 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> >> 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE> >> 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi >> 8b7b7e: 8b 0a mov (%rdx),%ecx >> 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE> >> 8b7b87: 48 d3 e7 shl %cl,%rdi >> 8b7b8a: 48 03 3a add (%rdx),%rdi >> >> >> Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag. >> >> >> 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE> >> 8ba309: 48 8b 08 mov (%rax),%rcx >> 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers? 
>> 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> >> 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi >> 8ba318: 48 d3 e7 shl %cl,%rdi # shift >> 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base >> 8ba31e: 48 01 cf add %rcx,%rdi # add base >> 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx >> >> --- >> >> Performance measurements: >> >> G1, doing a full GC over a heap filled with 256 million live j.l.Object instances. >> >> I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up. >> >> --- >> >> Future extensions: >> >> This patch uses the fact that the encoding base is aligned to metaspace reser... > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - comment change > - rename _compressionInfo->_compression_info More cosmetics, but I am good with this. Probably someone from Runtime wants to take a look as well, e.g. @coleenp or @dholmes-ora? src/hotspot/share/oops/compressedKlass.cpp line 37: > 35: // Note: initialization value is unchanged for -UseCompressedClassPointers, so > 36: // the bit mirroring UseCompressedClassPointers is off and matches the switch. > 37: uint64_t CompressedKlassPointers::_compression_info; Should we still initialize it?
Suggestion: uint64_t CompressedKlassPointers::_compression_info = 0; src/hotspot/share/oops/compressedKlass.cpp line 57: > 55: assert(base() == _base, "compressionInfo encoding"); > 56: assert(shift() == _shift, "compressionInfo encoding"); > 57: assert(use_compressed_class_pointers() == true, "compressionInfo encoding"); Suggestion: assert(base() == _base, "compression_info encoding"); assert(shift() == _shift, "compression_info encoding"); assert(use_compressed_class_pointers() == true, "compression_info encoding"); src/hotspot/share/oops/compressedKlass.hpp line 67: > 65: // - Bit [0-7] encoding shift > 66: // - Bit 8 UseCompressedClassPointers > 67: // - Bits [16-64] encoding base Suggestion: // - Bits [16-64] encoding base ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15389#pullrequestreview-1732113929 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394267313 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394264929 PR Review Comment: https://git.openjdk.org/jdk/pull/15389#discussion_r1394250596 From duke at openjdk.org Wed Nov 15 14:23:50 2023 From: duke at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 15 Nov 2023 14:23:50 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v2] In-Reply-To: References: Message-ID: > This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. > > Changes: > - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). > - Completely remove `CounterDecayMinIntervalLength`. 
> > ### Testing > Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. > - `tier1` > - HotSpot parts of `tier2` and `tier3` Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Add deprecated to UseCounterDecay description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16673/files - new: https://git.openjdk.org/jdk/pull/16673/files/7645b228..cd3343fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16673/head:pull/16673 PR: https://git.openjdk.org/jdk/pull/16673 From duke at openjdk.org Wed Nov 15 14:23:50 2023 From: duke at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 15 Nov 2023 14:23:50 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 13:04:03 GMT, Tobias Hartmann wrote: > I think the description of `UseCounterDecay` in globals.hpp should be changed to `"Adjust recompilation counters (deprecated)"` Thanks, updated.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16673#issuecomment-1812621880 From duke at openjdk.org Wed Nov 15 14:32:49 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Wed, 15 Nov 2023 14:32:49 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v6] In-Reply-To: References: Message-ID: > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: fix include statements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/bf022a87..1202a7bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=04-05 Stats: 4 lines in 4 files changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From thartmann at openjdk.org Wed Nov 15 14:33:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 15 Nov 2023 14:33:31 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 14:23:50 GMT, Daniel Lundén wrote: >> This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. >> >> Changes: >> - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). >> - Completely remove `CounterDecayMinIntervalLength`. >> >> ### Testing >> Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64.
>> - `tier1` >> - HotSpot parts of `tier2` and `tier3` > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Add deprecated to UseCounterDecay description Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16673#pullrequestreview-1732165372 From duke at openjdk.org Wed Nov 15 14:49:30 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Wed, 15 Nov 2023 14:49:30 GMT Subject: RFR: 8319542: Fix boundaries of region to be tested with os::is_readable_range In-Reply-To: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> References: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> Message-ID: <3tQgqJFOYJXgtUdmQneOV6X4__ytxFx7yaHjzEYW1L8=.e6ee6284-27f9-4c4b-9746-b8edc0c06207@github.com> On Wed, 15 Nov 2023 13:33:00 GMT, Thomas Obermeier wrote: > PR https://github.com/openjdk/jdk/pull/16381 was already closed when it became obvious that usage of os::is_readable_range() was slightly wrong: > > the " - 1" looks wrong here, because is_readable_range() checks for < to, not <= to. @dean-long : Hi, just wanted to bring this to your attention... Thomas ------------- PR Comment: https://git.openjdk.org/jdk/pull/16676#issuecomment-1812672852 From stuefe at openjdk.org Wed Nov 15 14:50:48 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 15 Nov 2023 14:50:48 GMT Subject: RFR: JDK-8314890: Reduce number of loads for Klass decoding in static code [v12] In-Reply-To: References: Message-ID: > Small change that reduces the number of loads generated by the C++ compiler for a narrow Klass decoding operation (`CompressedKlassPointers::decode_xxx()`). > > Stock: three loads (with two probably sharing a cache line) - UseCompressedClassPointers, encoding base and shift.
> > > 8b7b62: 48 8d 05 7f 1b c3 00 lea 0xc31b7f(%rip),%rax # 14e96e8 > 8b7b69: 0f b6 00 movzbl (%rax),%eax > 8b7b6c: 84 c0 test %al,%al > 8b7b6e: 0f 84 9c 00 00 00 je 8b7c10 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> > 8b7b74: 48 8d 15 05 62 c6 00 lea 0xc66205(%rip),%rdx # 151dd80 <_ZN23CompressedKlassPointers6_shiftE> > 8b7b7b: 8b 7b 08 mov 0x8(%rbx),%edi > 8b7b7e: 8b 0a mov (%rdx),%ecx > 8b7b80: 48 8d 15 01 62 c6 00 lea 0xc66201(%rip),%rdx # 151dd88 <_ZN23CompressedKlassPointers5_baseE> > 8b7b87: 48 d3 e7 shl %cl,%rdi > 8b7b8a: 48 03 3a add (%rdx),%rdi > > > Patched: one load loads all three. Since shift occupies the lowest 8 bits, compiled code uses 8bit register; ditto the UseCompressedOops flag. > > > 8ba302: 48 8d 05 97 9c c2 00 lea 0xc29c97(%rip),%rax # 14e3fa0 <_ZN23CompressedKlassPointers6_comboE> > 8ba309: 48 8b 08 mov (%rax),%rcx > 8ba30c: f6 c5 01 test $0x1,%ch # use compressed klass pointers? > 8ba30f: 0f 84 9b 00 00 00 je 8ba3b0 <_ZN10HeapRegion14object_iterateEP13ObjectClosure+0x260> > 8ba315: 8b 7b 08 mov 0x8(%rbx),%edi > 8ba318: 48 d3 e7 shl %cl,%rdi # shift > 8ba31b: 66 31 c9 xor %cx,%cx # zero out lower 16 bits of base > 8ba31e: 48 01 cf add %rcx,%rdi # add base > 8ba321: 8b 4f 08 mov 0x8(%rdi),%ecx > > --- > > Performance measurements: > > G1, doing a full GC over a heap filled with 256 million live j.l.Object instances. > > I see a reduction of Full Pause times between 1.2% and 5%. I am unsure how reliable these numbers are since, despite my efforts (running tests on isolated CPUs etc.), the standard deviation was quite high at ±4%. Still, in general, numbers seemed to go down rather than up. > > --- > > Future extensions: > > This patch uses the fact that the encoding base is aligned to metaspace reserve alignment (16 Mb). We only use 16 of those 24 bits of alignment shadow and could us...
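
For readers following the thread, here is a minimal Java model of the packed word this patch describes, using the bit layout quoted in the review (bits 0-7 shift, bit 8 the UseCompressedClassPointers mirror, bits 16 and up the encoding base). It is only a sketch: the class and method names (`CompressionInfoSketch`, `pack`, `decode`) are hypothetical and are not HotSpot identifiers, and the real code is C++ in `compressedKlass.hpp`.

```java
// Hypothetical model of the packed "compression info" word; not HotSpot code.
final class CompressionInfoSketch {
    static final long SHIFT_MASK = 0xFFL;        // bits 0-7: encoding shift
    static final long USE_CCP_BIT = 1L << 8;     // bit 8: mirrors UseCompressedClassPointers
    static final long LOW16_MASK = 0xFFFFL;      // bits 0-15 are free because the base is aligned

    // Packs base, shift and flag into one 64-bit word.
    static long pack(long base, int shift, boolean useCompressedClassPointers) {
        if ((base & LOW16_MASK) != 0) {
            throw new IllegalArgumentException("base must have its low 16 bits clear");
        }
        if (shift < 0 || shift > 0xFF) {
            throw new IllegalArgumentException("shift must fit in 8 bits");
        }
        return base | (useCompressedClassPointers ? USE_CCP_BIT : 0L) | shift;
    }

    static int shift(long info) {
        return (int) (info & SHIFT_MASK);
    }

    static boolean useCompressedClassPointers(long info) {
        return (info & USE_CCP_BIT) != 0;
    }

    // Clearing the low 16 bits recovers the base (the "xor %cx,%cx" step in the listing).
    static long base(long info) {
        return info & ~LOW16_MASK;
    }

    // Decodes a 32-bit narrow value: base + (narrow << shift), treating narrow as unsigned.
    static long decode(long info, int narrowKlass) {
        return base(info) + (Integer.toUnsignedLong(narrowKlass) << shift(info));
    }

    public static void main(String[] args) {
        long info = pack(0x800000000L, 3, true);
        System.out.printf("info=0x%x decoded=0x%x%n", info, decode(info, 0x10));
    }
}
```

Note that `pack(0, 0, false)` yields 0, which matches the review remark that a zero-initialized word already mirrors `-UseCompressedClassPointers`.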
Thomas Stuefe has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.cpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/compressedKlass.hpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15389/files - new: https://git.openjdk.org/jdk/pull/15389/files/93b33cd2..e7bb5561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15389&range=10-11 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/15389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15389/head:pull/15389 PR: https://git.openjdk.org/jdk/pull/15389 From aph at openjdk.org Wed Nov 15 15:24:35 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Nov 2023 15:24:35 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: <4S3MMvyIj8XWhyaj4m5VGICvXcjRlKRKnRGsycXW9VY=.6ff51e8e-e2bf-476d-a965-2d1282666906@github.com> On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes a large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which is available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies.
To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Hi, > The latest commit created a native library as a bridge to the third-party sleef library. 
Could you please help check whether it's a better solution to fix the hard-coding sleef ABI version issue and the further evolution? That looks rather nice. > It re-defines all the vector math functions and implements them by calling the corresponding functions in libsleef. The library is bundled into the jdk image. This way, we don't need to hard-code the libsleef ABI version into jdk. And the potential issue caused by the future ABI updating may be caught earlier. That sounds good. We're still rather vulnerable if sleef changes its ABI, but I don't suppose that will happen, and we can deal with it if it does. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1812736558 From redestad at openjdk.org Wed Nov 15 15:35:36 2023 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 15 Nov 2023 15:35:36 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation.
>> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
> > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Update PPC implementation of string_compress to return the index of the non-latin1 char > Patch supplied by TheRealMDoerr src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8617: > 8615: lea(dst, Address(dst, tmp5, Address::times_1)); > 8616: subptr(len, tmp5); > 8617: jmpb(copy_chars_loop); This cause a crash if I run with `-XX:UseAVX=3 -XX:AVX3Threshold=0`: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (macroAssembler_x86.hpp:122), pid=3400452, tid=3400470 # guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit offset at :0 # Needs to be a `jmp(copy_chars_loop)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394375402 From redestad at openjdk.org Wed Nov 15 15:43:43 2023 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 15 Nov 2023 15:43:43 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Wed, 15 Nov 2023 15:32:54 GMT, Claes Redestad wrote: >> Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: >> >> Update PPC implementation of string_compress to return the index of the non-latin1 char >> Patch supplied by TheRealMDoerr > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8617: > >> 8615: lea(dst, Address(dst, tmp5, Address::times_1)); >> 8616: subptr(len, tmp5); >> 8617: jmpb(copy_chars_loop); > > This cause a crash if I run with `-XX:UseAVX=3 -XX:AVX3Threshold=0`: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (macroAssembler_x86.hpp:122), pid=3400452, tid=3400470 > 
# guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit offset at :0 > # > > > Needs to be a `jmp(copy_chars_loop)`. Alternatively: if (UseSSE42Intrinsics) { jmpb(copy_chars_loop); } else { jmp(copy_chars_loop); } More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive, though. Especially since we shouldn't mix AVX and SSE code (the code in this intrinsic seems to follow paths which are either/or, but it seems fragile). Perhaps @TobiHartmann can advise? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394386963 From omikhaltcova at openjdk.org Wed Nov 15 15:44:47 2023 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Wed, 15 Nov 2023 15:44:47 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v6] In-Reply-To: References: Message-ID: > Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform. > > In the table below it is shown that NaN argument should be processed as a special case.
>
> | | RISC-V (FCVT.W.S) | RISC-V (FCVT.L.D) | Java (long round(double a)) | Java (int round(float a)) |
> |---|---|---|---|---|
> | Minimum valid input (after rounding) | -2^31 | -2^63 | Long.MIN_VALUE | Integer.MIN_VALUE |
> | Maximum valid input (after rounding) | 2^31 - 1 | 2^63 - 1 | Long.MAX_VALUE | Integer.MAX_VALUE |
> | Output for out-of-range negative input | -2^31 | -2^63 | Long.MIN_VALUE | Integer.MIN_VALUE |
> | Output for -∞ | -2^31 | -2^63 | Long.MIN_VALUE | Integer.MIN_VALUE |
> | Output for out-of-range positive input | 2^31 - 1 | 2^63 - 1 | Long.MAX_VALUE | Integer.MAX_VALUE |
> | Output for +∞ | 2^31 - 1 | 2^63 - 1 | Long.MAX_VALUE | Integer.MAX_VALUE |
> | Output for NaN | 2^31 - 1 | 2^63 - 1 | 0 | 0 |
>
> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:
>
> **Before**
>
> | Benchmark | (TESTSIZE) | Mode | Cnt | Score | Error | Units |
> |---|---|---|---|---|---|---|
> | FpRoundingBenchmark.test_round_double | 2048 | thrpt | 15 | 59.555 | 0.179 | ops/ms |
> | FpRoundingBenchmark.test_round_float | 2048 | thrpt | 15 | 49.760 | 0.103 | ops/ms |
>
> **After**
>
> | Benchmark | (TESTSIZE) | Mode | Cnt | Score | Error | Units |
> |---|---|---|---|---|---|---|
> | FpRoundingBenchmark.test_round_double | 2048 | thrpt | 15 | 110.956 | 0.186 | ops/ms |
> | FpRoundingBenchmark.test_round_float | 2048 | thrpt | 15 | 115.947 | 0.122 | ops/ms |

Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision: Replaced tmp with t0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16382/files - new: https://git.openjdk.org/jdk/pull/16382/files/630a26b8..fed920ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16382&range=04-05 Stats: 18 lines in 2 files changed: 0 ins; 4 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/16382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16382/head:pull/16382 PR: https://git.openjdk.org/jdk/pull/16382 From omikhaltcova at openjdk.org Wed Nov 15 15:50:34 2023 From: omikhaltcova at openjdk.org (Olga Mikhaltsova) Date: Wed, 15 Nov 2023 15:50:34 GMT Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v5] In-Reply-To: References: Message-ID: <0FmWydPWKEOes5XsyDwGFStdP0JR1JehNTGa3FpZZ7w=.1dd2a989-b413-4ba4-ae8d-e8096fabcfd2@github.com> On Tue, 14 Nov 2023 19:34:41 GMT, Hamlin Li wrote: >> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision: >> >> Used fclass_mask > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4263: > >> 4261: >> 4262: void MacroAssembler::java_round_float(Register dst, FloatRegister src, >> 4263: FloatRegister ftmp, Register tmp) { > > Can we remove the `tmp`
parameter here, and use `t0` directly in java_round_float/double? > As it's more clear, and in fact in round_float/double_reg it does not allocate a register indeed, and `assert_different_registers` can be removed too. Fixed. Thx! I agree, it doesn't make sense to declare `tmp` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16382#discussion_r1394397758 From aph at openjdk.org Wed Nov 15 15:55:35 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 15 Nov 2023 15:55:35 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: <-6ooFpvvY-YRB8Q1d_jwkWq7m43-6q8LIVTj_Sm23fQ=.93577a6f-cfd2-4fd1-854b-6ad5928e4ab9@github.com> On Wed, 15 Nov 2023 09:57:40 GMT, Andrew Haley wrote: > We have found on some hardware the LSE instructions have terrible performance in the un-contended case. Thinking about this some more, it's a very bad situation. In general, across the whole JVM, when we're doing a CAS we don't know if contention is likely. GCC/glibc make the guess that it's probably best always to use the LSE instructions. We do the same thing. Any implementation which implements LSE badly in the uncontended fast path is going to suffer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1812793921 From duke at openjdk.org Wed Nov 15 15:55:41 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 15 Nov 2023 15:55:41 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> On Tue, 14 Nov 2023 15:49:48 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cosmetic fixes. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1477: > >> 1475: case T_SHORT: BLOCK_COMMENT("arrays_hashcode(short) {"); break; >> 1476: case T_INT: BLOCK_COMMENT("arrays_hashcode(int) {"); break; >> 1477: default: BLOCK_COMMENT("arrays_hashcode {"); break; > > In `C2_MacroAssembler::arrays_hashcode_elsize`, default action is `ShouldNotReachHere();`, should it be consistent here? Sure, thanks for catching! > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1513: > >> 1511: bind(WIDE_LOOP); >> 1512: DO_ELEMENT_LOAD(tmp1, 0) >> 1513: DO_ELEMENT_LOAD(tmp3, 1) > > Would it help to optimize the perf by moving `DO_ELEMENT_LOAD(tmp3, 1)` after `srli(tmp2, pow31_3_4, 32);`? Sure, it makes the code more understandable even though it doesn't improve performance. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1523: > >> 1521: DO_ELEMENT_LOAD(tmp3, 3) >> 1522: srli(tmp2, pow31_1_2, 32); >> 1523: mulw(tmp1, tmp1, tmp2); // 31^^1 * ary[i+2] > > Could this line be optimized as `x<<5-x`? Just as TAIL_LOOP below. Sure, good catch! 
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1527: > >> 1525: addw(result, result, tmp3); // 31^^4 * h + 31^^3 * ary[i+0] + 31^^2 * ary[i+1] >> 1526: // + 31^^1 * ary[i+2] + 31^^0 * ary[i+3] >> 1527: subw(chunk, chunk, stride); > > Could chunk and ary be merged into one variable? so we don't need one sub and one add, but only one add here. Could you please clarify? I don't understand how that's possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394403870 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394400845 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394401160 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394402664 From duke at openjdk.org Wed Nov 15 16:00:37 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 15 Nov 2023 16:00:37 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: <27QmHbAYQxR1dSZFT2lVHIJ4T1kG8QN-OaSqDhrV3Cg=.f9d26f43-5f43-4785-a795-21622f00d8fb@github.com> On Tue, 14 Nov 2023 15:40:46 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cosmetic fixes. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1512: > >> 1510: >> 1511: bind(WIDE_LOOP); >> 1512: DO_ELEMENT_LOAD(tmp1, 0) > > Can you add `;` at the end of the statement? similar comments for other DO_ELEMENT_LOAD's Agreed.
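
As a reading aid for the wide-loop comment quoted in this review (`31^^4 * h + 31^^3 * ary[i+0] + ...`) and the `x<<5-x` remark, the arithmetic can be sketched in plain Java. This is an illustration only, not the RISC-V intrinsic: the class name `UnrolledHashSketch` is made up, and it assumes `int` elements and the same seed as `java.util.Arrays.hashCode(int[])`.

```java
import java.util.Arrays;

// Illustrative scalar model of a 4-way unrolled polynomial hash; not the HotSpot stub.
final class UnrolledHashSketch {
    static int hash(int[] a) {
        int h = 1; // same seed as Arrays.hashCode(int[]) for non-null arrays
        int i = 0;
        // Wide loop: fold four elements per iteration using precomputed powers of 31.
        for (; i + 4 <= a.length; i += 4) {
            h = 31 * 31 * 31 * 31 * h
              + 31 * 31 * 31 * a[i]
              + 31 * 31 * a[i + 1]
              + 31 * a[i + 2]
              + a[i + 3];
        }
        // Tail loop: 31 * h is computed as (h << 5) - h, one element at a time.
        for (; i < a.length; i++) {
            h = (h << 5) - h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, -1, 4, 1, -5, 9, 2};
        System.out.println(hash(data) == Arrays.hashCode(data));
    }
}
```

Because `int` arithmetic wraps modulo 2^32, folding four steps into one expression gives the same result as four sequential `31 * h + x` updates.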
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394412176 From rgiulietti at openjdk.org Wed Nov 15 16:00:50 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Wed, 15 Nov 2023 16:00:50 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Tue, 14 Nov 2023 16:05:51 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. 
>> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Update PPC implementation of string_compress to return the index of the non-latin1 char > Patch supplied by TheRealMDoerr test/jdk/java/lang/String/StringRacyConstructor.java line 110: > 108: for (int i = 0; i < orig.length(); i++) > 109: accum |= orig.charAt(i); > 110: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; I think this assumes that compact strings are enabled during this test. test/jdk/java/lang/String/StringRacyConstructor.java line 119: > 117: for (int i = 0; i < orig.length(); i++) > 118: accum |= orig.charAt(i); > 119: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; Same as above. test/jdk/java/lang/String/StringRacyConstructor.java line 190: > 188: if (printWarningCount == 0) { > 189: printWarningCount = 1; > 190: System.out.println("StringUTF16.compress returned 0, may not be intrinsic"); It seems to me that the Java code for `StringUTF16.compress` also returns the index of the non-latin1 char, so I'm not sure I understand this. Just caution? 
test/jdk/java/lang/String/StringRacyConstructor.java line 199: > 197: // Exhaustively check compress returning the correct index of the non-latin1 char. > 198: final int SIZE = 48; > 199: final byte FILL_BYTE = 0x52; Suggestion: final byte FILL_BYTE = 'R'; This makes it more clear that `FILL_BYTE != 'A'` for the logic below. test/jdk/java/lang/String/StringRacyConstructor.java line 258: > 256: */ > 257: public static String racyStringConstruction(String original) throws ConcurrentModificationException { > 258: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 288: > 286: } > 287: if (i >= 1_000_000) { > 288: System.err.printf("Unable to produce a UTF16 string in %d iterations: %s%n", i, original); AFAIU, this writes to `System.err` on "success". Is this the intent? test/jdk/java/lang/String/StringRacyConstructor.java line 300: > 298: */ > 299: public static String racyStringConstructionCodepoints(String original) throws ConcurrentModificationException { > 300: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 347: > 345: */ > 346: public static String racyStringConstructionCodepointsSurrogates(String original) throws ConcurrentModificationException { > 347: if (original.chars().max().orElseThrow() > 256) { Suggestion: if (original.chars().max().getAsInt() >= 256) { test/jdk/java/lang/String/StringRacyConstructor.java line 400: > 398: @Override > 399: public int length() { > 400: return aString.length() + 1; Not sure why ` + 1`. 
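
To make the `> 256` versus `>= 256` suggestions concrete: latin1 covers code units 0 to 255, so U+0100 (decimal 256) is the first non-latin1 char and a strict `> 256` test misses it. The following is a small standalone sketch with made-up names (`Latin1BoundarySketch`, `fitsLatin1`, `fitsLatin1ByOr`); it is not taken from the test under review.

```java
// Illustrative check of the latin1 boundary; names are hypothetical.
final class Latin1BoundarySketch {
    // True if every char of s is in the latin1 range 0..255.
    static boolean fitsLatin1(String s) {
        return s.chars().max().orElse(0) < 256;
    }

    // Same answer via the OR-accumulation used in the quoted test code:
    // the OR of all chars is below 256 exactly when each char is below 256.
    static boolean fitsLatin1ByOr(String s) {
        int accum = 0;
        for (int i = 0; i < s.length(); i++) {
            accum |= s.charAt(i);
        }
        return accum < 256;
    }

    public static void main(String[] args) {
        String boundary = "A\u0100"; // contains U+0100, which is not latin1
        int max = boundary.chars().max().getAsInt();
        System.out.println("max > 256:  " + (max > 256));
        System.out.println("max >= 256: " + (max >= 256));
        System.out.println("fitsLatin1: " + fitsLatin1(boundary));
    }
}
```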
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394362549 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394363301 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394378639 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394384444 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394364212 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394411147 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394364741 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394367209 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394368361 From duke at openjdk.org Wed Nov 15 16:07:31 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 15 Nov 2023 16:07:31 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Tue, 14 Nov 2023 15:41:04 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cosmetic fixes. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1540: > >> 1538: addw(result, result, tmp1); // result = result + ary[i] >> 1539: subw(cnt, cnt, 1); >> 1540: add(ary, ary, elsize); > > Similar comment for cnt and ary as chunk and ary above. As above, please advise how to do that. IIUC, that's possible with INDEX-REG addressing which is absent in RISC-V. 
:-( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394424021 From matsaave at openjdk.org Wed Nov 15 16:44:15 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Nov 2023 16:44:15 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: References: Message-ID: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. > > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. 
> > This change supports the following platforms: x86, aarch64, RISCV, PPC Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: S390 port ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15455/files - new: https://git.openjdk.org/jdk/pull/15455/files/b653baa6..2cf0b162 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15455&range=11-12 Stats: 400 lines in 7 files changed: 129 ins; 177 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/15455.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15455/head:pull/15455 PR: https://git.openjdk.org/jdk/pull/15455 From matsaave at openjdk.org Wed Nov 15 16:44:18 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Nov 2023 16:44:18 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v10] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 22:48:59 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: >> >> - PPC port >> - Improved load_resolved_method_entry_handle on x86 and aarch64 > >> I have a version which works for PPC64: [TheRealMDoerr at 6bff392](https://github.com/TheRealMDoerr/jdk/commit/6bff39224e3129a898711a392b64c38b331d79a2) >> >> Note that I have implemented a few things slightly differently: >> >> * `TemplateTable::load_resolved_method_entry_handle`: I'm loading the method at the end which avoids pushing and popping it on the expression stack which is not so nice IMHO. This works because I'm using a non-volatile register (asserted) for `cache` which is still valid after the C-call in `resolve_oop_handle`. 
>> >> * `TemplateTable::load_resolved_method_entry_interface` and `TemplateTable::load_resolved_method_entry_virtual`: I'm not putting values in registers depending on the flags because it doesn't fit nicely into the PPC64 design. I found myself scratching my head and thinking about what is in the register in which case. Instead of that, I'm loading the fields where they are needed which leads to a much cleaner design. I always know what is in which register this way. >> >> >> Please take a look and take these differences into consideration for other platforms. Thanks! > > Thank you for the port! I liked your recommendation with regards to invokehandle and added that change to x86 and aarch64 as well. > Hi @matias9927, Please add s390-Port from here: [offamitkumar at 3f70174](https://github.com/offamitkumar/jdk/commit/3f7017467b1a4ae8fe70530c7183c2667cf2c7f2) Thank you for the port Amit! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1812883015 From lmesnik at openjdk.org Wed Nov 15 17:13:32 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 15 Nov 2023 17:13:32 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 14:23:50 GMT, Daniel Lundén wrote: >> This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. >> >> Changes: >> - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). >> - Completely remove `CounterDecayMinIntervalLength`. >> >> ### Testing >> Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. 
>> - `tier1` >> - HotSpot parts of `tier2` and `tier3` > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Add deprecated to UseCounterDecay description Please update copyrights. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16673#pullrequestreview-1732521272 From mli at openjdk.org Wed Nov 15 17:16:35 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 15 Nov 2023 17:16:35 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> Message-ID: On Wed, 15 Nov 2023 15:51:50 GMT, Yuri Gaevsky wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1527: >> >>> 1525: addw(result, result, tmp3); // 31^^4 * h + 31^^3 * ary[i+0] + 31^^2 * ary[i+1] >>> 1526: // + 31^^1 * ary[i+2] + 31^^0 * ary[i+3] >>> 1527: subw(chunk, chunk, stride); >> >> Could chunk and ary be merged into one variable, so we don't need one sub and one add, but only one add here? > > Could you please clarify? I don't understand how that's possible. chunk is only used to tell whether the wide loop is done, which can be done with ary too. And since the subw of chunk and the addi of ary are in a loop which could be a long one, it is better to reduce the number of instructions in the loop. 
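The suggestion above can be illustrated outside of assembler. The following is a minimal Java sketch, not the RISC-V code under review: the class and method names are hypothetical, and an array index compared against a precomputed limit stands in for comparing the advancing `ary` pointer against an end address. It shows a hash loop unrolled by four that needs only one induction-variable update per iteration instead of a separate countdown plus a pointer bump.

```java
// Hypothetical sketch; names are illustrative and not taken from the patch.
final class HashLoopSketch {
    // Reference form: h = 31 * h + a[i] for every element.
    static int simpleHash(int initial, int[] a) {
        int h = initial;
        for (int i = 0; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    // Unrolled by 4. The only loop-carried counter is i, which is compared
    // against a precomputed limit; there is no separate "chunk" countdown.
    static int unrolledHash(int initial, int[] a) {
        final int p2 = 31 * 31;
        final int p3 = p2 * 31;
        final int p4 = p3 * 31;
        int h = initial;
        int i = 0;
        int limit = a.length - (a.length % 4); // end of the wide loop
        for (; i < limit; i += 4) {
            h = p4 * h + p3 * a[i] + p2 * a[i + 1] + 31 * a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {            // tail loop for the remainder
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2};
        System.out.println(simpleHash(1, data) == unrolledHash(1, data));
    }
}
```

Both methods rely on wrapping `int` arithmetic, so the unrolled form gives the same result as the simple one for any input length.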
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1394510370 From coleenp at openjdk.org Wed Nov 15 17:20:41 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Nov 2023 17:20:41 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v3] In-Reply-To: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Tue, 14 Nov 2023 13:34:49 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation is still valid, and will not fail for any reason other than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 > - Simplify test. 
> - 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT src/hotspot/share/runtime/synchronizer.cpp line 966: > 964: // Fall thru so we only have one place that installs the hash in > 965: // the ObjectMonitor. > 966: } else if (LockingMode == LM_LIGHTWEIGHT && mark.is_fast_locked() && is_lock_owned(current, obj)) { You can delete the is_lock_owned() function just above this with this patch. Edit: nope, never mind, I just found another caller. edit2: but you could move it down to before 'inflate' ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1394514252 From coleenp at openjdk.org Wed Nov 15 17:24:45 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Nov 2023 17:24:45 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v3] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Wed, 15 Nov 2023 17:15:36 GMT, Coleen Phillimore wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 >> - Simplify test. >> - 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT > > src/hotspot/share/runtime/synchronizer.cpp line 966: > >> 964: // Fall thru so we only have one place that installs the hash in >> 965: // the ObjectMonitor. >> 966: } else if (LockingMode == LM_LIGHTWEIGHT && mark.is_fast_locked() && is_lock_owned(current, obj)) { > > You can delete the is_lock_owned() function just above this with this patch. > > Edit: nope, never mind, I just found another caller. 
edit2: but you could move it down to before 'inflate' Also I think the comment before this function is wrong. We can inflate for deoptimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1394521859 From evergizova at openjdk.org Wed Nov 15 17:26:33 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Wed, 15 Nov 2023 17:26:33 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 03:21:25 GMT, Dean Long wrote: >> It is needed to avoid 'failed: _buffer_size not aligned' crashes on debug builds: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L221 > > That sounds like a bug. We already align the code_begin(). I see no reason to align code_end() as well. It just wastes space. OK, can I remove these guarantees for _buffer_size and _buffer_limit in this PR? Or do I need to create a separate one? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1394523889 From vlivanov at openjdk.org Wed Nov 15 17:45:36 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 Nov 2023 17:45:36 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 04:08:38 GMT, Jorn Vernee wrote: > So, how do you want to move forward? Should I attempt to re-implement the current patch to prune infrequent calls instead? I'm fine with the approach chosen in this patch. Still not done with the review though. Please, file an RFE to explore pruning of unreached call sites. > Missing profiling would be bad, as in that case we'd always try to prune the exception handler. i.e. it's not just a missed optimization. Yes, pathological recompilation is another scenario to consider. You can sprinkle `Compile::too_many_traps` checks (both as asserts and product checks) to ensure profiling information is up-to-date. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1812983248 From vlivanov at openjdk.org Wed Nov 15 18:05:39 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 Nov 2023 18:05:39 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: <9UcZgE0ap4Kh9yPY07ItXgYN6GTrUB-r-QMHHmM3i-Q=.cdb9df51-62a5-4760-a085-26b4bfb17411@github.com> On Wed, 15 Nov 2023 04:06:22 GMT, Jorn Vernee wrote: > The issue occurs for OSR compilations where the monitorenter is before the loop (outside of the compiled code) ... Thanks for the clarifications. Spotted another inconsistency: C1 doesn't set `has_monitor` on `monitorexit` while C2 does. So, seems like C1 is also affected (even without branch pruning). Please, file a separate bug for it. The code in question is used in verification code and available only in debug VM. If the failures block this patch, you can comment out relevant asserts. They'll be re-enabled as part of the proper fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1813010083 From dcubed at openjdk.org Wed Nov 15 19:03:52 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 15 Nov 2023 19:03:52 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 08:18:41 GMT, Roman Marchenko wrote: >> Zero'ing memory of extension event callbacks > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Fixing review comments Sure. Here's the relevant sub-section from the "OpenJDK Developers' Guide": https://openjdk.org/guide/index.html#life-of-a-pr Get the required reviews At least one Reviewer knowledgeable in each area being changed must approve every change. Some areas (e.g. 
Client and HotSpot) require two reviewers in most cases, so be sure to read the relevant OpenJDK group pages for advice or ask your sponsor. Be open to comments and polite in replies. Remember that the reviewer wants to improve the world just as much as you do, only in a slightly different way. If you don't understand some comment, ask the reviewer to clarify. Accept authority when applicable. If you're making changes in an area where you're not the area expert, acknowledge that your reviewers may be. Take their advice seriously, even if it is to not make the change. There are many reasons [why a change may get rejected](https://openjdk.org/guide/index.html#why-is-my-change-rejected). And you did read the section [Things to consider before changing OpenJDK code], right? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16647#issuecomment-1813094911 From matsaave at openjdk.org Wed Nov 15 19:08:05 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Nov 2023 19:08:05 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: References: Message-ID: <9usHIyM1snMGi49cBVnd63nvJESA1PqIDkCCwaj7d6U=.fe26d158-8c05-4335-8d16-167b8652238a@github.com> On Wed, 15 Nov 2023 16:44:15 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. 
>> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > S390 port I believe this PR has reached a stage where it can be safely committed. Thank you to all the reviewers for your excellent feedback and thank you to all porters for your contributions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1812892367 From matsaave at openjdk.org Wed Nov 15 19:08:09 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 15 Nov 2023 19:08:09 GMT Subject: Integrated: 8301997: Move method resolution information out of the cpCache In-Reply-To: References: Message-ID: <_HaZj9YufRlCQ32SRslebjpGWLikUc0Nitc3cKp-MhU=.bc441584-8271-4014-b880-23d19cddb314@github.com> On Mon, 28 Aug 2023 19:49:23 GMT, Matias Saavedra Silva wrote: > The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. 
> > This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. > > Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. > > To streamline the review, please consider these major areas that have been changed: > 1. ResolvedMethodEntry class > 2. Rewriter for initialization of the structure > 3. cpCache for resolution > 4. InterpreterRuntime, linkResolver, and templateTable > 5. JVMCI > 6. SA > > Verified with tier 1-9 tests. > > This change supports the following platforms: x86, aarch64, RISCV, PPC This pull request has now been integrated. 
Changeset: ffa35d8c Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/ffa35d8cf181cfbcb54497e997dbd18a9b62b97e Stats: 4055 lines in 79 files changed: 1308 ins; 2083 del; 664 mod 8301997: Move method resolution information out of the cpCache Co-authored-by: Gui Cao Co-authored-by: Fei Yang Co-authored-by: Martin Doerr Co-authored-by: Amit Kumar Reviewed-by: coleenp, adinn, fparain ------------- PR: https://git.openjdk.org/jdk/pull/15455 From coleenp at openjdk.org Wed Nov 15 19:08:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 15 Nov 2023 19:08:05 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 16:44:15 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. 
Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > S390 port Does it need another approval? If so, here it is. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15455#pullrequestreview-1732732773 From duke at openjdk.org Wed Nov 15 19:24:51 2023 From: duke at openjdk.org (Daniel Lundén) Date: Wed, 15 Nov 2023 19:24:51 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v3] In-Reply-To: References: Message-ID: > This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. > > Changes: > - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). > - Completely remove `CounterDecayMinIntervalLength`. > > ### Testing > Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. 
> - `tier1` > - HotSpot parts of `tier2` and `tier3` Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16673/files - new: https://git.openjdk.org/jdk/pull/16673/files/cd3343fe..61e0a104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=01-02 Stats: 17 lines in 17 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/16673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16673/head:pull/16673 PR: https://git.openjdk.org/jdk/pull/16673 From duke at openjdk.org Wed Nov 15 19:24:51 2023 From: duke at openjdk.org (Daniel Lundén) Date: Wed, 15 Nov 2023 19:24:51 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 17:10:34 GMT, Leonid Mesnik wrote: > Please update copyrights. Right, thanks @lmesnik. Updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16673#issuecomment-1813122046 From duke at openjdk.org Wed Nov 15 20:23:29 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Wed, 15 Nov 2023 20:23:29 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v6] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 14:32:49 GMT, Lei Zaakjyu wrote: >> JDK-8234502 : Merge GenCollectedHeap and SerialHeap > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > fix include statements done! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16623#issuecomment-1813204231 From prr at openjdk.org Wed Nov 15 20:28:35 2023 From: prr at openjdk.org (Phil Race) Date: Wed, 15 Nov 2023 20:28:35 GMT Subject: RFR: JDK-8313764: Offer JVM HS functionality to shared lib load operations done by the JDK codebase [v2] In-Reply-To: References: Message-ID: On Wed, 23 Aug 2023 15:18:03 GMT, Matthias Baesken wrote: >> Currently there are a number of features that would be interesting to have for shared lib load operations in the JDK C code. >> Some examples: >> Events::log_dll_message for hs-err files reporting >> JFR event NativeLibraryLoad >> There is the need to update the shared lib Cache on AIX (see LoadedLibraries::reload(), see also https://bugs.openjdk.org/browse/JDK-8314152), >> this is currently not fully in sync with libs loaded from jdk c-libs and sometimes reports outdated information >> >> Offer an interface (e.g. jvm.cpp) to support this. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > windows aarch64 build issues src/java.desktop/unix/native/common/awt/fontpath.c line 53: > 51: /* for dlopen */ > 52: #include > 53: I have no idea why we would want to use some VM internal in the desktop module. Please take the desktop module changes out of this PR. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15264#discussion_r1394756424 From dlong at openjdk.org Wed Nov 15 21:11:41 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 15 Nov 2023 21:11:41 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: <9UcZgE0ap4Kh9yPY07ItXgYN6GTrUB-r-QMHHmM3i-Q=.cdb9df51-62a5-4760-a085-26b4bfb17411@github.com> References: <9UcZgE0ap4Kh9yPY07ItXgYN6GTrUB-r-QMHHmM3i-Q=.cdb9df51-62a5-4760-a085-26b4bfb17411@github.com> Message-ID: On Wed, 15 Nov 2023 18:03:02 GMT, Vladimir Ivanov wrote: > Spotted another inconsistency: C1 doesn't set has_monitor on monitorexit while C2 does. So, seems like C1 is also affected (even without branch pruning). Is it possible for C1 to see a monitorexit but not the earlier monitorenter? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1813263731 From rriggs at openjdk.org Wed Nov 15 22:15:54 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 15 Nov 2023 22:15:54 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v5] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. 
> > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
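The defensive scan described in the list above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the JDK's implementation: the class `RacyCompressSketch`, the method `chooseCoder`, and the coder constants are hypothetical names, and the real code works on the string's internal byte arrays rather than returning only a coder value.

```java
// Sketch only: class, method and constant names are hypothetical, not JDK internals.
final class RacyCompressSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Copies chars into dst while they fit in latin1. Returns the index of
    // the first char that does not fit, or len if everything was compressed.
    static int compress(char[] src, byte[] dst, int len) {
        for (int i = 0; i < len; i++) {
            char c = src[i];
            if (c > 0xFF) {
                return i;
            }
            dst[i] = (byte) c;
        }
        return len;
    }

    // Chooses a representation for input, tolerating concurrent writes to it.
    static byte chooseCoder(char[] input) {
        int len = input.length;
        int ndx = compress(input, new byte[len], len);
        if (ndx == len) {
            return LATIN1;               // every char fit in one byte
        }
        char[] copy = input.clone();     // private copy; later writes cannot reach it
        if (copy[ndx] > 0xFF) {
            return UTF16;                // non-latin1 char confirmed at the same index
        }
        // The input changed between the scan and the copy: decide from the
        // private copy alone instead of throwing an exception.
        return compress(copy, new byte[len], len) == len ? LATIN1 : UTF16;
    }

    public static void main(String[] args) {
        System.out.println(chooseCoder("plain".toCharArray()));
        System.out.println(chooseCoder("caf\u0100".toCharArray()));
    }
}
```

The point of returning an index rather than a boolean is that the caller can re-check exactly the position that forced the non-latin1 decision, which is what makes the invariant "a non-latin1 string contains at least one non-latin1 character" cheap to enforce.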
Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: - Cleanup of test with review comment recommendations - Enable racy constructor tests iff COMPACT_STRINGS is true Test of string_compress intrinsic is always enabled ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/08f365f9..b84d09db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=03-04 Stats: 30 lines in 1 file changed: 23 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From rriggs at openjdk.org Wed Nov 15 22:15:58 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 15 Nov 2023 22:15:58 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Wed, 15 Nov 2023 15:23:48 GMT, Raffaello Giulietti wrote: >> Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: >> >> Update PPC implementation of string_compress to return the index of the non-latin1 char >> Patch supplied by TheRealMDoerr > > test/jdk/java/lang/String/StringRacyConstructor.java line 110: > >> 108: for (int i = 0; i < orig.length(); i++) >> 109: accum |= orig.charAt(i); >> 110: byte expectedCoder = (accum < 256) ? LATIN1 : UTF16; > > I think this assumes that compact strings are enabled during this test. Correct, the test should be enabled iff COMPACT_STRINGS is true. 
> test/jdk/java/lang/String/StringRacyConstructor.java line 190: > >> 188: if (printWarningCount == 0) { >> 189: printWarningCount = 1; >> 190: System.out.println("StringUTF16.compress returned 0, may not be intrinsic"); > > It seems to me that the Java code for `StringUTF16.compress` also returns the index of the non-latin1 char, so I'm not sure I understand this. Just caution? There are separate implementations of the intrinsic for each platform. This test would help identify the platforms on which the intrinsic had not been updated to return the index instead of zero. > test/jdk/java/lang/String/StringRacyConstructor.java line 288: > >> 286: } >> 287: if (i >= 1_000_000) { >> 288: System.err.printf("Unable to produce a UTF16 string in %d iterations: %s%n", i, original); > > AFAIU, this writes to `System.err` on "success". Is this the intent? System.out is more appropriate for an informational message that a great many attempts had been made to produce an unexpected coder without success. > test/jdk/java/lang/String/StringRacyConstructor.java line 400: > >> 398: @Override >> 399: public int length() { >> 400: return aString.length() + 1; > > Not sure why ` + 1`. Removed; it was a leftover from a prior way to throw an exception during `CharSequence.charAt(n)`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394889724 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394885799 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394892328 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1394883379 From vlivanov at openjdk.org Wed Nov 15 22:28:36 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 15 Nov 2023 22:28:36 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: <9UcZgE0ap4Kh9yPY07ItXgYN6GTrUB-r-QMHHmM3i-Q=.cdb9df51-62a5-4760-a085-26b4bfb17411@github.com> Message-ID: On Wed, 15 Nov 2023 21:08:23 GMT, Dean Long wrote: > Is it possible for C1 to see a monitorexit but not the earlier monitorenter? Hm, good question, Dean. I was under the impression that there are no guarantees C1 visits all reachable bytecodes of a method during parsing, so `monitorenter` can be missed. But after reexamining `GraphBuilder` implementation I don't think it is possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1813359774 From dlong at openjdk.org Wed Nov 15 22:46:38 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 15 Nov 2023 22:46:38 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v7] In-Reply-To: <4viUZ8xgGyoVMs8nwClL38FZPfa1P1jw_eVwZfAXbfI=.4473faad-5a60-4cc6-857b-253ac8c70648@github.com> References: <4viUZ8xgGyoVMs8nwClL38FZPfa1P1jw_eVwZfAXbfI=.4473faad-5a60-4cc6-857b-253ac8c70648@github.com> Message-ID: On Wed, 15 Nov 2023 02:43:12 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resources construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block.
The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. 
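The try-with-resources shape that motivates this change can be sketched in plain Java. The following is only an approximation of what javac generates (the exact expansion varies between javac versions, and null checks are omitted); `TryWithResourcesSketch`, `Res`, `work`, `sugared` and `desugared` are hypothetical names used for illustration. The point is that `close()` has two call sites, and the one inside the hidden handler is normally never executed.

```java
class TryWithResourcesSketch {
    static int closeCalls = 0;

    /** Trivial resource that only counts how often it is closed. */
    static final class Res implements AutoCloseable {
        @Override
        public void close() {
            closeCalls++;
        }
    }

    static int work(Res r) {
        return 42;
    }

    /** What the programmer writes. */
    static int sugared() {
        try (Res r = new Res()) {
            return work(r);
        }
    }

    /** Roughly what javac expands it to. */
    static int desugared() {
        Res r = new Res();
        int result;
        try {
            result = work(r);
        } catch (Throwable t) {   // hidden exception handler
            try {
                r.close();        // call site 2: only reached if work() throws
            } catch (Throwable suppressed) {
                t.addSuppressed(suppressed);
            }
            throw t;
        }
        r.close();                // call site 1: the normal path
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sugared());   // 42
        System.out.println(desugared()); // 42
        System.out.println(closeCalls);  // 2
    }
}
```

Because `work()` never throws here, the handler's `close()` call stays cold. Without handler profiling C2 still has to compile that block, which is the situation the description above sets out to fix.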
>> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Only use ProfileExceptionHandlers I had a similar concern and also convinced myself it's not a problem for C1. If it was, we should have seen an assert in the loom code that depends on has_monitor(). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1813383187 From dlong at openjdk.org Wed Nov 15 22:48:30 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 15 Nov 2023 22:48:30 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 17:23:33 GMT, Ekaterina Vergizova wrote: >> That sounds like a bug. We already align the code_begin(). I see no reason to align code_end() as well. It just wastes space. > > OK, can I remove these guarantees for _buffer_size and _buffer_limit in this PR? Or do I need to create a separate one? I would like to see it cleaned up in this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1394945675 From pchilanomate at openjdk.org Wed Nov 15 22:56:35 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 15 Nov 2023 22:56:35 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Wed, 15 Nov 2023 08:44:17 GMT, Robbin Ehn wrote: > @pchilano can you have look ? > I will. I might not finish the review until next week though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1813395423 From dlong at openjdk.org Wed Nov 15 22:58:30 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 15 Nov 2023 22:58:30 GMT Subject: RFR: 8319542: Fix boundaries of region to be tested with os::is_readable_range In-Reply-To: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> References: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> Message-ID: <8V0jY9ERgYmaY-54tOkYL_BKODPgr5v25EEx8Zi0M9k=.fa60708e-e3a3-48c9-ad8c-3fc585f3b154@github.com> On Wed, 15 Nov 2023 13:33:00 GMT, Thomas Obermeier wrote: > PR https://github.com/openjdk/jdk/pull/16381 was already closed when it became obvious that usage of os::is_readable_range() was slightly wrong: > > the " - 1" looks wrong here, because is_readable_range() checks for < to, not <= to. Looks good. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16676#pullrequestreview-1733184412 From jjoo at openjdk.org Wed Nov 15 23:43:57 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 15 Nov 2023 23:43:57 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v43] In-Reply-To: References: Message-ID: <0PG8RL0P8m4dbQe-gmltDKvcQYPJvUvUshV-anYvFFM=.41402e8d-c895-43b9-9567-036723afcabc@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: - Move vm and conc_dedup counters to cpuTimeCounters class - Revert test changes and fix whitespace issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/189d1852..4db8f09f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=41-42 Stats: 30 lines in 9 
files changed: 11 ins; 5 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Wed Nov 15 23:50:02 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 15 Nov 2023 23:50:02 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v44] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/4db8f09f..ce7dbfcf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=42-43 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Nov 16 00:17:42 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 16 Nov 2023 00:17:42 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v42] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:36:47 GMT, Stefan Johansson wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Update parallel workers time after Remark > > Thanks for addressing my comments. I have a few more things: > > - I think all changes to `test_g1ServiceThread.cpp` can be reverted. Should not be needed now > - Please fix all whitespace issues > - Should we move the VMThread and StringDedup counters into `CPUTimeCounters` as well? Any problem with this? @kstefanj Good points - done! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1813500906 From manc at openjdk.org Thu Nov 16 00:27:44 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 16 Nov 2023 00:27:44 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v44] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 23:50:02 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace Overall looks good, a few details could be improved. src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 59: > 57: #include "memory/iterator.hpp" > 58: #include "memory/memRegion.hpp" > 59: #include "runtime/cpuTimeCounters.hpp" Probably move this include to the .cpp file? src/hotspot/share/gc/g1/g1ServiceThread.cpp line 161: > 159: if (UsePerfData && os::is_thread_cpu_time_supported()) { > 160: ThreadTotalCPUTimeClosure tttc(CPUTimeCounters::get_instance(), CPUTimeGroups::gc_service); > 161: tttc.do_thread(task->_service_thread); It could just use `do_thread(this)`, then it can remove the `task` parameter. src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.cpp line 72: > 70: _processor = new Processor(); > 71: if (UsePerfData && os::is_thread_cpu_time_supported()) { > 72: EXCEPTION_MARK; This whole `if` block could be updated to `CPUTimeCounters::get_instance()->create_counter(CPUTimeGroups::conc_dedup)`. We can also remove the `_concurrent_dedup_thread_cpu_time` field and the `ThreadTotalCPUTimeClosure(PerfCounter* counter)` constructor. In `StringDedup::Processor::run()`, it can call if (UsePerfData && os::is_thread_cpu_time_supported()) { ThreadTotalCPUTimeClosure tttc(CPUTimeCounters::get_instance(), CPUTimeGroups::conc_dedup); tttc.do_thread(thread); } Similar, this can be applied to vmThread. 
src/hotspot/share/runtime/init.cpp line 124: > 122: codeCache_init(); > 123: VM_Version_init(); // depends on codeCache_init for emitting code > 124: // Initialize CPUTimeCounters object, which must be done before creation of the heap. Would it be possible to move this inside `universe_init()` in universe.cpp, somewhere above `Universe::initialize_heap()`? There's a similar `MetaspaceCounters::initialize_performance_counters()` in `universe_init()`. ------------- Changes requested by manc (Committer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1731128372 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1395001799 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1395007511 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1395009733 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1395000735 From manc at openjdk.org Thu Nov 16 00:27:54 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 16 Nov 2023 00:27:54 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v42] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 21:33:37 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Update parallel workers time after Remark src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1519: > 1517: > 1518: CPUTimeCounters* instance = CPUTimeCounters::get_instance(); > 1519: assert(instance != nullptr, "no instance found"); It's probably better to move this `assert` inside the `CPUTimeCounters::get_instance()` body. src/hotspot/share/runtime/cpuTimeCounters.cpp line 43: > 41: case gc_service: > 42: return "gc_service"; > 43: case COUNT: "default" seems more appropriate than COUNT. This seems better? 
default: ShouldNotReachHere(); src/hotspot/share/runtime/cpuTimeCounters.cpp line 65: > 63: } > 64: > 65: CPUTimeCounters* CPUTimeCounters::_instance = nullptr; No need for extra whitespaces. Single space should be fine. src/hotspot/share/runtime/cpuTimeCounters.hpp line 2: > 1: /* > 2: * Copyright (c) 2002, 2019, Oracle and/or its affiliates. All rights reserved. Year should be "2023". src/hotspot/share/runtime/cpuTimeCounters.hpp line 32: > 30: #include "runtime/perfData.hpp" > 31: #include "runtime/perfDataTypes.hpp" > 32: Include "memory/iterator.hpp" and remove this include from perfData.hpp? src/hotspot/share/runtime/cpuTimeCounters.hpp line 35: > 33: class CPUTimeGroups : public AllStatic { > 34: public: > 35: enum CPUTimeType { I think new code should prefer `enum class` over plain `enum`. https://stackoverflow.com/q/18335861 src/hotspot/share/runtime/cpuTimeCounters.hpp line 36: > 34: public: > 35: enum CPUTimeType { > 36: total, Probably `gc_total` instead of `total`, since we will include non-GC counters in this class. Also for naming style, I find it better to be `GcTotal` or `GC_TOTAL` for public enum constants. But HotSpot has mixed styles for enums, so changing the names is optional. src/hotspot/share/runtime/cpuTimeCounters.hpp line 48: > 46: }; > 47: > 48: class CPUTimeCounters: public CHeapObj { Perhaps `mtServiceability` as hsperf counters are part of serviceability, and we will include non-GC hsperf counters. src/hotspot/share/runtime/cpuTimeCounters.hpp line 50: > 48: class CPUTimeCounters: public CHeapObj { > 49: private: > 50: // We want CPUTimeCounters to be a singleton instance accessed by the vm thread. Suggest remove "accessed by the vm thread". It is already access by non-VM threads like G1 concurrent mark thread and concurrent refine thread. src/hotspot/share/runtime/cpuTimeCounters.hpp line 61: > 59: // since the last time we called `publish_total_cpu_time()`. 
> 60: // It is incremented using Atomic::add() to prevent race conditions, and > 61: // is added to the `total` CPUTimeGroup at the end of GC. Ditto, better to have "gc" substring in these names: _total_cpu_time_diff inc_total_cpu_time() publish_total_cpu_time() src/hotspot/share/runtime/cpuTimeCounters.hpp line 80: > 78: void operator=(const CPUTimeCounters& copy) = delete; > 79: > 80: ~CPUTimeCounters(); No need to declare destructor since it is not overwritten. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1393555723 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394989485 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394992509 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394981679 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394997677 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394972807 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394976545 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394981147 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394969526 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394977769 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1394983409 From pli at openjdk.org Thu Nov 16 01:19:58 2023 From: pli at openjdk.org (Pengfei Li) Date: Thu, 16 Nov 2023 01:19:58 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v4] In-Reply-To: References: Message-ID: On Wed, 27 Sep 2023 08:36:43 GMT, Pengfei Li wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. 
The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Fix code style issues and add loop head dump src/hotspot/cpu/x86/x86.ad line 9022: > 9020: %} > 9021: > 9022: instruct loop_vmask_gen_small_trip(kReg dst, rRegI from, rRegI to, rRegI tmp1, rRegL tmp2) %{ We probably need to mark rflags being killed for this. See https://github.com/openjdk/jdk/pull/16680 src/hotspot/cpu/x86/x86.ad line 9040: > 9038: %} > 9039: > 9040: instruct loop_vmask_gen(kReg dst, rRegI from, rRegI to, rRegI tmp1, rRegL tmp2) %{ ditto ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1395040161 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1395040240 From duke at openjdk.org Thu Nov 16 04:07:35 2023 From: duke at openjdk.org (Liming Liu) Date: Thu, 16 Nov 2023 04:07:35 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v12] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 06:20:07 GMT, Liming Liu wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
Kernel-XX:-TransparentHugePages-XX:+TransparentHugePages
UnpatchedPatchedUnpatchedPatched
4.1811.3011.300.250.25
5.130.220.223.423.42
6.10.270.333.540.33
> > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Update the name of the method Could anyone continue to review this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1813756737 From fyang at openjdk.org Thu Nov 16 04:44:30 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 16 Nov 2023 04:44:30 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 15:01:51 GMT, Yuri Gaevsky wrote: > Please review the change for RISC-V similar to #13792(AARCH64) and #13460(X86). > > From #13792: > The change replaces two separate iterations over the itable with new algorithm > consisting of two loops. First, we look for a match with resolved_klass, > checking for a match with holder_klass along the way. Then we continue iterating > (not starting over) the itable using the second loop, checking only for a match > with holder_klass. > > ### Correctness checks > > Testing: tier1 tests successfully passed on HiFive Unmatched board. > > #### Performance results on RISC-V StarFive JH7110 board: > > > InterfaceCalls: before fix after fix > ------------------------------------------------------------------- > Benchmark Mode Cnt Score Error Score Error Units > ------------------------------------------------------------------- > test1stInt2Types avgt 100 14.380 ? 0.017 | 14.370 ? 0.014 ns/op > test1stInt3Types avgt 100 72.724 ? 0.552 | 66.290 ? 0.080 ns/op > test1stInt5Types avgt 100 73.948 ? 0.524 | 68.781 ? 0.377 ns/op > test2ndInt2Types avgt 100 15.705 ? 0.016 | 15.707 ? 0.018 ns/op > test2ndInt3Types avgt 100 82.370 ? 0.453 | 75.363 ? 0.156 ns/op > test2ndInt5Types avgt 100 85.266 ? 0.466 | 80.969 ? 0.752 ns/op > testIfaceCall avgt 100 75.684 ? 0.648 | 72.603 ? 0.460 ns/op > testIfaceExtCall avgt 100 86.293 ? 0.567 | 77.939 ? 0.340 ns/op > testMonomorphic avgt 100 11.357 ? 0.007 | 11.359 ? 
0.009 ns/op > ------------------------------------------------------------------- > > > #### Performance results on RISC-V HiFive Unmatched board: > > > InterfaceCalls: before fix after fix > --------------------------------------------------------------------- > Benchmark Mode Cnt Score Error Score Error Units > --------------------------------------------------------------------- > test1stInt2Types avgt 100 24.432 ? 1.811 | 23.205 ? 1.512 ns/op > test1stInt3Types avgt 100 135.800 ? 3.991 | 127.112 ? 2.299 ns/op > test1stInt5Types avgt 100 141.746 ? 4.272 | 136.069 ? 4.919 ns/op > test2ndInt2Types avgt 100 31.474 ? 2.468 | 26.978 ? 1.951 ns/op > test2ndInt3Types avgt 100 146.410 ? 3.575 | 139.443 ? 3.677 ns/op > test2ndInt5Types avgt 100 156.083 ? 3.617 | 150.583 ? 2.909 ns/op > testIfaceCall avgt 100 136.392 ? 2.546 | 129.632 ? 1.662 ns/op > testIfaceExtCall avgt 100 155.602 ? 3.836 | 138.058 ? 2.147 ns/op > testMonomorphic avgt 100 24.018 ? 1.888 | 21.522 ? 1.662 ns/op > ---------... Changes requested by fyang (Reviewer). src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2562: > 2560: mv(holder_offset, zr); > 2561: // scan_temp = &(itable[0]._interface) > 2562: la(scan_temp, Address(scan_temp)); The `la` call here won't emit any code on riscv [1]. So I think we can simply remove it and apply the code comment at L2561 to the preceding `shadd` call. 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L745 ------------- PR Review: https://git.openjdk.org/jdk/pull/16657#pullrequestreview-1733407682 PR Review Comment: https://git.openjdk.org/jdk/pull/16657#discussion_r1395092707 From dholmes at openjdk.org Thu Nov 16 05:29:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 05:29:34 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. 
>> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Really this should go to panama-dev but core-libs will have to do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1813811597 From dholmes at openjdk.org Thu Nov 16 05:35:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 05:35:34 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: <5-IZTbs5nEjKJrCDkqBAns57xOfOsbAKDjV21khh_Qc=.1d539b01-c4c8-4366-9915-34fe5a90af75@github.com> On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. 
To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF So to be clear, now you have to opt-in to using `libsleef` by building a binary that will use it. That binary will always use `libsleef` if found, so there is no way to opt-out at runtime. Is that the way it works on x86_64 too? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1813816824 From dholmes at openjdk.org Thu Nov 16 06:00:39 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 06:00:39 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v8] In-Reply-To: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> References: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> Message-ID: On Tue, 7 Nov 2023 11:40:02 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > function pointer is replaced with template Functor. src/hotspot/share/gc/parallel/mutableNUMASpace.cpp line 163: > 161: } > 162: // That's the normal case, where we know the locality group of the thread. > 163: int i = lgrp_spaces()->find((uint*)&lgrp_id, LGRPSpace::equals); I guess it is somewhat outside the scope of this PR but I wish this code would make its mind up about whether the NUMA group ids are `int` or `uint`! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1395182714 From dholmes at openjdk.org Thu Nov 16 06:07:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 06:07:33 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:38:25 GMT, Afshin Zafari wrote: >>> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). >> >> Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take >> the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. > > @kimbarrett , @dholmes-ora , @merykitty > Is there any comment on this PR? @afshin-zafari I will leave it to others to (re-) review the latest changes. I don't grok this template stuff enough to pass judgement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1813839612 From dholmes at openjdk.org Thu Nov 16 06:28:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 06:28:30 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 19:24:51 GMT, Daniel Lundén wrote: >> This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. >> >> Changes: >> - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24.
The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). >> - Completely remove `CounterDecayMinIntervalLength`. >> >> ### Testing >> Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. >> - `tier1` >> - HotSpot parts of `tier2` and `tier3` > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright As `UseCounterDecay` already does nothing there is no need to have a deprecation step - just go straight to obsoletion. The only difference to the user will be the message they see. ------------- PR Review: https://git.openjdk.org/jdk/pull/16673#pullrequestreview-1733590890 From sspitsyn at openjdk.org Thu Nov 16 06:41:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 06:41:44 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v5] In-Reply-To: References: Message-ID: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`.
> > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed review: added check for jdk_internal_vm_Continuation::done(cont) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/ca2fbb98..13e05ab1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=03-04 Stats: 12 lines in 2 files changed: 4 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From qamai at openjdk.org Thu Nov 16 06:49:33 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 16 Nov 2023 06:49:33 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v8] In-Reply-To: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> References: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> Message-ID: <648SrHxCX6_kRX7cmxyGurxOWecLTWUw0_C79J_okbo=.473eaa8c-7dfc-4aa0-839d-bb580cc9d312@github.com> On Tue, 7 Nov 2023 11:40:02 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. 
>> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > function pointer is replaced with template Functor. src/hotspot/share/utilities/growableArray.hpp line 213: > 211: > 212: template <typename T, typename F> > 213: int find(T* token, F f) const { Should be template <typename F> int find(F f) const { for (int i = 0; i < _len; i++) { if (f(_data[i])) { return i; } } return -1; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1395219935 From sspitsyn at openjdk.org Thu Nov 16 06:59:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 06:59:49 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v6] In-Reply-To: References: Message-ID: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
> > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed review: (1) removed is_vthread_alive checks; (2) reverted one update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/13e05ab1..a526bd60 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=04-05 Stats: 17 lines in 1 file changed: 0 ins; 15 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From sspitsyn at openjdk.org Thu Nov 16 07:13:36 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 07:13:36 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Wed, 8 Nov 2023 16:11:44 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1974: >> >>> 1972: >>> 1973: if (java_lang_VirtualThread::is_instance(target_h())) { // virtual thread >>> 1974: if (!JvmtiEnvBase::is_vthread_alive(target_h())) 
{ >> >> There is only one issue I see in how this check is implemented and the removal of the VM_op for unmounted vthreads. The change of state to TERMINATED happens after notifyJvmtiUnmount(), i.e. we can see that this vthread is alive here but a later check can return that it is not. This might hit the assert in JvmtiEnvBase::get_vthread_jvf() (maybe this is the issue you saw on your first prototype). We can either change that order at the Java level, or maybe better change this function to read the state and add a case where if the state is RUNNING check whether the continuation is done or not (jdk_internal_vm_Continuation::done(cont)). > > Thank you for the suggestion. Will check it. I've added the check for `!jdk_internal_vm_Continuation::done(cont)` into `JvmtiEnvBase::is_vthread_alive(oop vt)` but then decided to remove it again. This is racy for `JvmtiHandshake` execution. As you correctly stated, the change of state to `TERMINATED` happens after `notifyJvmtiUnmount()`. The target virtual thread will be blocked in the `notifyJvmtiUnmount()` because the `JvmtiVTMSTransitionDisabler` is set. This gives us a guarantee that the target virtual thread won't change its state to `TERMINATED` while a handshake is executed. But that no longer holds if we add the `!jdk_internal_vm_Continuation::done(cont)` check. On the other hand, the absence of this check allows the target virtual thread's stack to become empty (with no frames). This is a known problem but I'd prefer to attack it separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1395238429 From lmesnik at openjdk.org Thu Nov 16 07:13:38 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 16 Nov 2023 07:13:38 GMT Subject: RFR: 8319200: Don't use test thread factory in ProcessTools.createLimitedTestJavaProcessBuilder() [v5] In-Reply-To: References: <3damdMQpRBrkUN2S32tBD0Tmrl2tmSqA31NniV8FzHU=.d3a36aa7-d5f4-4a63-b2ff-8b9b616a9637@github.com> Message-ID: On Fri, 10 Nov 2023 01:49:17 GMT, Leonid Mesnik wrote: >> Test thread factory is a mode similar to VM flags and should not be used in ProcessTools.createLimitedTestJavaProcessBuilder(). Only createTestJavaProcessBuilder() should use it like jtreg VM options. >> >> Adding the test thread factory requires the injection of arguments in the middle of the list. I don't think it makes sense to modify arguments in several places so I replaced it with the flag isLimited and moved all modifications into createJavaProcessBuilder(). >> >> Testing tier1-5. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > variable was renamed. `createLimitedTestJavaProcessBuilder` is expected to execute the process exactly as specified in the test, without any additional changes from jtreg. It is very likely that the test might fail if the process is executed in other modes. I agree that usage of this method is always a potential loss of coverage for the VM flags being tested and for virtual threads. So this method should be used only when it is necessary. Likewise, we should minimize the number of flagless tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16442#issuecomment-1813899682 From dholmes at openjdk.org Thu Nov 16 07:15:35 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 07:15:35 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg [v2] In-Reply-To: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com> References: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com> Message-ID: On Tue, 14 Nov 2023 13:28:09 GMT, Aleksey Shipilev wrote: >> See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. >> >> I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. >> >> I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. >> >> Additional testing: >> - [x] Linux x86_32 Zero release; jcstress >> - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` >> - [x] Linux x86_32 Zero fastdebug, bootcycle-images > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Only do _supports_cx8 = true > - Merge branch 'master' into JDK-8319777-zero-64cas > - Fix Looks good - thanks. Sorry for the delay - swamped at the moment. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16614#pullrequestreview-1733647796 From stuefe at openjdk.org Thu Nov 16 07:26:32 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 16 Nov 2023 07:26:32 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg [v2] In-Reply-To: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com> References: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com> Message-ID: On Tue, 14 Nov 2023 13:28:09 GMT, Aleksey Shipilev wrote: >> See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. >> >> I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. >> >> I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. >> >> Additional testing: >> - [x] Linux x86_32 Zero release; jcstress >> - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` >> - [x] Linux x86_32 Zero fastdebug, bootcycle-images > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Only do _supports_cx8 = true > - Merge branch 'master' into JDK-8319777-zero-64cas > - Fix Good. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16614#pullrequestreview-1733667506 From xgong at openjdk.org Thu Nov 16 07:29:36 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 16 Nov 2023 07:29:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: <5-IZTbs5nEjKJrCDkqBAns57xOfOsbAKDjV21khh_Qc=.1d539b01-c4c8-4366-9915-34fe5a90af75@github.com> References: <5-IZTbs5nEjKJrCDkqBAns57xOfOsbAKDjV21khh_Qc=.1d539b01-c4c8-4366-9915-34fe5a90af75@github.com> Message-ID: On Thu, 16 Nov 2023 05:33:13 GMT, David Holmes wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Add a bundled native lib in jdk as a bridge to libsleef >> - Merge 'jdk:master' into JDK-8312425 >> - Disable sleef by default >> - Merge 'jdk:master' into JDK-8312425 >> - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF > > So to be clear, now you have to opt-in to using `libsleef` by building a binary that will use it. That binary will always use `libsleef` if found, so there is no way to opt-out at runtime. Is that the way it works on x86_64 too? Thanks for adding the labels @dholmes-ora ! > So to be clear, now you have to opt-in to using libsleef by building a binary that will use it. That binary will always use libsleef if found, so there is no way to opt-out at runtime. Is that the way it works on x86_64 too? Yes, libsleef is used to build the binary if found. At runtime, if a libsleef of the right version is not found, `dlopen` of libvmath.so will fail, and all the operations will then fall back to the Java default implementation. x86_64 likewise bundles Intel's SVML binary into the JDK image at build time, and we load/open the library the same way at runtime. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1813916971 From clanger at openjdk.org Thu Nov 16 07:46:30 2023 From: clanger at openjdk.org (Christoph Langer) Date: Thu, 16 Nov 2023 07:46:30 GMT Subject: RFR: 8319542: Fix boundaries of region to be tested with os::is_readable_range In-Reply-To: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> References: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> Message-ID: On Wed, 15 Nov 2023 13:33:00 GMT, Thomas Obermeier wrote: > PR https://github.com/openjdk/jdk/pull/16381 was already closed when it became obvious that usage of os::is_readable_range() was slightly wrong: > > the " - 1" looks wrong here, because is_readable_range() checks for < to, not <= to. Marked as reviewed by clanger (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16676#pullrequestreview-1733702884 From duke at openjdk.org Thu Nov 16 08:26:33 2023 From: duke at openjdk.org (Daniel Lundén) Date: Thu, 16 Nov 2023 08:26:33 GMT Subject: RFR: 8318480: Deprecate UseCounterDecay and remove CounterDecayMinIntervalLength [v3] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 06:26:07 GMT, David Holmes wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright > > As `UseCounterDecay` already does nothing there is no need to have a deprecation step - just go straight to obsoletion. The only difference to the user will be the message they see. Thanks @dholmes-ora, I agree (but was under the impression you always had to deprecate flags as a first step). I'll revise the CSR and changeset to directly obsolete the flag instead. Edit: now saw the CSR was already approved. I'll investigate if we can reopen and revise it. 
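As a rough illustration of the "try to open the bridge library, otherwise use the default" behaviour described above, here is a minimal sketch assuming POSIX `dlopen`/`dlsym`. The function names `lookup_vector_stub` and `sin_or_default` are hypothetical, the example is scalar rather than vectorized, and `std::sin` merely stands in for the Java default implementation; it is not the actual HotSpot stub-loading code.

```cpp
#include <cassert>
#include <cmath>
#include <dlfcn.h>

// Hypothetical signature for a math routine resolved from a native library.
using unary_math_fn = double (*)(double);

// Try to resolve `symbol` from `lib_name`. Returns nullptr if the library
// (or one of its dependencies, such as a matching libsleef) cannot be
// opened, or if the symbol is missing.
unary_math_fn lookup_vector_stub(const char* lib_name, const char* symbol) {
  assert(lib_name != nullptr && symbol != nullptr);
  void* handle = dlopen(lib_name, RTLD_LAZY);
  if (handle == nullptr) {
    return nullptr;
  }
  return reinterpret_cast<unary_math_fn>(dlsym(handle, symbol));
}

// Use the native stub when one was resolved; otherwise fall back to the
// default implementation (std::sin is only a stand-in here).
double sin_or_default(unary_math_fn stub, double x) {
  return stub != nullptr ? stub(x) : std::sin(x);
}
```

The lookup happens once and the result is cached by the caller, so a missing library costs one failed `dlopen` and every later call simply takes the default path.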
------------- PR Comment: https://git.openjdk.org/jdk/pull/16673#issuecomment-1813982769 From duke at openjdk.org Thu Nov 16 08:41:43 2023 From: duke at openjdk.org (Thomas Obermeier) Date: Thu, 16 Nov 2023 08:41:43 GMT Subject: Integrated: 8319542: Fix boundaries of region to be tested with os::is_readable_range In-Reply-To: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> References: <3Yl9BkYAP40CteNLOZCwOkKD5QEZh2yYiAqtTgLeOyI=.bf810bf7-6d0d-4f7e-bded-0e6e304a569c@github.com> Message-ID: On Wed, 15 Nov 2023 13:33:00 GMT, Thomas Obermeier wrote: > PR https://github.com/openjdk/jdk/pull/16381 was already closed when it became obvious that usage of os::is_readable_range() was slightly wrong: > > the " - 1" looks wrong here, because is_readable_range() checks for < to, not <= to. This pull request has now been integrated. Changeset: b4c2d1c1 Author: Thomas Obermeier Committer: Dean Long URL: https://git.openjdk.org/jdk/commit/b4c2d1c1af76da4b326e7acea2ccb740728a8c7c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8319542: Fix boundaries of region to be tested with os::is_readable_range Reviewed-by: dlong, clanger ------------- PR: https://git.openjdk.org/jdk/pull/16676 From dholmes at openjdk.org Thu Nov 16 08:54:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 08:54:36 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v4] In-Reply-To: References: <2dVPtwS-M9xk4yHIZcFr3y_d1xSgGFqkfW3ABZvvb8M=.529435cb-d62d-4a5d-a545-5ee446457e5d@github.com> Message-ID: On Wed, 15 Nov 2023 07:22:25 GMT, Axel Boldt-Christmas wrote: >> To be clear, my concern is that for a simple exit we not only have to first check for a recursive exit (fine) but also this unexpected rare unstructured locking recursive case. Thinking it through part of the problem is that a simple-exit does itself allow for unstructured locking. 
Is it worth adding an additional case to peek at the top of the lock-stack and then do an exit with a pop for the most common non-recursive case? That way we in effect handle things as follows: >> - recursive exit >> - direct exit >> - recursive unstructured exit >> - direct unstructured exit > > First off, let us note that when reaching this code the unstructured exit is the common case. The normal exit and recursive exit are usually handled in the emitted code (this includes the interpreter). We reach this because either a CAS failed somewhere due to a concurrent hashCode installation, or the exit was unstructured. Inflated monitor exits just jump past this code (everything is conditioned on `mark.is_fast_locked()`). > > Is this motivated by the structure/layout of the C++ code, or is it an optimisation? > > If it is motivated by the structure/layout, then we can lay it out as you described. It would add some code duplication. > > If it is motivated as an optimisation then after the recursive exit fails, we should just call remove and act based on the return value. I would not go so far as to say the unstructured locking case is common. Sure we are on the slow-path, which may be due to unstructured locking, or we may be here through deopt (also a slow path) or through the native method wrapper, or ... but yes this is not really performance critical. 
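To make the exit ordering discussed above concrete, here is a minimal sketch assuming a hypothetical, much simplified `LockStackSketch` (a plain array of object pointers with capacity 8). The names `try_recursive_exit`, `remove` and `slow_path_exit` are illustrative only and do not claim to match the signatures in the patch; the sketch follows the "try the recursive exit, then call remove and act on the return value" shape mentioned in the quoted reply.

```cpp
#include <cassert>

// Hypothetical, simplified lock stack; not the HotSpot LockStack class.
class LockStackSketch {
  static constexpr int CAPACITY = 8;
  const void* _base[CAPACITY];
  int _top = 0;

 public:
  int size() const { return _top; }

  void push(const void* o) {
    assert(_top < CAPACITY);
    _base[_top++] = o;
  }

  // Structured recursive exit: the two topmost entries are both `o`,
  // so popping one leaves the object still locked.
  bool try_recursive_exit(const void* o) {
    if (_top >= 2 && _base[_top - 1] == o && _base[_top - 2] == o) {
      _top--;
      return true;
    }
    return false;
  }

  // Remove every entry for `o`, wherever it sits, and report how many
  // entries were removed.
  int remove(const void* o) {
    int kept = 0;
    for (int i = 0; i < _top; i++) {
      if (_base[i] != o) {
        _base[kept++] = _base[i];
      }
    }
    int removed = _top - kept;
    _top = kept;
    return removed;
  }
};

enum class ExitKind { Recursive, Single, UnstructuredRecursive, NotLocked };

// Slow-path exit: cheap recursive check first, then a single remove()
// whose return value distinguishes the remaining cases.
ExitKind slow_path_exit(LockStackSketch& ls, const void* o) {
  if (ls.try_recursive_exit(o)) {
    return ExitKind::Recursive;
  }
  switch (ls.remove(o)) {
    case 0:  return ExitKind::NotLocked;
    case 1:  return ExitKind::Single;  // structured or unstructured
    default: return ExitKind::UnstructuredRecursive;
  }
}
```

In this shape the direct exit and the direct unstructured exit share one path, which avoids the code duplication mentioned above at the price of a scan of the (small) stack.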
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1395353343 From rgiulietti at openjdk.org Thu Nov 16 08:56:43 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 16 Nov 2023 08:56:43 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Wed, 15 Nov 2023 22:08:19 GMT, Roger Riggs wrote: >> test/jdk/java/lang/String/StringRacyConstructor.java line 190: >> >>> 188: if (printWarningCount == 0) { >>> 189: printWarningCount = 1; >>> 190: System.out.println("StringUTF16.compress returned 0, may not be intrinsic"); >> >> It seems to me that the Java code for `StringUTF16.compress` also returns the index of the non-latin1 char, so I'm not sure I understand this. Just caution? > > There are separate implementations of the intrinsic for each platform. > This test would help identify the platforms on which the intrinsic had not been updated to return the index instead of zero. Ah OK. 
So maybe the message should read `"intrinsic for StringUTF16.compress returned 0, may not have been updated"` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395319817 From rgiulietti at openjdk.org Thu Nov 16 08:56:49 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 16 Nov 2023 08:56:49 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Wed, 15 Nov 2023 15:25:21 GMT, Raffaello Giulietti wrote: >> Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: >> >> Update PPC implementation of string_compress to return the index of the non-latin1 char >> Patch supplied by TheRealMDoerr > > test/jdk/java/lang/String/StringRacyConstructor.java line 300: > >> 298: */ >> 299: public static String racyStringConstructionCodepoints(String original) throws ConcurrentModificationException { >> 300: if (original.chars().max().orElseThrow() > 256) { > > Suggestion: > > if (original.chars().max().getAsInt() >= 256) { The correct comparison is `>=`, as codepoint 256 is not Latin1. 
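To spell out the boundary: Latin-1 covers code points 0 through 255, so 256 is the first value that must be treated as non-Latin-1. A minimal sketch of the same check, written in C++ with a hypothetical helper `has_non_latin1` (the test itself is Java and is not reproduced here):

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Returns true if any code point lies outside Latin-1 (0..255).
// An empty input is treated as Latin-1 in this sketch.
bool has_non_latin1(const std::u32string& code_points) {
  if (code_points.empty()) {
    return false;
  }
  char32_t max_cp =
      *std::max_element(code_points.begin(), code_points.end());
  return max_cp >= 256;  // `> 256` would wrongly accept code point 256
}
```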
> test/jdk/java/lang/String/StringRacyConstructor.java line 347: > >> 345: */ >> 346: public static String racyStringConstructionCodepointsSurrogates(String original) throws ConcurrentModificationException { >> 347: if (original.chars().max().orElseThrow() > 256) { > > Suggestion: > > if (original.chars().max().getAsInt() >= 256) { Same as above ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395326065 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395326394 From rgiulietti at openjdk.org Thu Nov 16 08:56:53 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 16 Nov 2023 08:56:53 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v5] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <2vzYUNSbN6OssnYgXiJigD2aGFRwiPpSUgA_uQJUuEU=.17e23598-dbfa-46fe-b507-d4bd3dbf5c57@github.com> On Wed, 15 Nov 2023 22:15:54 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. 
>> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: > > - Cleanup of test with review comment recommendations > - Enable racy constructor tests iff COMPACT_STRINGS is true > Test of string_compress intrinsic is always enabled test/jdk/java/lang/String/StringRacyConstructor.java line 311: > 309: } > 310: if (i >= 1_000_000) { > 311: System.out.printf("Unable to produce a UTF16 string in %d iterations: %s%n", i, original); What I meant with my previous comment is that this leaves a trace on "success", which is kind of unusual. There might be good reasons to do so, but they are not apparent to me. 
test/jdk/java/lang/String/StringRacyConstructor.java line 404: > 402: } > 403: if (i >= 1_000_000) { > 404: System.out.printf("Unable to create a string in %d iterations: %s%n", i, original); Suggestion: System.out.printf("Unable to produce a UTF16 string in %d iterations: %s%n", i, original); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395344966 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395346293 From duke at openjdk.org Thu Nov 16 09:00:41 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 16 Nov 2023 09:00:41 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: <1j98smd6Cs4B7bN4FpH3bOKnR7CQ1Pp4z269w7iYCb8=.30db5f5b-dd7e-4bd3-9b9a-023e72637753@github.com> On Tue, 14 Nov 2023 15:43:39 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cosmetic fixes. > > src/hotspot/cpu/riscv/riscv.ad line 10312: > >> 10310: match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type))); >> 10311: effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, TEMP tmp5, TEMP tmp6, >> 10312: USE_KILL ary, USE_KILL cnt, USE basic_type, KILL cr); > > should `TEMP_DEF result` be added here? Hmm, adding `TEMP_DEF result` makes the benchmark results even worse than without the intrinsic (I haven't looked at the generated assembler though). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1395361296 From thartmann at openjdk.org Thu Nov 16 09:02:53 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 16 Nov 2023 09:02:53 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: On Wed, 15 Nov 2023 15:40:49 GMT, Claes Redestad wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8617: >> >>> 8615: lea(dst, Address(dst, tmp5, Address::times_1)); >>> 8616: subptr(len, tmp5); >>> 8617: jmpb(copy_chars_loop); >> >> This causes a crash if I run with `-XX:UseAVX=3 -XX:AVX3Threshold=0`: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (macroAssembler_x86.hpp:122), pid=3400452, tid=3400470 >> # guarantee(this->is8bit(imm8)) failed: Short forward jump exceeds 8-bit offset at :0 >> # >> >> >> Needs to be a `jmp(copy_chars_loop)`. > > Alternatively: > > if (UseSSE42Intrinsics) { > jmpb(copy_chars_loop); > } else { > jmp(copy_chars_loop); > } > > > More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive, though. Especially since we shouldn't mix AVX and SSE code (the code in this intrinsic seems to follow paths which are either/or, but it seems fragile). Perhaps @TobiHartmann can advise? > This causes a crash if I run with -XX:UseAVX=3 -XX:AVX3Threshold=0: Good catch! Do we have a test for that scenario? If not, one should be added. > Alternatively [...] I would suggest to use `jmp(copy_chars_loop)` here for consistency with surrounding code. > More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive We don't mix AVX and SSE code here, right? 
We just fall back to SSE42 when AVX512 is not available or when the remaining length is below a threshold. Are you suggesting to better encapsulate both implementations (for example, factor them out into separate methods) or only ever using one? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395363072 From dholmes at openjdk.org Thu Nov 16 09:04:41 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 09:04:41 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v4] In-Reply-To: References: <-lwt39Gx_QJfxgzgSLHkysdtOrVxgP8dFh7gN4TDkmY=.86139caf-08c2-484f-999f-fa6cf121f9df@github.com> Message-ID: On Tue, 14 Nov 2023 08:16:38 GMT, Axel Boldt-Christmas wrote: > The current lock stack capacity is 8 Okay my recollection was they found 4 was sufficient, but I guess that changed. > So if I understand you correctly, you want to inflate the current objects monitor unconditionally if the lock stack is full. No I just expected checking for a full lock stack to happen when it is actually needed, not up front ie try to recursive lock, have it fail because the stack is full, then make some room and try again. The fullness check would be happening inside the lock stack code, not out in the caller. I'm not asking for you to do this just clarifying for you what I meant by not checking for it first. 
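A minimal sketch of the "don't check for fullness first" shape described above: the push is attempted, and only when it fails because the stack is full is room made and the push retried, all inside the lock-stack code. The class `BoundedLockStack` and its `make_room` policy (dropping every entry of the oldest object) are hypothetical placeholders; in the thread, making room corresponds to inflating a monitor, which is not modelled here.

```cpp
#include <cassert>

// Hypothetical, simplified lock stack; not the HotSpot LockStack class.
class BoundedLockStack {
  static constexpr int CAPACITY = 8;  // capacity quoted in the thread
  const void* _base[CAPACITY];
  int _top = 0;

  // Fails only when the stack is full.
  bool try_push(const void* o) {
    if (_top == CAPACITY) {
      return false;
    }
    _base[_top++] = o;
    return true;
  }

  // Placeholder for "make some room": drops all entries of the oldest
  // object. A real implementation would first move that object's lock
  // state elsewhere (e.g. into an inflated monitor); the choice of
  // victim here is an assumption made for illustration.
  void make_room() {
    const void* victim = _base[0];
    int kept = 0;
    for (int i = 0; i < _top; i++) {
      if (_base[i] != victim) {
        _base[kept++] = _base[i];
      }
    }
    _top = kept;
  }

 public:
  int size() const { return _top; }

  bool contains(const void* o) const {
    for (int i = 0; i < _top; i++) {
      if (_base[i] == o) {
        return true;
      }
    }
    return false;
  }

  // Callers never ask "is it full?"; the check lives in here and is only
  // reached when the first attempt fails.
  void enter(const void* o) {
    if (!try_push(o)) {
      make_room();
      bool pushed = try_push(o);
      assert(pushed);
      (void)pushed;  // keep release builds warning-free
    }
  }
};
```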
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1395366485 From duke at openjdk.org Thu Nov 16 09:08:32 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 16 Nov 2023 09:08:32 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Tue, 14 Nov 2023 15:41:07 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor cosmetic fixes. > > src/hotspot/cpu/riscv/riscv.ad line 10306: > >> 10304: >> 10305: >> 10306: instruct arrays_hashcode(iRegP_R11 ary, iRegI_R12 cnt, iRegI_R10 result, immI basic_type, > > Is it necessary to specify the regs(r11/12/10) here? I've just "borrowed" those definitions from other intrinsics around. Do you think we can improve this with iRegP/iRegI? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1395371122 From redestad at openjdk.org Thu Nov 16 09:34:36 2023 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 16 Nov 2023 09:34:36 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> Message-ID: <_UjSprG43BPmeY-K7ykBQnuoZ0mAphNzJtKgCDSStbY=.bde70903-98af-4375-b85e-a7ad0f5f16a7@github.com> On Thu, 16 Nov 2023 08:59:25 GMT, Tobias Hartmann wrote: >> Alternatively: >> >> if (UseSSE42Intrinsics) { >> jmpb(copy_chars_loop); >> } else { >> jmp(copy_chars_loop); >> } >> >> >> More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive, though. Especially since we shouldn't mix AVX and SSE code (the code in this intrinsic seem to follow paths which are either/or, but it seems fragile). Perhaps @TobiHartmann can advise? > >> This cause a crash if I run with -XX:UseAVX=3 -XX:AVX3Threshold=0: > > Good catch! Do we have a test for that scenario? If not, one should be added. > >> Alternatively [...] > > I would suggest to use `jmp(copy_chars_loop)` here for consistency with surrounding code. > >> More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive > > We don't mix AVX and SSE code here, right? We just fall back to SSE42 when AVX512 is not available or when the remaining length is below a threshold. Are you suggesting to better encapsulate both implementations (for example, factor them out into separate methods) or only ever using one? No, we don't mix: the SSE code is used as fallback only when the length is below 32 (if length is above 32 we check the tail with AVX code by shifting). 
I would suggest factoring out so that the implementations don't mix as much, mainly to reduce the number of possible variants to test and not to constrain one too much with the design of the other. We now have AVX3-only, AVX3+SSE, SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the AVX3-only so that it more efficiently handles strings of length 16-31 by duplicating (or using AVX instructions for copying 16 and 8 chars at once. Some code duplication perhaps, but simpler flow through each variant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395404549 From duke at openjdk.org Thu Nov 16 09:38:08 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 16 Nov 2023 09:38:08 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3] In-Reply-To: References: Message-ID: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> > Hello All, > > Please review these changes to support _vectorizedHashCode intrinsic on > RISC-V platform. The patch adds the "scalar" code for the intrinsic without > usage of any RVV instruction but provides manual unrolling of the appropriate > loop. The code with usage of RVV instruction could be added as follow-up of > the patch or independently. > > Thanks, > -Yuri Gaevsky > > P.S. My OCA has been accepted recently (ygaevsky). > > ### Correctness checks > > Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux. > > ### Performance results (the numbers for non-ints are similar) > > #### StarFive JH7110 board: > >
> ArraysHashCode:                without intrinsic      with intrinsic
> -------------------------------------------------------------------------------
> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
> -------------------------------------------------------------------------------
> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
> -------------------------------------------------------------------------------
> > #### T-Head RVB-ICE board: > > > ArraysHashCode: ...
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: Addressed most of suggestions for code improvements from @Hamlin-Li ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16629/files - new: https://git.openjdk.org/jdk/pull/16629/files/daae9961..86bcccee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=01-02 Stats: 18 lines in 1 file changed: 4 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629 PR: https://git.openjdk.org/jdk/pull/16629 From rgiulietti at openjdk.org Thu Nov 16 09:40:41 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 16 Nov 2023 09:40:41 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v5] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <_Sj-6CA7xAykquiHsyPfjY4r2n4GbTfkNmxeKiEdeHE=.3cc97cb3-3b99-4cb3-b737-4429cc900cbb@github.com> On Wed, 15 Nov 2023 22:15:54 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. 
>> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: > > - Cleanup of test with review comment recommendations > - Enable racy constructor tests iff COMPACT_STRINGS is true > Test of string_compress intrinsic is always enabled test/jdk/java/lang/String/StringRacyConstructor.java line 323: > 321: */ > 322: public static String racyStringConstructionCodepoints(String original) throws ConcurrentModificationException { > 323: if (original.chars().max().orElseThrow() > 256) { The correct comparison is `>=`, as codepoint 256 is not Latin1. 
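The boundary in question can be checked in isolation. The sketch below is not the test code from the PR; the class and method names are hypothetical, and `orElse(0)` is used instead of `orElseThrow()` so that empty strings are tolerated. It relies only on the fact that Latin-1 covers U+0000 through U+00FF, so 256 (U+0100) is the first value that is not Latin-1.

```java
// Hypothetical illustration of the Latin-1 boundary; not the PR's test code.
final class Latin1BoundarySketch {
    // Latin-1 is U+0000..U+00FF, so any char value >= 256 is non-Latin-1.
    static boolean hasNonLatin1(String s) {
        return s.chars().max().orElse(0) >= 256;
    }

    // The variant questioned in the review: it treats 256 as if it were Latin-1.
    static boolean hasNonLatin1OffByOne(String s) {
        return s.chars().max().orElse(0) > 256;
    }

    public static void main(String[] args) {
        String boundary = "a\u0100";          // largest char value is exactly 256
        System.out.println(hasNonLatin1(boundary));         // true
        System.out.println(hasNonLatin1OffByOne(boundary)); // false
    }
}
```

An equivalent spelling of the corrected check is `> 255`, which makes the Latin-1 upper bound explicit.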
test/jdk/java/lang/String/StringRacyConstructor.java line 370: > 368: */ > 369: public static String racyStringConstructionCodepointsSurrogates(String original) throws ConcurrentModificationException { > 370: if (original.chars().max().orElseThrow() > 256) { Same as above ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395409456 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395410689 From duke at openjdk.org Thu Nov 16 09:42:30 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 16 Nov 2023 09:42:30 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> Message-ID: On Wed, 15 Nov 2023 17:12:19 GMT, Hamlin Li wrote: >> Could you please clarify? I don't understand how that's possible. > > chunk is only used to tell if the wide loop is done, which can be done by ary too. > > And as subw of chunk and addi of ary is in a loop which could be a long one, so better to reduce the instructions in the loop. Oh, thanks: look like I understand now what can be done here and there. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1395418679 From dholmes at openjdk.org Thu Nov 16 09:51:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 09:51:34 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: <1sURlXfpTlEu9U30aZAojUIhDgtzeyA8MYrJ_q3xDUs=.bda6a027-ebc8-4107-a1f7-be3edf737e5f@github.com> On Tue, 14 Nov 2023 20:57:09 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 87: > 85: // Don't add a JvmtiThreadState to a thread that is exiting. > 86: return nullptr; > 87: } I'm wondering if there should also be an `is_jni_attaching` check here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1395431239 From aboldtch at openjdk.org Thu Nov 16 10:04:11 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 16 Nov 2023 10:04:11 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. 
So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Move is_lock_owned closer to its only use ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/e6055689..eac6d691 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=02-03 Stats: 14 lines in 1 file changed: 7 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aboldtch at openjdk.org Thu Nov 16 10:05:51 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 16 Nov 2023 10:05:51 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Wed, 15 Nov 2023 17:22:01 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 966: >> >>> 964: // Fall thru so we only have one place that installs the hash 
in >>> 965: // the ObjectMonitor. >>> 966: } else if (LockingMode == LM_LIGHTWEIGHT && mark.is_fast_locked() && is_lock_owned(current, obj)) { >> >> You can delete the is_lock_owned() function just above this with this patch. >> >> Edit: nope, never mind, I just found another caller. edit2: but you could move it down to before 'inflate' > > Also I think the comment before this function is wrong. We can inflate for deoptimization. > but you could move it down to before 'inflate' Done. > Also I think the comment before this function is wrong. We can inflate for deoptimization. The comment on `is_lock_owned`? For relock_objects `current` is the deoptee's thread which is a JavaThread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1395449578 From thartmann at openjdk.org Thu Nov 16 10:07:39 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 16 Nov 2023 10:07:39 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: <_UjSprG43BPmeY-K7ykBQnuoZ0mAphNzJtKgCDSStbY=.bde70903-98af-4375-b85e-a7ad0f5f16a7@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> <_UjSprG43BPmeY-K7ykBQnuoZ0mAphNzJtKgCDSStbY=.bde70903-98af-4375-b85e-a7ad0f5f16a7@github.com> Message-ID: On Thu, 16 Nov 2023 09:31:57 GMT, Claes Redestad wrote: >>> This cause a crash if I run with -XX:UseAVX=3 -XX:AVX3Threshold=0: >> >> Good catch! Do we have a test for that scenario? If not, one should be added. >> >>> Alternatively [...] >> >> I would suggest to use `jmp(copy_chars_loop)` here for consistency with surrounding code. >> >>> More generally I do wonder if it'd make most sense to make the AVX512 and SSE42 implementations exclusive >> >> We don't mix AVX and SSE code here, right? 
We just fall back to SSE42 when AVX512 is not available or when the remaining length is below a threshold. Are you suggesting to better encapsulate both implementations (for example, factor them out into separate methods) or only ever using one? > > No, we don't mix: the SSE code is used as fallback only when the length is below 32 (if length is above 32 we check the tail with AVX code by shifting). > > I would suggest factoring out so that the implementations don't mix as much, mainly to reduce the number of possible variants to test and not to constrain one too much with the design of the other. We now have AVX3-only, AVX3+SSE, SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the AVX3-only so that it more efficiently handles strings of length 16-31 by duplicating (or using AVX instructions for copying 16 and 8 chars at once. Some code duplication perhaps, but simpler flow through each variant. That seems reasonable but would be out of scope for this RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395452513 From dholmes at openjdk.org Thu Nov 16 10:22:47 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 10:22:47 GMT Subject: RFR: 8319961: JvmtiEnvBase doesn't zero _ext_event_callbacks [v2] In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 08:18:41 GMT, Roman Marchenko wrote: >> Zero'ing memory of extension event callbacks > > Roman Marchenko has updated the pull request incrementally with one additional commit since the last revision: > > Fixing review comments Also see: https://openjdk.org/guide/index.html#hotspot-development and the definition of "trivial" in the Glossary. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16647#issuecomment-1814158850 From sspitsyn at openjdk.org Thu Nov 16 10:40:52 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 10:40:52 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v7] In-Reply-To: References: Message-ID: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. > > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: - Merge - addressed review: (1) removed is_vthread_alive checks; (2) reverted one update - addressed review: added check for jdk_internal_vm_Continuation::done(cont) - review: get rid of the VM_HandshakeUnmountedVirtualThread - remove unneeded ResourceMark from JVMTI GetStackTrace - address review: remove fix in libGetStackTraceSuspendedStress.cpp - addressed initial minor review comments - 8319244: implement JVMTI handshakes support for virtual threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/a526bd60..2df63547 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=05-06 Stats: 634349 lines in 1310 files changed: 89706 ins; 480412 del; 64231 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From lucy at openjdk.org Thu Nov 16 10:42:32 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 16 Nov 2023 10:42:32 GMT Subject: RFR: JDK-8319927: Log that IEEE rounding mode was corrupted by loading a library In-Reply-To: References: Message-ID: <9Y70EelkjDzjBrMsr4t_smoZoXDAQUMA-mVcb_tbtdE=.5d999b48-c164-4136-bed2-3c5bb1e70702@github.com> On Fri, 10 Nov 2023 16:06:18 GMT, Matthias Baesken wrote: > [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD , however in case of issues no logging is done, this should be improved. LGTM. The supplemental logging may prove helpful. Thanks for adding. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16618#pullrequestreview-1734061299 From redestad at openjdk.org Thu Nov 16 10:57:36 2023 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 16 Nov 2023 10:57:36 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v4] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <8cuxVznAEsVV5haAaBg0aew7QOc9-iFBMLGFvpuUPtM=.8acbbd32-2000-4e2b-bbbe-b73e6b06839e@github.com> <_UjSprG43BPmeY-K7ykBQnuoZ0mAphNzJtKgCDSStbY=.bde70903-98af-4375-b85e-a7ad0f5f16a7@github.com> Message-ID: On Thu, 16 Nov 2023 10:05:14 GMT, Tobias Hartmann wrote: >> No, we don't mix: the SSE code is used as fallback only when the length is below 32 (if length is above 32 we check the tail with AVX code by shifting). >> >> I would suggest factoring out so that the implementations don't mix as much, mainly to reduce the number of possible variants to test and not to constrain one too much with the design of the other. We now have AVX3-only, AVX3+SSE, SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the AVX3-only so that it more efficiently handles strings of length 16-31 by duplicating (or using AVX instructions for copying 16 and 8 chars at once. Some code duplication perhaps, but simpler flow through each variant. > > That seems reasonable but would be out of scope for this RFE. Yes, for now we mainly need to make sure this works. There are a few regressions in microbenchmarks that I'm trying to get on top of, and the x64 intrinsics in particular seem problematic, but it seems reasonable to not hold up this PR and work on such improvements separately. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395517581 From sspitsyn at openjdk.org Thu Nov 16 11:20:42 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 11:20:42 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 Message-ID: This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. Additionally, the fix includes the following related changes: - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: JvmtiThreadState* state = thread->jvmti_thread_state(); if (state != nullptr && state->is_pending_interp_only_mode()) { MutexLocker mu(JvmtiThreadState_lock); state = thread->jvmti_thread_state(); if (state != nullptr && state->is_pending_interp_only_mode()) { JvmtiEventController::enter_interp_only_mode(); } } - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. 
It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation. Performance mesurements: - Without this fix the test provided by the bug submitter gives execution numbers: - no ClassLoad events enabled: 3251 ms - ClassLoad events are enabled: 40534 ms - With the fix: - no ClassLoad events enabled: 3270 ms - ClassLoad events are enabled: 3385 ms Testing: - Ran mach5 tiers 1-6, no regressions are noticed ------------- Commit messages: - 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 Changes: https://git.openjdk.org/jdk/pull/16686/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308614 Stats: 37 lines in 3 files changed: 24 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16686/head:pull/16686 PR: https://git.openjdk.org/jdk/pull/16686 From aboldtch at openjdk.org Thu Nov 16 12:02:31 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 16 Nov 2023 12:02:31 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:57:40 GMT, Andrew Haley wrote: > Hmm. Which hardware is this? This is stuff I need to be aware of. Please contact me off-line if it's hard to say in public. This has been observed with different versions of the Apple M1 processors. To clarify, when I say contention I am referring to java monitor contention, that is, multiple threads are trying to lock the same object. The performance is particularly bad if the LSE CAS fails. This pattern is something that is prevalent in the un-contended inflated recursive lock. In the current implementation this is still an issue, but as we are removing most of the common reason why a un-contended lock gets inflated we should not see this as often. 
We have at some point also had some code which improves this (e.g. https://github.com/xmas92/jdk/blob/3150426b261bfceacdceda1b2ebccd82b6e6fb41/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L162-L167 ), but I did not want to also change the inflated lock / unlock paths in this PR. We have also tried different recursive lightweight unlock paths, some where avoiding the LSE CAS was more important. In the current PR it is less important, as we make decisions based on the state of the lock stack first. This avoids most of the cases of un-contended failing CASes that occur in the mainline implementation. However, it still seemed to be more performant on this hardware to use an LL/SC pair. Here are some microbenchmarks running on an Apple M1 Pro chip. This is an extended version of the LockUnlock.java JMH micros. (Patch 3a7eb137140971f6b21ffea5dbf512300b38371a) Extended because some of the tests never get compiled because C2 bails out. (These are clearly identifiable in the results, as they are an order of magnitude worse.)
Base(c80e691adf6f9ac1a41b2329ce366710e604e34e) Legacy -UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     77,003 ±   7,558  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   1280,276 ±  11,565  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  16525,732 ± 222,518  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4    602,364 ±  18,365  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8984,140 ± 389,655  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4   1804,546 ±  14,954  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3504,367 ±  48,076  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     40,477 ±  11,088  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   2275,810 ± 222,888  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4   1135,063 ±   9,118  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4   1130,178 ±  58,801  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4   1134,359 ±   8,701  ns/op
  
Base(c80e691adf6f9ac1a41b2329ce366710e604e34e) Legacy +UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score      Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     52,511 ±   24,029  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   2473,421 ±  117,243  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  22371,364 ± 1364,761  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4   1106,888 ±   26,179  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4  12081,724 ±  793,498  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4   3265,306 ±  214,527  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3586,900 ±  165,551  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     88,162 ±    3,763  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   1891,455 ±   67,336  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4    943,267 ±   39,638  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4    958,670 ±   24,282  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4    930,944 ±   13,019  ns/op
  
Base(c80e691adf6f9ac1a41b2329ce366710e604e34e) Lightweight -UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     51,767 ±   1,708  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   1320,017 ±  12,844  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  15297,789 ± 538,970  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4    599,823 ±  13,903  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8181,012 ± 266,438  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4   1285,344 ±   9,739  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4  15249,363 ±  59,621  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     33,060 ±   0,260  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   2550,867 ±  32,597  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4   1274,052 ±   6,240  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4   1286,234 ±  65,275  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4   1278,423 ±  11,065  ns/op
  
Base(c80e691adf6f9ac1a41b2329ce366710e604e34e) Lightweight +UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     93,536 ±   2,062  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   2993,243 ±  49,181  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  16840,772 ±  86,835  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4   1949,685 ±  10,739  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8992,361 ±  42,743  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4   3129,174 ±  77,245  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4  16841,642 ± 237,059  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4    107,438 ±   5,077  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   2657,087 ±  65,913  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4   1328,323 ±  85,543  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4   1310,857 ±  17,261  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4   1311,644 ±  39,859  ns/op
  
Recursive Lightweight(1e7a586c027b6c84f42f317381e6b35ebb45cea0) -UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     66,658 ±   4,420  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   1288,176 ±  14,966  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  15743,745 ± 293,414  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4    611,030 ±   8,646  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8273,894 ±  54,006  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4    885,686 ±   5,822  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3678,847 ±   6,472  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     38,393 ±   9,834  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   1653,768 ±  10,920  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4    829,223 ±   2,152  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4    830,576 ±  24,810  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4    835,194 ±  66,321  ns/op
  
Recursive Lightweight(1e7a586c027b6c84f42f317381e6b35ebb45cea0) +UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     85,688 ±  17,538  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   2334,429 ±  70,698  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  15601,593 ± 480,278  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4   1065,708 ±  10,372  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8239,642 ±  98,829  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4    885,525 ±   1,831  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3647,819 ± 120,980  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     89,187 ±   0,787  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   1661,228 ±  24,971  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4    837,762 ±  43,297  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4    829,542 ±  11,918  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4    828,762 ±   3,844  ns/op
  
Recursive Lightweight (+ Patch switch to CAS over LL-SC 8dbe0762b98c1427d1588795d77ea73e306d045d) -UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     94,994 ±  19,096  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   1258,710 ±   4,664  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  15381,962 ±  84,907  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4    597,632 ±   1,807  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8212,172 ± 125,500  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4    933,620 ±  45,059  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3631,726 ±  23,656  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     36,777 ±   0,349  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   1764,221 ±   6,173  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4    889,761 ±   1,720  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4    895,285 ±   9,457  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4    889,444 ±   5,734  ns/op
  
Recursive Lightweight (+ Patch switch to CAS over LL-SC 8dbe0762b98c1427d1588795d77ea73e306d045d) +UseLSE
  Benchmark                                           (innerCount)  Mode  Cnt      Score     Error  Units
  LockUnlock.testContendedLock                                 100  avgt    4     74,835 ±   9,992  ns/op
  LockUnlock.testMonitorRecursiveLockUnlock                    100  avgt    4   2299,803 ±   6,954  ns/op
  LockUnlock.testMonitorRecursiveLockUnlockLocal               100  avgt    4  15452,039 ± 776,829  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlock                100  avgt    4   1067,769 ±   6,606  ns/op
  LockUnlock.testMonitorRecursiveOnlyLockUnlockLocal           100  avgt    4   8219,391 ±  46,559  ns/op
  LockUnlock.testRecursiveLockUnlock                           100  avgt    4    944,968 ±  57,425  ns/op
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3633,174 ±  66,667  ns/op
  LockUnlock.testRecursiveSynchronization                      100  avgt    4     88,720 ±   0,754  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   1720,471 ±  58,517  ns/op
  LockUnlock.testSerialLockUnlockLocal                         100  avgt    4    885,344 ±  39,917  ns/op
  LockUnlock.testSimpleLockUnlock                              100  avgt    4    864,052 ±  35,072  ns/op
  LockUnlock.testSimpleLockUnlockLocal                         100  avgt    4    879,373 ±   5,401  ns/op
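
For readers who want to compare the configurations above numerically, the following is a minimal sketch, not part of the original message. It assumes the rows keep the JMH layout shown in the tables, that scores use a decimal comma with no thousands separator, and that the error separator is either "±" or the "?" it often becomes in mail archives. The helper names (`parse_table`, `speedups`) are hypothetical, and the sample rows are copied from the "Base Lightweight -UseLSE" and "Recursive Lightweight -UseLSE" tables.

```python
import re

# One JMH result row: benchmark, param, mode, count, score, error, unit.
# JMH prints "±" before the error; mail archives often turn it into "?".
ROW = re.compile(
    r"^\s*(\S+)\s+(\d+)\s+(\w+)\s+(\d+)\s+([\d.,]+)\s+[?±]\s+([\d.,]+)\s+(\S+)\s*$"
)

def to_float(text):
    """Convert a decimal-comma number such as '2275,810' to a float."""
    return float(text.replace(",", "."))

def parse_table(text):
    """Return {benchmark name: (score, error)} for every result row in text."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and label lines do not match and are skipped
            results[match.group(1)] = (to_float(match.group(5)),
                                       to_float(match.group(6)))
    return results

def speedups(baseline, candidate):
    """Return baseline/candidate score ratios (avgt mode: above 1.0 is faster)."""
    return {name: baseline[name][0] / candidate[name][0]
            for name in baseline if name in candidate}

# Sample rows: Base Lightweight -UseLSE vs. Recursive Lightweight -UseLSE.
BASE = """
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4  15249,363 ?  59,621  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   2550,867 ?  32,597  ns/op
"""
PATCHED = """
  LockUnlock.testRecursiveLockUnlockLocal                      100  avgt    4   3678,847 ±   6,472  ns/op
  LockUnlock.testSerialLockUnlock                              100  avgt    4   1653,768 ±  10,920  ns/op
"""

for name, ratio in sorted(speedups(parse_table(BASE), parse_table(PATCHED)).items()):
    print(f"{name}: {ratio:.2f}x")
```

The ratio ignores the error column, so a difference that falls within the reported error (for example in `testContendedLock`) should not be read as a real change.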
  
I agree that this should be commented. And probably tracked somewhere in JBS.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1814306544

From duke at openjdk.org Thu Nov 16 12:18:50 2023
From: duke at openjdk.org (duke)
Date: Thu, 16 Nov 2023 12:18:50 GMT
Subject: Withdrawn: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental)
In-Reply-To:
References:
Message-ID:

On Fri, 12 May 2023 17:27:25 GMT, Roman Kennke wrote:

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>
> Main changes:
> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor.
> - The identity hash-code is narrowed to 25 bits.
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset).
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
>
> Testing:
> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64)
> - [x] tier2 (x86_64)
> - [x] tier3 (x86_64)
> - [ ] tier4 (x86_64)
> - [x] tier1 (aarch64)
> - [x] tier2 (aarch64)
> - [x] tier3 (aarch64)
> - [ ] tier4 (aarch64)
> - [ ] tier1 (x86_64) +UseCompactObjectHeaders
> - [ ] tier2 (x86_64) +UseCompactObjectHeaders
> - [ ] tier3 (x86_64) +UseCompactObjectHeaders
> - [ ] tier4 (x86_64) +UseCompactObjectHeaders
> - [ ] tier1 (aarch64) +UseCompactObjectHeaders
> - [ ] tier2 (aarch64) +UseCompactObjectHeaders
> - [ ] tier3 (aarch64) +UseCompactObjectHeaders
> - [ ] tier4 (aarch64) +UseCompactObjectHeaders

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/13961

From sspitsyn at openjdk.org Thu Nov 16 12:41:38 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Thu, 16 Nov 2023 12:41:38 GMT
Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol
Message-ID:

This is an update for a performance/scalability enhancement.

The `JvmtiVTMSTransitionDisabler` sync protocol is on a performance-critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%), which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents.

Please also see a related performance issue, which contributes around 70% of the total performance overhead:
[8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10

Testing:
- Ran mach5 tiers 1-6 with no regressions noticed.
-------------

Commit messages:
 - 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol

Changes: https://git.openjdk.org/jdk/pull/16688/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16688&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8320239
Stats: 41 lines in 2 files changed: 38 ins; 2 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/16688.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16688/head:pull/16688

PR: https://git.openjdk.org/jdk/pull/16688

From mbaesken at openjdk.org Thu Nov 16 12:54:33 2023
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 16 Nov 2023 12:54:33 GMT
Subject: RFR: JDK-8319927: Log that IEEE rounding mode was corrupted by loading a library
In-Reply-To:
References:
Message-ID:

On Fri, 10 Nov 2023 16:06:18 GMT, Matthias Baesken wrote:

> [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD; however, in case of issues no logging is done. This should be improved.

Hi Goetz and Lutz, thanks for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16618#issuecomment-1814380615

From mbaesken at openjdk.org Thu Nov 16 12:58:35 2023
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Thu, 16 Nov 2023 12:58:35 GMT
Subject: Integrated: JDK-8319927: Log that IEEE rounding mode was corrupted by loading a library
In-Reply-To:
References:
Message-ID: <2ncON1-yGidKWosB0DcbCPzv02rt9ORRWjs7hAZz9o8=.e6ad2340-2065-4763-aa6a-fee864c93285@github.com>

On Fri, 10 Nov 2023 16:06:18 GMT, Matthias Baesken wrote:

> [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD; however, in case of issues no logging is done. This should be improved.

This pull request has now been integrated.
Changeset: 9faead14 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/9faead1469481e268b451f2853c8fec8613426b9 Stats: 37 lines in 4 files changed: 31 ins; 0 del; 6 mod 8319927: Log that IEEE rounding mode was corrupted by loading a library Reviewed-by: goetz, lucy ------------- PR: https://git.openjdk.org/jdk/pull/16618 From ihse at openjdk.org Thu Nov 16 12:59:38 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 16 Nov 2023 12:59:38 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. 
But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF doc/building.md line 549: > 547: files. > 548: > 549: ### libsleef You will need to regenerate building.html as well. `make update-build-docs` using pandoc 2.19.2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395653735 From ihse at openjdk.org Thu Nov 16 13:04:36 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 16 Nov 2023 13:04:36 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. 
To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision:
>
>  - Add a bundled native lib in jdk as a bridge to libsleef
>  - Merge 'jdk:master' into JDK-8312425
>  - Disable sleef by default
>  - Merge 'jdk:master' into JDK-8312425
>  - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF

doc/building.md line 552:

> 550:
> 551: libsleef, the [SIMD Library for Evaluating Elementary Functions](
> 552: https://sleef.org/) is required when building libvmath.so on Linux+AArch64

The conventional way we have referred to os/cpu combinations in the build documentation is like this: `Linux/aarch64`. I also think you need to expand a bit on the fact that this is optional: if you do not provide libsleef, the build will succeed, but without the vector performance enhancements provided by libvmath.

make/autoconf/lib-vmath.m4 line 102:

> 100: fi
> 101:
> 102: AC_SUBST(LIBSLEEF_FOUND)

Do not export LIBSLEEF_FOUND. It is okay to use internally here, but you should instead export ENABLE_LIBSLEEF, using true/false (instead of yes/no). This is the way we handle all other optional components.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395657610
PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395659965

From adinn at openjdk.org Thu Nov 16 13:19:30 2023
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 16 Nov 2023 13:19:30 GMT
Subject: RFR: 8319973: AArch64: Save and restore FPCR in the call stub
In-Reply-To:
References:
Message-ID: <95gOkI9zT6jJkpSPTBZlT2BejmDXWr59h1EX31qlFEo=.71aeb51c-e23f-4167-aed3-c517a17bde50@github.com>

On Mon, 13 Nov 2023 18:18:35 GMT, Andrew Haley wrote:

> On AArch64 we don't save and restore the default floating-point control state when we enter and leave Java code. We really should, because if we're called via the JNI invocation interface with a weird FP control state we'll not be Java compatible.

All looks good.

-------------

Marked as reviewed by adinn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16637#pullrequestreview-1734351073 From ihse at openjdk.org Thu Nov 16 13:25:46 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 16 Nov 2023 13:25:46 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. 
>>
>> [1] https://github.com/openjdk/jdk/pull/3638
>> [2] https://sleef.org/
>> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
>> [4] https://packages.debian.org/bookworm/libsleef3
>> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Add a bundled native lib in jdk as a bridge to libsleef
>  - Merge 'jdk:master' into JDK-8312425
>  - Disable sleef by default
>  - Merge 'jdk:master' into JDK-8312425
>  - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF

There's a lot to work with regarding the build system changes here...

make/autoconf/lib-vmath.m4 line 39:

> 37:   LIBVMATH_CFLAGS=
> 38:   LIBVMATH_LIBS=
> 39:

There are multiple issues with this function. Please have a look at how other libraries are handled. Some remarks:

1) You always need to pair AC_MSG_CHECKING and AC_MSG_RESULT. Do not make any operations in between that can cause output.

2) If the user runs just --with-libsleef, the value will be "yes". You need to treat this, not as a path, but as a request to enable the library using default methods (like pkg-check or well known locations).

make/autoconf/lib-vmath.m4 line 49:

> 47:         test -e ${with_libsleef}/include/sleef.h; then
> 48:       LIBSLEEF_FOUND=yes
> 49:       LIBVMATH_LIBS="-L${with_libsleef}/lib"

This should be LIBSLEEF_LIBS and ...CFLAGS.

make/autoconf/lib-vmath.m4 line 92:

> 90:         []
> 91:     )
> 92:     AC_MSG_RESULT([${SVE_FEATURE_SUPPORT}])

What is this test even for? I can't see any usage of SVE_FEATURE_SUPPORT outside this function.

make/autoconf/libraries.m4 line 129:

> 127:   LIB_SETUP_LIBFFI
> 128:   LIB_SETUP_MISC_LIBS
> 129:   LIB_SETUP_VMATH

The function (and file) should be named after "sleef", not "vmath".
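
The option handling asked for in these review remarks can be modeled outside of autoconf. The following is only a sketch of the decision logic in Python, not the actual m4 code and not what the PR ended up doing. The function name `resolve_libsleef`, the `defaults_found` parameter (a stand-in for a pkg-config or well-known-location probe), the `-lsleef` link flag, and the choice to fail when an explicit request cannot be satisfied are all assumptions made for illustration; only the variable names `ENABLE_LIBSLEEF`, `LIBSLEEF_LIBS` and `LIBSLEEF_CFLAGS` come from the review comments.

```python
import os

def resolve_libsleef(with_libsleef, defaults_found=False):
    """Model how a --with-libsleef value could map to exported build variables.

    with_libsleef: None or "" (option absent), "no", "yes", or a directory path.
    defaults_found: hypothetical result of a default lookup (e.g. pkg-config).
    """
    if with_libsleef in (None, "", "no"):
        # Option absent or --without-libsleef: optional component stays off.
        return {"ENABLE_LIBSLEEF": "false",
                "LIBSLEEF_CFLAGS": "", "LIBSLEEF_LIBS": ""}
    if with_libsleef == "yes":
        # Bare --with-libsleef: a request to enable via defaults, not a path.
        if not defaults_found:
            raise RuntimeError("libsleef requested but not found by default lookup")
        return {"ENABLE_LIBSLEEF": "true",
                "LIBSLEEF_CFLAGS": "", "LIBSLEEF_LIBS": "-lsleef"}
    # Any other value is treated as an installation prefix and must hold sleef.h.
    if not os.path.exists(os.path.join(with_libsleef, "include", "sleef.h")):
        raise RuntimeError(f"sleef.h not found under {with_libsleef}/include")
    return {"ENABLE_LIBSLEEF": "true",
            "LIBSLEEF_CFLAGS": f"-I{with_libsleef}/include",
            "LIBSLEEF_LIBS": f"-L{with_libsleef}/lib -lsleef"}

print(resolve_libsleef(None))
print(resolve_libsleef("yes", defaults_found=True))
```

The point of the model is that "yes" and "no" are handled before the value is ever used as a path, and that the exported switch is a true/false `ENABLE_LIBSLEEF` rather than an internal yes/no "found" flag.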
------------- Changes requested by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16234#pullrequestreview-1734359314 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395684054 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395687129 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395684964 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1395686104 From coleenp at openjdk.org Thu Nov 16 13:40:49 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 16 Nov 2023 13:40:49 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Thu, 16 Nov 2023 10:03:07 GMT, Axel Boldt-Christmas wrote: >> Also I think the comment before this function is wrong. We can inflate for deoptimization. > >> but you could move it down to before 'inflate' > > Done. > >> Also I think the comment before this function is wrong. We can inflate for deoptimization. > > The comment on `is_lock_owned`? For relock_objects `current` is the deoptee's thread which is a JavaThread. Yes that comment. If we don't inflate for hashcode anymore for lightweight locking mode, why would we inflate with a non-JavaThread? Is it impossible? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1395703162 From shade at openjdk.org Thu Nov 16 13:49:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 16 Nov 2023 13:49:40 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v11] In-Reply-To: References: Message-ID: <6vHoiTpD6QereGjUR4QJJHDuQWgc5V2jO-Ky0t5aBQk=.83fcaff6-6280-43f5-a63b-00bce26b97f8@github.com> On Wed, 15 Nov 2023 09:46:21 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > More tweaks to the test Still good. Are we waiting for some other reviews to integrate this? I have a related fix in pipeline that is blocked by this :) ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1734410630 From stefank at openjdk.org Thu Nov 16 13:55:39 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 16 Nov 2023 13:55:39 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v11] In-Reply-To: <6vHoiTpD6QereGjUR4QJJHDuQWgc5V2jO-Ky0t5aBQk=.83fcaff6-6280-43f5-a63b-00bce26b97f8@github.com> References: <6vHoiTpD6QereGjUR4QJJHDuQWgc5V2jO-Ky0t5aBQk=.83fcaff6-6280-43f5-a63b-00bce26b97f8@github.com> Message-ID: On Thu, 16 Nov 2023 13:46:36 GMT, Aleksey Shipilev wrote: > Still good. Are we waiting for some other reviews to integrate this? I have a related fix in pipeline that is blocked by this :) I got internal feedback that I should seek re-reviews from the Reviewers. 
I think I've given enough time for more feedback, so I'll integrate now. Thanks for the nudge.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1814476828

From aboldtch at openjdk.org Thu Nov 16 13:57:50 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 16 Nov 2023 13:57:50 GMT
Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4]
In-Reply-To:
References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com>
Message-ID:

On Thu, 16 Nov 2023 13:35:04 GMT, Coleen Phillimore wrote:

>>> but you could move it down to before 'inflate'
>>
>> Done.
>>
>>> Also I think the comment before this function is wrong. We can inflate for deoptimization.
>>
>> The comment on `is_lock_owned`? For relock_objects `current` is the deoptee's thread which is a JavaThread.
>
> Yes that comment. If we don't inflate for hashcode anymore for lightweight locking mode, why would we inflate with a non-JavaThread? Is it impossible?

I see what you mean. Because we need to put the hash code in the ObjectMonitor if it is already inflated, and the ObjectMonitor may have an anonymous owner, we can still call `is_lock_owned` when retrieving the monitor for FastHashCode. Technically, we could condition the inflate call on the mark word lock bits and pick the ObjectMonitor out of the mark word. It would always be installed when reaching that point in the LM_LIGHTWEIGHT mode.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1395730640

From stefank at openjdk.org Thu Nov 16 14:07:18 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 16 Nov 2023 14:07:18 GMT
Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v12]
In-Reply-To:
References:
Message-ID:

> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass.
The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having to monitor deflation passes interleaved. More info below ... Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into 8318757_interleaved_monitor_deflation - More tweaks to the test - Tweak test - Do stuff in the synchronized block of the test - Update names in test - Remove the limit for deflation requests - Remove reinitialization in test - Update comments - Tweak the flag comment a bit - Add AsyncMonitorDeflationForThreadDumpLimit flag - ... and 11 more: https://git.openjdk.org/jdk/compare/3637348d...9e97ed90 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16519/files - new: https://git.openjdk.org/jdk/pull/16519/files/df04ca04..9e97ed90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16519&range=10-11 Stats: 635344 lines in 1331 files changed: 90458 ins; 480441 del; 64445 mod Patch: https://git.openjdk.org/jdk/pull/16519.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16519/head:pull/16519 PR: https://git.openjdk.org/jdk/pull/16519 From stefank at openjdk.org Thu Nov 16 14:36:48 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 16 Nov 2023 14:36:48 GMT Subject: Integrated: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls In-Reply-To: References: Message-ID: <0B0cxfBR_Ta5HWDqVbC3BPWWQGbTgE2_yxU_ypl2LoQ=.b8544ba3-73cb-4d74-9e93-1ab551cdd430@github.com> On Mon, 6 Nov 2023 09:46:11 GMT, Stefan Karlsson wrote: > A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. 
The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... This pull request has now been integrated. Changeset: 87be6b69 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/87be6b69fe985eee01fc3344f9153d774db792c1 Stats: 426 lines in 11 files changed: 208 ins; 147 del; 71 mod 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls Reviewed-by: shade, aboldtch, pchilanomate, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/16519 From ayang at openjdk.org Thu Nov 16 15:17:35 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 16 Nov 2023 15:17:35 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v6] In-Reply-To: References: Message-ID: <9bW9Gq13fwD5HG4rS6IITvQGHBE8s0U5QrKVJa80r90=.722bda1b-2ec0-499e-9f20-cbd6bbc474aa@github.com> On Wed, 15 Nov 2023 14:32:49 GMT, Lei Zaakjyu wrote: >> JDK-8234502 : Merge GenCollectedHeap and SerialHeap > > Lei Zaakjyu has updated the pull request incrementally with one additional commit since the last revision: > > fix include statements src/hotspot/share/gc/serial/serialHeap.cpp line 94: > 92: > 93: SerialHeap::SerialHeap() : > 94: #if 0 Isn't this effectively dead code? src/hotspot/share/gc/serial/serialVMOperations.hpp line 66: > 64: > 65: > 66: #endif // SHARE_GC_SERIAL_SERIALVMOPERATIONS_HPP I think it misses a line-break for EOF, which is why this special icon shows here. (The same issue exists in some other files updated by this PR as well.) src/hotspot/share/gc/shenandoah/shenandoahMonitoringSupport.hpp line 29: > 27: > 28: #include "memory/allocation.hpp" > 29: #include "gc/shared/collectorCounters.hpp" If this is required, could it be made in its own PR? It's odd to see this PR change files exclusively owned by other collectors.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16623#discussion_r1395875545 PR Review Comment: https://git.openjdk.org/jdk/pull/16623#discussion_r1395878314 PR Review Comment: https://git.openjdk.org/jdk/pull/16623#discussion_r1395869900 From pchilanomate at openjdk.org Thu Nov 16 15:24:30 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 16 Nov 2023 15:24:30 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Thu, 16 Nov 2023 07:11:03 GMT, Serguei Spitsyn wrote: >> Thank you for the suggestion. Will check it. > > I've added the check for `!jdk_internal_vm_Continuation::done(cont)` into ` JvmtiEnvBase::is_vthread_alive(oop vt)` but then decided to remove it again. This is racy for `JvmtiHandshake` execution. As you correctly stated, the change of state to `TERMINATED` happens after `notifyJvmtiUnmount()`. The target virtual thread will be blocked in the `notifyJvmtiUnmount()` because the `JvmtiVTMSTransitionDisabler` is set. This gives us a guaranty that the target virtual thread won't change its state to `TERMINATED` while a handshake is executed. But it becomes not true if we add the `!jdk_internal_vm_Continuation::done(cont)` check. > Form the other hand, absence of this check allows for target virtual thread stack to become empty (with no frames). This is a known problem but I'd prefer to attack it separately. So the problematic case I'm thinking is when the JvmtiVTMSTransitionDisabler starts after the vthread executed notifyJvmtiUnmount(), i.e the vthread is already outside the transition, but before changing the state to TERMINATED. 
JvmtiVTMSTransitionDisabler will proceed, and since the carrierThread field has already been cleared we will treat it as an unmounted vthread. Then we can see first that is alive in JvmtiHandshake::execute() but then we could hit the assert that is already TERMINATED in JvmtiEnvBase::get_vthread_jvf(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1395892738 From pchilanomate at openjdk.org Thu Nov 16 15:24:34 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 16 Nov 2023 15:24:34 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: On Wed, 8 Nov 2023 15:59:16 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1978: >> >>> 1976: } >>> 1977: if (target_jt == nullptr) { // unmounted virtual thread >>> 1978: hs_cl->do_vthread(target_h); // execute handshake closure callback on current thread directly >> >> I think comment should be: s/current thread/unmounted vthread > > Thank you for the comment but I'm not sure what do you mean. > If target virtual thread is unmounted we execute the hs_cl callback on current thread. Ok, I see. When I read "closure executed on" I think of the intended target thread of the closure rather than the current thread used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1395894618 From dcubed at openjdk.org Thu Nov 16 16:21:53 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Nov 2023 16:21:53 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v12] In-Reply-To: References: Message-ID: <1LwfabsRaUptiYnTucjmRptVmbW5m80hZOKxlFoLEtQ=.940ff20e-22b6-47f5-925b-8b694aeb36e8@github.com> On Thu, 16 Nov 2023 14:07:18 GMT, Stefan Karlsson wrote: >> A safepointed monitor deflation pass can run interleaved with a paused async monitor deflation pass. The code is not written to handle that situation and asserts when it finds a DEFLATER_MARKER in the owner field. @pchilano also found other issues with having two monitor deflation passes interleaved. More info below ... > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/master' into 8318757_interleaved_monitor_deflation > - More tweaks to the test > - Tweak test > - Do stuff in the synchronized block of the test > - Update names in test > - Remove the limit for deflation requests > - Remove reinitialization in test > - Update comments > - Tweak the flag comment a bit > - Add AsyncMonitorDeflationForThreadDumpLimit flag > - ... and 11 more: https://git.openjdk.org/jdk/compare/e75ed0ec...9e97ed90 I finally got to a wide spot in the road where I could get back to doing code reviews. Sorry for the delay on this re-review. Thumbs up. I found only one minor typo that could be handled as part of some other fix.
src/hotspot/share/runtime/vmOperations.cpp line 347: > 345: if (monitor->is_owner_anonymous()) { > 346: // There's no need to collect anonymous owned monitors > 347: // because the callers of this code is only interested nit typo: s/is only/are only/ ------------- PR Review: https://git.openjdk.org/jdk/pull/16519#pullrequestreview-1734855705 PR Review Comment: https://git.openjdk.org/jdk/pull/16519#discussion_r1395984320 From sspitsyn at openjdk.org Thu Nov 16 16:30:29 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 16:30:29 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Thu, 16 Nov 2023 15:20:31 GMT, Patricio Chilano Mateo wrote: >> I've added the check for `!jdk_internal_vm_Continuation::done(cont)` into ` JvmtiEnvBase::is_vthread_alive(oop vt)` but then decided to remove it again. This is racy for `JvmtiHandshake` execution. As you correctly stated, the change of state to `TERMINATED` happens after `notifyJvmtiUnmount()`. The target virtual thread will be blocked in the `notifyJvmtiUnmount()` because the `JvmtiVTMSTransitionDisabler` is set. This gives us a guaranty that the target virtual thread won't change its state to `TERMINATED` while a handshake is executed. But it becomes not true if we add the `!jdk_internal_vm_Continuation::done(cont)` check. >> Form the other hand, absence of this check allows for target virtual thread stack to become empty (with no frames). This is a known problem but I'd prefer to attack it separately. > > So the problematic case I'm thinking is when the JvmtiVTMSTransitionDisabler starts after the vthread executed notifyJvmtiUnmount(), i.e the vthread is already outside the transition, but before changing the state to TERMINATED. 
JvmtiVTMSTransitionDisabler will proceed, and since the carrierThread field has already been cleared we will treat it as an unmounted vthread. Then we can see first that is alive in JvmtiHandshake::execute() but then we could hit the assert that is already TERMINATED in JvmtiEnvBase::get_vthread_jvf(). Thanks! This is a valid concern. Will try to fix this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396000487 From rriggs at openjdk.org Thu Nov 16 16:48:00 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 16 Nov 2023 16:48:00 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v6] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. 
> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
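The verify-then-rescan step in the list above can be sketched roughly as follows. This is an illustration only, not the JDK implementation: `MutableInputSketch`, `Encoded`, `firstNonLatin1`, `classify` and `encode` are hypothetical names, and a private `char[]` snapshot stands in for the internal byte arrays and the intrinsic scan.

```java
import java.util.Arrays;

class MutableInputSketch {
    /** Hypothetical result type: a private copy plus the chosen representation. */
    static final class Encoded {
        final boolean latin1;
        final char[] chars;
        Encoded(boolean latin1, char[] chars) {
            this.latin1 = latin1;
            this.chars = chars;
        }
    }

    /** Returns the index of the first char above 0xFF, or -1 if every char fits in latin1. */
    static int firstNonLatin1(char[] a) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] > 0xFF) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Chooses the representation for a private snapshot, given the index that
     * an earlier scan of the caller's (possibly shared) array reported.
     */
    static Encoded classify(char[] snapshot, int scannedIndex) {
        if (scannedIndex >= 0 && snapshot[scannedIndex] > 0xFF) {
            // Fast path: the character that forced non-latin1 is still present,
            // so the non-latin1 copy holds at least one non-latin1 character.
            return new Encoded(false, snapshot);
        }
        // Either the scan saw only latin1, or the input changed after the scan.
        // Re-scan the snapshot, which no other thread can modify.
        return new Encoded(firstNonLatin1(snapshot) < 0, snapshot);
    }

    static Encoded encode(char[] input) {
        int idx = firstNonLatin1(input);                      // scan the caller's array
        char[] snapshot = Arrays.copyOf(input, input.length); // private copy
        return classify(snapshot, idx);
    }

    public static void main(String[] args) {
        System.out.println(encode("abc".toCharArray()).latin1);
        System.out.println(encode("a\u20ACc".toCharArray()).latin1);
        // Simulated race: the scan reported index 1, but the snapshot is all latin1.
        System.out.println(classify(new char[] {'a', 'b'}, 1).latin1);
    }
}
```

Splitting `classify` out of `encode` is only a convenience for the sketch: it lets the "array changed between scan and copy" case be exercised without real threads.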
Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Additional corrections from review comments, dropped informational output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/b84d09db..5dda14c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=04-05 Stats: 17 lines in 1 file changed: 5 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From duke at openjdk.org Thu Nov 16 16:49:54 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Thu, 16 Nov 2023 16:49:54 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v7] In-Reply-To: References: Message-ID: > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request incrementally with three additional commits since the last revision: - replace a necessary include statement - clean up - add line-breaks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/1202a7bd..0563686a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=05-06 Stats: 10 lines in 5 files changed: 1 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From lmesnik at openjdk.org Thu Nov 16 17:16:32 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 16 Nov 2023 17:16:32 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 12:35:08 GMT, Serguei Spitsyn wrote: > This is an 
update for a performance/scalability enhancement. > > The `JvmtiVTMSTransitionDisabler` sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. > > Please also see another, related performance issue which contributes around 70% of the total performance overhead: > [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 > > Testing: > - Ran mach5 tiers 1-6 with no regressions noticed. src/hotspot/share/prims/jvmtiThreadState.cpp line 430: > 428: assert(!thread->is_in_VTMS_transition(), "VTMS_transition sanity check"); > 429: thread->set_is_in_VTMS_transition(true); > 430: java_lang_Thread::set_is_in_VTMS_transition(vt, true); Indentation is incorrect. src/hotspot/share/prims/jvmtiThreadState.hpp line 86: > 84: static volatile bool _SR_mode; // there is an active suspender or resumer > 85: static volatile int _VTMS_transition_count; // current number of VTMS transitions > 86: static int _sync_protocol_enabled_count; // current number of JvmtiVTMSTransitionDisablers enabled sync protocol The _sync_protocol_enabled_count and _sync_protocol_enabled_permanently are read/updated in different threads. How is access to them protected from races? It might make sense to add this info in a comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16688#discussion_r1396061383 PR Review Comment: https://git.openjdk.org/jdk/pull/16688#discussion_r1396071674 From sspitsyn at openjdk.org Thu Nov 16 17:27:31 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 17:27:31 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Thu, 16 Nov 2023 16:27:23 GMT, Serguei Spitsyn wrote: >> So the problematic case I'm thinking is when the JvmtiVTMSTransitionDisabler starts after the vthread executed notifyJvmtiUnmount(), i.e the vthread is already outside the transition, but before changing the state to TERMINATED. JvmtiVTMSTransitionDisabler will proceed, and since the carrierThread field has already been cleared we will treat it as an unmounted vthread. Then we can see first that is alive in JvmtiHandshake::execute() but then we could hit the assert that is already TERMINATED in JvmtiEnvBase::get_vthread_jvf(). > > Thanks! This is a valid concern. Will try to fix this. I'm suggesting to fix it this way for the unmounted case only:

@@ -1976,6 +1976,13 @@ JvmtiHandshake::execute(JvmtiUnitedHandshakeClosure* hs_cl, ThreadsListHandle* t
     return;
   }
   if (target_jt == nullptr) { // unmounted virtual thread
+    // JvmtiVTMSTransitionDisabler can start after the vthread executed notifyJvmtiUnmount(), i.e.
+    // the vthread is already outside the transition, but before changing the state to TERMINATED.
+    // Changing the state to TERMINATED is racy, so we check if the continuation is done in advance.
+    oop cont = java_lang_VirtualThread::continuation(target_h());
+    if (jdk_internal_vm_Continuation::done(cont)) {
+      return;
+    }
     hs_cl->do_vthread(target_h); // execute handshake closure callback on current thread directly
   }
 }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396084391 From mli at openjdk.org Thu Nov 16 17:31:40 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 Nov 2023 17:31:40 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3] In-Reply-To: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> References: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> Message-ID: On Thu, 16 Nov 2023 09:38:08 GMT, Yuri Gaevsky wrote: >> Hello All, >> >> Please review these changes to support _vectorizedHashCode intrinsic on >> RISC-V platform. The patch adds the "scalar" code for the intrinsic without >> usage of any RVV instruction but provides manual unrolling of the appropriate >> loop. The code with usage of RVV instruction could be added as follow-up of >> the patch or independently. >> >> Thanks, >> -Yuri Gaevsky >> >> P.S. My OCA has been accepted recently (ygaevsky). >> >> ### Correctness checks >> >> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux. >> >> ### Performance results (the numbers for non-ints are similar) >> >> #### StarFive JH7110 board: >> >>
>> ArraysHashCode:                  without intrinsic        with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
>> ---------------------------------------...
> > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > Addressed most of suggestions for code improvements from @Hamlin-Li src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1492: > 1490: beqz(cnt, DONE); > 1491: > 1492: lw(pow31_2, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31() Now you don't need this `lw` anymore, as 961 will fit in an immediate, so `mv pow31_2, 961` should be fine. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1512: > 1510: + 0 * sizeof(jint))); // [31^^3:31^^4] > 1511: > 1512: bind(WIDE_LOOP); Seems in this loop, tmp3 and tmp1 can share one register. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1515: > 1513: mulw(result, result, pow31_3_4); // 31^^4 * h > 1514: DO_ELEMENT_LOAD(tmp1, 0); > 1515: srli(tmp2, pow31_3_4, 32); tmp2 can be calculated outside of the loop.
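For readers following the review comments above, the arithmetic behind the manually unrolled loop can be sketched in plain Java. This is not the RISC-V assembler from the patch: `UnrolledHashSketch` and `hash` are hypothetical names, and the unroll factor of four is an assumption taken from the `31^^4 * h` comment. The constants are the powers of 31 (961 = 31^2, 29791 = 31^3, 923521 = 31^4).

```java
import java.util.Arrays;

class UnrolledHashSketch {
    private static final int POW31_2 = 961;     // 31^2
    private static final int POW31_3 = 29791;   // 31^3
    private static final int POW31_4 = 923521;  // 31^4

    /** Computes the same value as Arrays.hashCode(int[]) for a non-null array. */
    static int hash(int[] a) {
        int h = 1;
        int i = 0;
        // Wide loop: fold four elements per iteration.
        // h' = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3]
        for (; i + 4 <= a.length; i += 4) {
            h = POW31_4 * h
              + POW31_3 * a[i]
              + POW31_2 * a[i + 1]
              + 31 * a[i + 2]
              + a[i + 3];
        }
        // Tail loop: the remaining zero to three elements, one at a time.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(hash(data) == Arrays.hashCode(data)); // prints "true"
    }
}
```

Because `int` arithmetic wraps modulo 2^32, expanding four steps of `h = 31 * h + a[i]` into one expression gives the same result as the element-by-element loop, which is why the unrolled form is a legal rewrite.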
src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 62: > 60: 923521, // 0x000E1781 > 61: 29791, // 0x0000745F > 62: 961, // 0x000003C1 Based on the comment above about `pow31_2`, lines 62-64 can be removed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396054423 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396069114 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396071131 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396057735 From mli at openjdk.org Thu Nov 16 17:31:45 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 16 Nov 2023 17:31:45 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Thu, 16 Nov 2023 09:06:05 GMT, Yuri Gaevsky wrote: >> src/hotspot/cpu/riscv/riscv.ad line 10306: >> >>> 10304: >>> 10305: >>> 10306: instruct arrays_hashcode(iRegP_R11 ary, iRegI_R12 cnt, iRegI_R10 result, immI basic_type, >> >> Is it necessary to specify the regs (r11/12/10) here? > > I've just "borrowed" those definitions from other intrinsics around. Do you think we can improve this with iRegP/iRegI? Seems to me it's not necessary to specify the registers. Can you try it? >> src/hotspot/cpu/riscv/riscv.ad line 10312: >> >>> 10310: match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type))); >>> 10311: effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, TEMP tmp5, TEMP tmp6, >>> 10312: USE_KILL ary, USE_KILL cnt, USE basic_type, KILL cr); >> >> Should `TEMP_DEF result` be added here? > > Hmm, adding TEMP_DEF result makes the benchmark results even worse than without the intrinsic (I haven't looked at the generated assembler though).
What specific tests were run for this intrinsic implementation to verify the correctness? BTW, can you add some comments about what java method or bytecode this intrinsic is for? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396077210 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396087016 From sspitsyn at openjdk.org Thu Nov 16 17:45:36 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 17:45:36 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> Message-ID: On Wed, 8 Nov 2023 16:02:08 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 2416: >> >>> 2414: if (!JvmtiEnvBase::is_vthread_alive(_target_h())) { >>> 2415: return; // JVMTI_ERROR_THREAD_NOT_ALIVE (default) >>> 2416: } >> >> Don't we have this check already in JvmtiHandshake::execute()? Same with the other converted functions. > > Good suggestion, thanks. > I'm a little bit paranoid about terminated vthreads. :) > Will try to get rid of it and retest all tiers. I've removed the extra checks for `JvmtiEnvBase::is_vthread_alive()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396115170 From dcubed at openjdk.org Thu Nov 16 18:20:57 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Nov 2023 18:20:57 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: Message-ID: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> On Thu, 16 Nov 2023 10:04:11 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. 
So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Move is_lock_owned closer to its only use src/hotspot/share/runtime/synchronizer.cpp line 521: > 519: const markWord new_mark = mark.set_fast_locked(); > 520: const markWord old_mark = mark; > 521: mark = obj()->cas_set_mark(new_mark, old_mark); I'm having trouble seeing how this change is related to hash codes. The previous code did not loop and if the calling thread's attempt to lightweight lock the object lost a race, then we simply fell thru down to the inflate-enter loop... I don't see any explanation for this change in the bug report or in the PR anywhere. Perhaps I'll figure it out as I reason thru the changes... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396154763 From dcubed at openjdk.org Thu Nov 16 18:27:55 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Nov 2023 18:27:55 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> References: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> Message-ID: On Thu, 16 Nov 2023 18:17:34 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Move is_lock_owned closer to its only use > > src/hotspot/share/runtime/synchronizer.cpp line 521: > >> 519: const markWord new_mark = mark.set_fast_locked(); >> 520: const markWord old_mark = mark; >> 521: mark = obj()->cas_set_mark(new_mark, old_mark); > > I'm having trouble seeing how this change is related to hash codes. > The previous code did not loop and if the calling thread's attempt to > lightweight lock the object lost a race, then we simply fell thru down > to the inflate-enter loop... > > I don't see any explanation for this change in the bug report or in the > PR anywhere. Perhaps I'll figure it out as I reason thru the changes... The change from `mark.is_neutral()` to `mark.is_unlocked()` is equivalent since both functions have the exact same meaning. Is there some reason to make that change? Are we trying to migrate away from `mark.is_neutral()` to `mark.is_unlocked()`? I think the original "neutral" concept was from when we had biased locking... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396162159 From dcubed at openjdk.org Thu Nov 16 18:31:51 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Nov 2023 18:31:51 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> Message-ID: On Thu, 16 Nov 2023 18:24:43 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 521: >> >>> 519: const markWord new_mark = mark.set_fast_locked(); >>> 520: const markWord old_mark = mark; >>> 521: mark = obj()->cas_set_mark(new_mark, old_mark); >> >> I'm having trouble seeing how this change is related to hash codes. >> The previous code did not loop and if the calling thread's attempt to >> lightweight lock the object lost a race, then we simply fell thru down >> to the inflate-enter loop... >> >> I don't see any explanation for this change in the bug report or in the >> PR anywhere. Perhaps I'll figure it out as I reason thru the changes... > > The change from `mark.is_neutral()` to `mark.is_unlocked()` is equivalent > since both functions have the exact same meaning. Is there some reason to > make that change? Are we trying to migrate away from `mark.is_neutral()` to > `mark.is_unlocked()`? I think the original "neutral" concept was from when we > had biased locking... The rename from `locked_mark` to `new_mark` seems superfluous. The old and new version of the variable name are used in the same places so I don't quite see the reason for the rename. I liked `locked_mark` because it was the "locked" version of the original `mark`. I do like the use of the `const`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396166107 From dcubed at openjdk.org Thu Nov 16 18:38:49 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Nov 2023 18:38:49 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> Message-ID: On Thu, 16 Nov 2023 18:28:34 GMT, Daniel D. Daugherty wrote: >> The change from `mark.is_neutral()` to `mark.is_unlocked()` is equivalent >> since both functions have the exact same meaning. Is there some reason to >> make that change? Are we trying to migrate away from `mark.is_neutral()` to >> `mark.is_unlocked()`? I think the original "neutral" concept was from when we >> had biased locking... > > The rename from `locked_mark` to `new_mark` seems superfluous. The old and > new version of the variable name are used in the same places so I don't quite see > the reason for the rename. I liked `locked_mark` because it was the "locked" version > of the original `mark`. I do like the use of the `const`. Now I think I see why you changed the `cas_set_mark()` call to return its value into `mark` instead of `old_mark` and why the original parameter to the CAS was changed from `mark` to `old_mark`. It's a bit different style then we typically use for CAS where the return value goes into an `old_something` variable. Since `mark` is now the loop control and `old_mark` is now `const` you had to juggle things with the `cas_set_mark()` call. Okay, I think I get the reason for this bit of juggling. (I still don't understand the rename from `locked_mark` to `new_mark` but that's okay.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396174354 From dcubed at openjdk.org Thu Nov 16 18:44:52 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Nov 2023 18:44:52 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> Message-ID: On Thu, 16 Nov 2023 18:36:09 GMT, Daniel D. Daugherty wrote: >> The rename from `locked_mark` to `new_mark` seems superfluous. The old and >> new version of the variable name are used in the same places so I don't quite see >> the reason for the rename. I liked `locked_mark` because it was the "locked" version >> of the original `mark`. I do like the use of the `const`. > > Now I think I see why you changed the `cas_set_mark()` call to return its value into > `mark` instead of `old_mark` and why the original parameter to the CAS was changed > from `mark` to `old_mark`. It's a bit different style then we typically use for CAS where > the return value goes into an `old_something` variable. > > Since `mark` is now the loop control and `old_mark` is now `const` you had to juggle > things with the `cas_set_mark()` call. Okay, I think I get the reason for this bit of juggling. > (I still don't understand the rename from `locked_mark` to `new_mark` but that's okay.) Okay so this bit of code changes were done for a couple of reasons (of course this is my guess): 1) add a retry here in case of collision to give this thread another chance at doing a lightweight lock instead of always dropping into inflate-enter loop below on a collision 2) clean up the code a little bit with some use of `const` to make things a little more clear. I'm going to guess that the type of collision that we're trying to avoid is hash code installation. 
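The retry discussed here can be sketched in Java with an `AtomicLong` standing in for the mark word. This is a simplified illustration, not HotSpot code: `MarkWordSketch`, `tryFastLock`, `installHash` and `HASH_SHIFT` are hypothetical, the bit layout is an assumption apart from the lock-bit values quoted in the PR description (unlocked 0b01, fast-locked 0b00), and the inflated monitor state is not modelled.

```java
import java.util.concurrent.atomic.AtomicLong;

class MarkWordSketch {
    static final long LOCK_MASK = 0b11;
    static final long UNLOCKED = 0b01;
    static final long LOCKED = 0b00;
    static final int HASH_SHIFT = 8; // hypothetical position of the hash bits

    /**
     * Tries to move the lock bits from unlocked to fast-locked.
     * A failed CAS that leaves the word unlocked (for example because a hash
     * code was installed concurrently) is retried instead of giving up.
     * Returns false once the word is no longer unlocked; a real VM would then
     * take its slow path.
     */
    static boolean tryFastLock(AtomicLong mark) {
        long m = mark.get();
        while ((m & LOCK_MASK) == UNLOCKED) {
            long lockedMark = (m & ~LOCK_MASK) | LOCKED;
            long witness = mark.compareAndExchange(m, lockedMark);
            if (witness == m) {
                return true;  // CAS succeeded
            }
            m = witness;      // only non-lock bits may have changed; re-check and retry
        }
        return false;
    }

    /**
     * Installs a non-zero hash into the hash bits without touching the lock
     * bits, and returns the hash that ends up in the word.
     */
    static long installHash(AtomicLong mark, long hash) {
        long m = mark.get();
        while (true) {
            long existing = m >>> HASH_SHIFT;
            if (existing != 0) {
                return existing;  // another thread installed a hash first
            }
            long witness = mark.compareAndExchange(m, m | (hash << HASH_SHIFT));
            if (witness == m) {
                return hash;
            }
            m = witness;          // lock bits changed underneath us; retry
        }
    }

    public static void main(String[] args) {
        AtomicLong mark = new AtomicLong(UNLOCKED);
        System.out.println(installHash(mark, 0x1234L) == 0x1234L);
        System.out.println(tryFastLock(mark));
        System.out.println(Long.toHexString(mark.get()));
    }
}
```

Both loops follow the same pattern: the value returned by the failed compare-and-exchange becomes the new expected value, so a collision on unrelated bits costs one extra iteration rather than a fall-through to inflation.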
You may want to add a comment above `L516 while (mark.is_unlocked()) {` that says: // Allow for a retry here just in case the cas_set_mark() below collided with a hash code installation: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396180120 From dcubed at openjdk.org Thu Nov 16 18:51:49 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Nov 2023 18:51:49 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <1DG4zwC5I96PdIuDQCQbeEsOL3NR5owY7ehs-3axPlE=.68e173e2-e92b-43a5-abf3-9ef30b48443d@github.com> Message-ID: On Mon, 13 Nov 2023 12:21:19 GMT, David Holmes wrote: >> The condition does now check for a successful CAS, not the unsuccessful one. If it was successful then there is no monitor, thus no anonymous owner. >> >> If the CAS failed and the mark word is no longer fast locked, it must be inflated, so we fall through to the inflated case. >> >> `ObjectSynchronizer::inflate` correctly handles fixing the owner. > > I see - thanks. It is hard to see where the code goes to when the CAS fails. In the old code, I think if there was a hash code installation racing with this monitor exit by the thread owner, the hash code installer thread would have inflated the monitor which would have resulted in the anonymous owner value also being set. So we could have had two different inflating thread racers... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396188446 From dcubed at openjdk.org Thu Nov 16 18:55:53 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Nov 2023 18:55:53 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 10:04:11 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking.
This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Move is_lock_owned closer to its only use src/hotspot/share/runtime/synchronizer.cpp line 576: > 574: if (LockingMode == LM_LIGHTWEIGHT) { > 575: // Fast-locking does not use the 'lock' argument. > 576: while (mark.is_fast_locked()) { Please consider adding a comment above L576: // Allow for a retry here just in case the cas_set_mark() below collided with a hash code installation: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396192678 From sspitsyn at openjdk.org Thu Nov 16 18:57:32 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 18:57:32 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 17:05:17 GMT, Leonid Mesnik wrote: >> This is an update for a performance/scalability enhancement. >> >> The `JvmtiVTMSTransitionDisabler` sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions).
It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. >> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > src/hotspot/share/prims/jvmtiThreadState.cpp line 430: > >> 428: assert(!thread->is_in_VTMS_transition(), "VTMS_transition sanity check"); >> 429: thread->set_is_in_VTMS_transition(true); >> 430: java_lang_Thread::set_is_in_VTMS_transition(vt, true); > > indentation is incorrect. Thank you. Fixed now. > src/hotspot/share/prims/jvmtiThreadState.hpp line 86: > >> 84: static volatile bool _SR_mode; // there is an active suspender or resumer >> 85: static volatile int _VTMS_transition_count; // current number of VTMS transitions >> 86: static int _sync_protocol_enabled_count; // current number of JvmtiVTMSTransitionDisablers enabled sync protocol > > The _sync_protocol_enabled_count and _sync_protocol_enabled_permanently are read/updated in different threads. How access to them is protected from racing? Might be make sense to add this info in comment? Good catch, thanks. My initial intention was to make them volatile with Atomic load/store/update. Fixed now. 
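(Editorial sketch, not part of the original mail.) The race raised in the review above, two fields read and updated from different threads, can be illustrated with a minimal standalone example. It assumes standard C++ `std::atomic` rather than HotSpot's own `Atomic` helpers on volatile fields, and the function names and the "enabled if permanent or count > 0" rule are hypothetical, chosen only to mirror the field names quoted in the review.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for the two fields named in the review comment.
// HotSpot would use its own Atomic helpers on volatile fields; std::atomic
// is used here only to keep the sketch self-contained.
static std::atomic<int>  sync_protocol_enabled_count{0};
static std::atomic<bool> sync_protocol_enabled_permanently{false};

// Called when something that needs the sync protocol becomes active.
void enable_sync_protocol() {
  sync_protocol_enabled_count.fetch_add(1, std::memory_order_acq_rel);
}

// Called when it goes away again; the count must stay balanced.
void disable_sync_protocol() {
  int prev = sync_protocol_enabled_count.fetch_sub(1, std::memory_order_acq_rel);
  assert(prev > 0 && "unbalanced disable");
  (void)prev;  // silence unused warning when asserts are compiled out
}

// Read side: an atomic load of both fields (assumed rule, see lead-in).
bool sync_protocol_enabled() {
  return sync_protocol_enabled_permanently.load(std::memory_order_acquire) ||
         sync_protocol_enabled_count.load(std::memory_order_acquire) > 0;
}
```

Atomic read-modify-write on the counter is what keeps concurrent increments and decrements from being lost; a plain `int` updated with `++`/`--` from several threads would not give that guarantee.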
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16688#discussion_r1396191582 PR Review Comment: https://git.openjdk.org/jdk/pull/16688#discussion_r1396194416 From pchilanomate at openjdk.org Thu Nov 16 19:05:30 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 16 Nov 2023 19:05:30 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Thu, 16 Nov 2023 17:22:35 GMT, Serguei Spitsyn wrote: >> Thanks! This is a valid concern. Will try to fix this. > > I'm suggesting to fix it this way for the unmounted case only: > > @@ -1976,6 +1976,13 @@ JvmtiHandshake::execute(JvmtiUnitedHandshakeClosure* hs_cl, ThreadsListHandle* t > return; > } > if (target_jt == nullptr) { // unmounted virtual thread > + // JvmtiVTMSTransitionDisabler can start after the vthread executed notifyJvmtiUnmount(), i.e. > + // the vthread is already outside the transition, but before changing the state to TERMINATED. > + // Changing the state to TERMINATED is racy, so we check if the continuation is done in advance. > + oop cont = java_lang_VirtualThread::continuation(target_h()); > + if (jdk_internal_vm_Continuation::done(cont)) { > + return; > + } > hs_cl->do_vthread(target_h); // execute handshake closure callback on current thread directly > } > } Sounds good. Is there a reason why not have the check inside JvmtiEnvBase::is_vthread_alive()? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396202630 From dcubed at openjdk.org Thu Nov 16 19:11:52 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 16 Nov 2023 19:11:52 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Thu, 16 Nov 2023 13:54:31 GMT, Axel Boldt-Christmas wrote: >> Yes that comment. If we don't inflate for hashcode anymore for lightweight locking mode, why would we inflate with a non-JavaThread? Is it impossible? > > I see what you mean. Because we need to put the hash code in the ObjectMonitor if it already is inflated and it may be the case that the ObjectMonitor has an anonymous owner we can still call `is_lock_owned` when retrieving the monitor for FastHashCode. > > Technically we could condition the inflate call on if the mark word lock bits and pick out the ObjectMonitor from the mark word. It would always be installed when reaching that point in the LM_LIGHTWEIGHT mode. Do we inflate when the VMThread is doing JVM/TI tagging? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396205422 From dcubed at openjdk.org Thu Nov 16 19:11:55 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Nov 2023 19:11:55 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 10:04:11 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. 
This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Move is_lock_owned closer to its only use test/hotspot/jtreg/runtime/whitebox/TestWBDeflateIdleMonitors.java line 69: > 67: // HotSpot implementation detail: asking for the hash code > 68: // when the object is locked causes monitor inflation. > 69: if (obj.hashCode() == 0xBAD) System.out.println("!"); Why was this deleted from the test? `get_next_hash()` can still return this value so it's useful to know if a bad hash code made it out this far... Update: I see why it was deleted now. hashCode() while locked no longer forces inflation. We're gonna have to find all the places in various tests that assume that works. That's going to be fun... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396207772 From dcubed at openjdk.org Thu Nov 16 19:17:55 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Nov 2023 19:17:55 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 10:04:11 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. 
>> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Move is_lock_owned closer to its only use src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 68: > 66: __ testptr(result, markWord::monitor_value); > 67: __ jcc(Assembler::notZero, slowCase); > 68: } else { Not needed for other platforms? Or will that be done with other bugs or sub-tasks? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396215537 From dcubed at openjdk.org Thu Nov 16 19:22:07 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 16 Nov 2023 19:22:07 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 10:04:11 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. 
This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Move is_lock_owned closer to its only use Thumbs up. I have a couple of editorial comments, but I think that's it. I did a review of the C2 changes and I don't see anything obviously wrong, but having a C2 reviewer would be very, very useful. This needs additional testing. I recommend Tier{1,2,3} on all the usual Oracle platforms. Tier{4,5,6} would catch the uses of hash code stuff in more stress related configs by the Serviceability tests. ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16603#pullrequestreview-1735268119 From stefank at openjdk.org Thu Nov 16 19:23:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 16 Nov 2023 19:23:45 GMT Subject: RFR: 8318757: VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls [v10] In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 17:36:04 GMT, Daniel D. Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak test > > What CI testing has been done with this PR? I saw that some was planned, but I don't > see the actual tiers executed has been mentioned... @dcubed-ojdk Thanks for taking a look at the changes. I'll see if I can find a good changeset that can fix the typo. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16519#issuecomment-1815167453 From rriggs at openjdk.org Thu Nov 16 20:17:55 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 16 Nov 2023 20:17:55 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v7] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #3 from cl4es/8311906_intrinsic_fixes Fix crash with jmpb in AVX3 intrinsic - Fix crash with jmpb in AVX3 intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/5dda14c4..626d7bf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=05-06 Stats: 265 lines in 3 files changed: 263 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From rriggs at openjdk.org Thu Nov 16 20:27:11 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 16 Nov 2023 20:27:11 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v8] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. 
> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
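(Editorial sketch, not part of the original mail.) The verification step described in the list above can be modelled in a few lines. The actual change is in Java and in the JIT intrinsics; this C++ version is only an illustration under simplifying assumptions: the names `first_non_latin1`, `build_from_copy` and `EncodedString` are hypothetical, and the "compression" to one byte per character is represented by a flag instead of a separate byte array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first code unit above 0xFF, or -1 if every
// unit fits in latin1. Mirrors the "return the index" idea in the mail.
int first_non_latin1(const std::vector<uint16_t>& units) {
  for (std::size_t i = 0; i < units.size(); i++) {
    if (units[i] > 0xFF) return static_cast<int>(i);
  }
  return -1;
}

// Hypothetical result type: the flag says which representation was chosen.
struct EncodedString {
  bool is_latin1;
  std::vector<uint16_t> units;
};

// 'copy' is the private copy taken from the (possibly mutating) input and
// 'scanned_index' is what the scan of the shared input reported.
EncodedString build_from_copy(const std::vector<uint16_t>& copy, int scanned_index) {
  bool verified = scanned_index >= 0 &&
                  static_cast<std::size_t>(scanned_index) < copy.size() &&
                  copy[scanned_index] > 0xFF;
  if (verified) {
    return {false, copy};  // the copy really holds a non-latin1 unit there
  }
  // The scan result is not confirmed by the copy, so the input may have
  // changed in between: decide from a rescan of the private copy instead.
  return {first_non_latin1(copy) < 0, copy};
}
```

The point of the design is that the final representation is decided only from data the constructor owns, so a string flagged as non-latin1 always contains at least one non-latin1 character.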
Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/626d7bf1..3e3607e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From jbachorik at openjdk.org Thu Nov 16 20:33:50 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 16 Nov 2023 20:33:50 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes Message-ID: Please, review this fix for a corner case handling of `jmethodID` values. The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. 
This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ ------------- Commit messages: - 8313816: Accessing jmethodID might lead to spurious crashes Changes: https://git.openjdk.org/jdk/pull/16662/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313816 Stats: 282 lines in 9 files changed: 278 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From dholmes at openjdk.org Thu Nov 16 20:33:54 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 16 Nov 2023 20:33:54 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:56:09 GMT, Jaroslav Bachorik wrote: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. 
Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. 
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ src/hotspot/share/oops/instanceKlass.cpp line 531: > 529: > 530: void InstanceKlass::deallocate_methods(ClassLoaderData* loader_data, > 531: Array* methods, InstanceKlass* klass) { An explicit boolean parameter would be cleaner/clearer. src/hotspot/share/oops/instanceKlass.cpp line 542: > 540: if (klass) { > 541: jmethodID jmid = method->find_jmethod_id_or_null(); > 542: // Do the pointer maintenance before releasing the metadata, just in case I assume there should be a period after 'case`. But just in case of what? src/hotspot/share/oops/instanceKlass.cpp line 549: > 547: if (jmid != nullptr && *((Method**)jmid) == method) { > 548: *((Method**)jmid) = nullptr; > 549: } This should be abstracted behind a utility function e.g. `method->clear_jmethod_id()`. src/hotspot/share/oops/method.cpp line 2277: > 2275: } > 2276: } > 2277: Can this race with redefinition? src/hotspot/share/oops/method.hpp line 730: > 728: // so handles are not used to avoid deadlock. > 729: jmethodID find_jmethod_id_or_null() { > 730: return method_holder() != nullptr ? method_holder()->jmethod_id_or_null(this) : nullptr; If `method_holder()` is null at this point what does that mean for the lifecycle of the Method? 
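(Editorial sketch, not part of the original mail.) The pointer maintenance quoted at instanceKlass.cpp line 549, and the suggestion to hide it behind a utility function, can be shown with a standalone model. `Method` here is a simplified stand-in and `clear_jmethod_id_if_matches` is a hypothetical name; the only thing taken from the thread is the rule "clear the slot only if it still points at the `Method` being deallocated".

```cpp
#include <cassert>

// Simplified stand-in for HotSpot's Method; not the real class.
struct Method {
  int version;
};

// Per the PR description, a jmethodID is effectively a pointer to a slot
// that holds a Method*. This alias is a simplification for the sketch.
using jmethodID_slot = Method**;

// Hypothetical helper: clear the slot only if it still refers to the
// Method that is about to be deallocated. Returns true if it was cleared.
bool clear_jmethod_id_if_matches(jmethodID_slot jmid, Method* dying) {
  if (jmid != nullptr && *jmid == dying) {
    *jmid = nullptr;  // later lookups see null instead of freed memory
    return true;
  }
  return false;  // slot is absent or already redirected to another version
}
```

The equality check matters because, after a redefinition, the slot may already have been redirected to the newest `Method` version, and that pointer must be left alone.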
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1393663791 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1393664277 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1393672300 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395072721 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395071297 From jbachorik at openjdk.org Thu Nov 16 20:33:54 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 16 Nov 2023 20:33:54 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 05:35:49 GMT, David Holmes wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. 
>> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > src/hotspot/share/oops/instanceKlass.cpp line 531: > >> 529: >> 530: void InstanceKlass::deallocate_methods(ClassLoaderData* loader_data, >> 531: Array* methods, InstanceKlass* klass) { > > An explicit boolean parameter would be cleaner/clearer. I just removed the `klass` argument. It is not really used anyway. > src/hotspot/share/oops/instanceKlass.cpp line 542: > >> 540: if (klass) { >> 541: jmethodID jmid = method->find_jmethod_id_or_null(); >> 542: // Do the pointer maintenance before releasing the metadata, just in case > > I assume there should be a period after 'case`. But just in case of what? The code was moved to`method.cpp` and this particular comment line became obsolete > src/hotspot/share/oops/instanceKlass.cpp line 549: > >> 547: if (jmid != nullptr && *((Method**)jmid) == method) { >> 548: *((Method**)jmid) = nullptr; >> 549: } > > This should be abstracted behind a utility function e.g. 
`method->clear_jmethod_id()`. Done > src/hotspot/share/oops/method.cpp line 2277: > >> 2275: } >> 2276: } >> 2277: > > Can this race with redefinition? The cleanup of previous versions is executed in VM_Operation at a safepoint - therefore we should be safe against races with class redefinitions. I am adding an assert to `clear_jmethod_id()` to check for being at a safepoint. > src/hotspot/share/oops/method.hpp line 730: > >> 728: // so handles are not used to avoid deadlock. >> 729: jmethodID find_jmethod_id_or_null() { >> 730: return method_holder() != nullptr ? method_holder()->jmethod_id_or_null(this) : nullptr; > > If `method_holder()` is null at this point what does that mean for the lifecycle of the Method? Please, ignore this part of code for the time being. I had a crash in CI which was pointing vaguely to this code - unfortunately, the hs_err.log files are not preserved in the test archives and I am not able to reproduce the failure locally. I need to debug the crash and make sure I understand the root cause. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394642247 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394647034 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1394647173 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395100685 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1395102362 From jbachorik at openjdk.org Thu Nov 16 20:33:55 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 16 Nov 2023 20:33:55 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 18:47:19 GMT, Jaroslav Bachorik wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 531: >> >>> 529: >>> 530: void InstanceKlass::deallocate_methods(ClassLoaderData* loader_data, >>> 531: Array<Method*>* methods, InstanceKlass* klass) { >> >> An explicit boolean parameter would be cleaner/clearer. > > I just removed the `klass` argument. It is not really used anyway. I actually ended up with a boolean parameter here. >> src/hotspot/share/oops/method.hpp line 730: >> >>> 728: // so handles are not used to avoid deadlock. >>> 729: jmethodID find_jmethod_id_or_null() { >>> 730: return method_holder() != nullptr ? method_holder()->jmethod_id_or_null(this) : nullptr; >> >> If `method_holder()` is null at this point what does that mean for the lifecycle of the Method? > > Please, ignore this part of code for the time being. I had a crash in CI which was pointing vaguely to this code - unfortunately, the hs_err.log files are not preserved in the test archives and I am not able to reproduce the failure locally. I need to debug the crash and make sure I understand the root cause. _Update:_ I was able to get to the bottom of the methods not having a method holder associated with them.
The `ClassFileParser` does not finalize initialization of the `InstanceKlass` it has created if `_klass != nullptr` (https://github.com/openjdk/jdk/blob/9727f4bdddc071e6f59806087339f345405ab004/src/hotspot/share/classfile/classFileParser.cpp#L5161). This also means that the `Method` instances are not wired to their method holders via the 'constant method'->'constant pool'->'pool holder' chain. However, they need to be deallocated and as such I really need a distinguishing argument for the `InstanceKlass::deallocate_methods` call such that I don't attempt to resolve `jmethodID` values in that case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1396296501 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1396175846 From jvernee at openjdk.org Thu Nov 16 20:56:37 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 16 Nov 2023 20:56:37 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: <8ZDW97dfYYcbnkSwL04seLPlSYdVMj1UiLg5lmR399A=.579ab132-ddb8-403a-afa9-5158d1c238c8@github.com> On Wed, 15 Nov 2023 17:42:52 GMT, Vladimir Ivanov wrote: > Please, file an RFE to explore pruning of unreached call sites. Filed: https://bugs.openjdk.org/browse/JDK-8320271 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1815295836 From cjplummer at openjdk.org Thu Nov 16 21:40:36 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 16 Nov 2023 21:40:36 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 11:15:27 GMT, Serguei Spitsyn wrote: > This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because they are needed when the `interp_only_mode` is enabled.
Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. > However, it has to be okay to avoid eager creation of these objects if no `interp_only_mode` has ever been requested. > It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. > It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. > > Additionally, the fix includes the following related changes: > - Use the double-checked condition idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: > > JvmtiThreadState* state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > MutexLocker mu(JvmtiThreadState_lock); > state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > JvmtiEventController::enter_interp_only_mode(); > } > } > > > - Add an extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. > - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for the JVMTI `SetEventNotificationMode` implementation. > > Performance measurements: > - Without this fix the test provided by the bug submitter gives execution numbers: > - no ClassLoad events enabled: 3251 ms > - ClassLoad events are enabled: 40534 ms > > - With the fix: > - no ClassLoad events enabled: 3270 ms > - ClassLoad events are enabled: 3385 ms > > Testing: > - Ran mach5 tiers 1-6, no regressions are noticed src/hotspot/share/prims/jvmtiEventController.cpp line 372: > 370: return; // EnterInterpOnlyModeClosure will be executed right after mount.
> 371: } > 372: ThreadsListHandle tlh(current); Why was this added? src/hotspot/share/prims/jvmtiThreadState.cpp line 531: > 529: > 530: // JvmtiThreadState objects for virtual thread filtered events enabled globally > 531: // must be created eagerly if the interp_only_mode is enabled. Otherwise, This sentence is hard to read. How about: "If interp_only_mode is enabled then we must eagerly create JvmtiThreadState objects for globally enabled virtual thread filtered events." src/hotspot/share/prims/jvmtiThreadState.cpp line 579: > 577: VTMS_mount_end(vthread); > 578: if (JvmtiExport::can_support_virtual_threads() && > 579: JvmtiExport::should_post_vthread_mount()) { It seems odd that "can_support" can be false when "should_post" is true. I would think that "should_post" would always be false when "can_support" is false, and therefore there would be no need to check "can_support". src/hotspot/share/prims/jvmtiThreadState.hpp line 234: > 232: inline void set_head_env_thread_state(JvmtiEnvThreadState* ets); > 233: > 234: static bool _seen_interp_only_mode; // needed for optimization Say what the flag represents, not why we have it. src/hotspot/share/prims/jvmtiThreadState.hpp line 257: > 255: // JvmtiThreadState objects for virtual thread filtered events enabled globally > 256: // must be created eagerly if the interp_only_mode is enabled. Otherwise, > 257: // it is an important optimization to create JvmtiThreadState objects lazily. No need for this comment here. It is already at the call site, which is where it belongs. Instead the comment here should say what this API does (return true if any thread has entered interp_only_mode at any point during the JVM's execution).
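A rough sketch of the "sticky flag" behaviour these comments describe, with the accessor documented by what it returns. The class name `JvmtiStateModel` and the helpers `note_interp_only_mode()` and `must_create_eagerly()` are hypothetical names for illustration and are not part of the patch; only the flag name `_seen_interp_only_mode` comes from the review.

```cpp
#include <atomic>

// Toy model, not the HotSpot JvmtiThreadState class.
class JvmtiStateModel {
  // Set once and never cleared.
  static std::atomic<bool> _seen_interp_only_mode;

 public:
  // Hypothetical hook: record that some thread has entered interp_only_mode.
  static void note_interp_only_mode() { _seen_interp_only_mode.store(true); }

  // Returns true if any thread has entered interp_only_mode at any point
  // during execution so far.
  static bool seen_interp_only_mode() { return _seen_interp_only_mode.load(); }

  // Eager creation is only required when globally enabled, virtual thread
  // filtered events exist and interp_only_mode has been seen; otherwise the
  // state object can be created lazily.
  static bool must_create_eagerly(bool vthread_filtered_events_enabled_globally) {
    return vthread_filtered_events_enabled_globally && seen_interp_only_mode();
  }
};

std::atomic<bool> JvmtiStateModel::_seen_interp_only_mode{false};
```

Keeping the rationale at the call site and the contract at the declaration, as suggested above, means the accessor comment stays valid even if the optimization that uses it changes.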
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396375451 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396372511 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396371590 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396362847 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396374577 From manc at openjdk.org Thu Nov 16 22:00:31 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 16 Nov 2023 22:00:31 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 14 Nov 2023 20:57:09 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. Thanks. The latest change to `JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample()` looks OK to me. Skipping a few allocations for JVMTI allocation sampler is better than resulting in a problematic `JvmtiThreadState` instance. My main question is if we can now change `if (state == nullptr || state->get_thread_oop() != thread_oop) ` to `if (state == nullptr)` in `JvmtiThreadState::state_for_while_locked()`. I suspect we would never run into a case of `state != nullptr && state->get_thread_oop() != thread_oop` with the latest change, even with virtual threads. 
This is backed up by testing with https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22 not triggering any failure. If we run into such a case, it could still be problematic as `JvmtiThreadState::state_for_while_locked()` would allocate a new `JvmtiThreadState` instance pointing to the same JavaThread, and it does not delete the existing instance. Could anyone with deep knowledge of JvmtiThreadState and virtual threads provide some feedback on this change and https://bugs.openjdk.org/browse/JDK-8319935? @AlanBateman, do you know who would be the best reviewer for this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1815379890 From rgiulietti at openjdk.org Thu Nov 16 22:03:43 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 16 Nov 2023 22:03:43 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v8] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Thu, 16 Nov 2023 20:27:11 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array.
>> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace src/java.base/share/classes/java/lang/String.java line 4818: > 4816: if (COMPACT_STRINGS) { > 4817: byte[] val = StringUTF16.compress(value, off, len); > 4818: this.coder = (val.length == len) ? 
LATIN1 : UTF16; An uglier branch-less variant Suggestion: this.coder = len - val.length >>> Integer.SIZE - 1; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1396401074 From iklam at openjdk.org Thu Nov 16 22:43:45 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 16 Nov 2023 22:43:45 GMT Subject: RFR: 8320147: Remove DumpSharedSpaces Message-ID: One more PR for cleanup with cdsConfig.hpp: Replace the global variable `DumpSharedSpaces` with `CDSConfig::is_dumping_static_archive()`. Note: some misuses of `DumpSharedSpaces` need to be replaced with `CDSConfig::is_dumping_heap()` or `CDSConfig::is_dumping_full_module_graph()` ------------- Commit messages: - 8320147: Remove DumpSharedSpaces Changes: https://git.openjdk.org/jdk/pull/16700/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16700&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320147 Stats: 178 lines in 50 files changed: 81 ins; 16 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/16700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16700/head:pull/16700 PR: https://git.openjdk.org/jdk/pull/16700 From sspitsyn at openjdk.org Thu Nov 16 22:46:31 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 16 Nov 2023 22:46:31 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Thu, 16 Nov 2023 19:02:39 GMT, Patricio Chilano Mateo wrote: >> I'm suggesting to fix it this way for the unmounted case only: >> >> @@ -1976,6 +1976,13 @@ JvmtiHandshake::execute(JvmtiUnitedHandshakeClosure* hs_cl, ThreadsListHandle* t >> return; >> } >> if (target_jt == nullptr) { // unmounted virtual thread >> + // JvmtiVTMSTransitionDisabler can start after the vthread executed notifyJvmtiUnmount(), i.e.
>> + // the vthread is already outside the transition, but before changing the state to TERMINATED. >> + // Changing the state to TERMINATED is racy, so we check if the continuation is done in advance. >> + oop cont = java_lang_VirtualThread::continuation(target_h()); >> + if (jdk_internal_vm_Continuation::done(cont)) { >> + return; >> + } >> hs_cl->do_vthread(target_h); // execute handshake closure callback on current thread directly >> } >> } > > Sounds good. Is there a reason why not have the check inside JvmtiEnvBase::is_vthread_alive()? If it is a part of the `JvmtiEnvBase::is_vthread_alive()` then it is racy for mounted virtual threads. It is not racy for unmounted virtual threads. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396456207 From ccheung at openjdk.org Thu Nov 16 23:42:30 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 16 Nov 2023 23:42:30 GMT Subject: RFR: 8320147: Remove DumpSharedSpaces In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 22:36:17 GMT, Ioi Lam wrote: > One more PR for cleanup with cdsConfig.hpp: > > Replace the global variable `DumpSharedSpaces` with `CDSConfig::is_dumping_static_archive()`. > > Note: some mis-uses of `DumpSharedSpaces` need to be replaced with `CDSConfig::is_dumping_heap()` or `CDSConfig::is_dumping_full_module_graph()` Few files require copyright header update. instanceClassLoaderKlass.hpp instanceMirrorKlass.hpp instanceRefKlass.hpp instanceStackChunkKlass.hpp src/hotspot/share/oops/instanceClassLoaderKlass.cpp line 2: > 1: /* > 2: * Copyright (c) 1997, 2023, Oracle and/or its affiliates. All rights reserved. Since this is a new file, no need to include 1997. ------------- Marked as reviewed by ccheung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16700#pullrequestreview-1735730095 PR Review Comment: https://git.openjdk.org/jdk/pull/16700#discussion_r1396496277 From sviswanathan at openjdk.org Fri Nov 17 00:14:42 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 17 Nov 2023 00:14:42 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used Message-ID: This PR fixes the perf regression seen on AVX for floating point conversions. In AVX the cvt instructions have three operands: cvtxx dst, src1, src2, where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1. The C2 JIT uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register. Doing an xor dst, dst before the conversion instruction fixes the perf regression. Perf before the patch on UseAVX=3 platform: ComputePI.compute_pi_dbl_flt avgt 5 471.875 ± 0.400 ns/op ComputePI.compute_pi_flt_dbl avgt 5 1877.174 ± 0.557 ns/op ComputePI.compute_pi_int_dbl avgt 5 655.222 ± 28.082 ns/op ComputePI.compute_pi_int_flt avgt 5 737.178 ± 0.077 ns/op ComputePI.compute_pi_long_dbl avgt 5 767.364 ± 0.027 ns/op ComputePI.compute_pi_long_flt avgt 5 587.854 ± 10.068 ns/op Perf after the patch on UseAVX=3 platform: Benchmark Mode Cnt Score Error Units ComputePI.compute_pi_dbl_flt avgt 5 468.328 ± 0.141 ns/op ComputePI.compute_pi_flt_dbl avgt 5 435.430 ± 0.259 ns/op ComputePI.compute_pi_int_dbl avgt 5 424.088 ± 0.050 ns/op ComputePI.compute_pi_int_flt avgt 5 417.345 ± 0.207 ns/op ComputePI.compute_pi_long_dbl avgt 5 425.751 ± 0.006 ns/op ComputePI.compute_pi_long_flt avgt 5 430.199 ±
0.736 ns/op ------------- Commit messages: - fix 32bit build problem - Fix for AVX cvt performance Changes: https://git.openjdk.org/jdk/pull/16701/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16701&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318562 Stats: 247 lines in 4 files changed: 245 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16701.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16701/head:pull/16701 PR: https://git.openjdk.org/jdk/pull/16701 From kvn at openjdk.org Fri Nov 17 01:22:33 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 01:22:33 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 23:46:53 GMT, Sandhya Viswanathan wrote: > This PR fixes the perf regression seen on AVX for floating point conversions. > > In AVX the cvt instructions have three operands: cvtxx dst, src1, src2, where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1. > > The C2 JIT uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register. > Doing an xor dst, dst before the conversion instruction fixes the perf regression. > > Perf before the patch on UseAVX=3 platform: > ComputePI.compute_pi_dbl_flt avgt 5 471.875 ± 0.400 ns/op > ComputePI.compute_pi_flt_dbl avgt 5 1877.174 ± 0.557 ns/op > ComputePI.compute_pi_int_dbl avgt 5 655.222 ± 28.082 ns/op > ComputePI.compute_pi_int_flt avgt 5 737.178 ± 0.077 ns/op > ComputePI.compute_pi_long_dbl avgt 5 767.364 ± 0.027 ns/op > ComputePI.compute_pi_long_flt avgt 5 587.854 ± 10.068 ns/op > > Perf after the patch on UseAVX=3 platform: > Benchmark Mode Cnt Score Error Units > ComputePI.compute_pi_dbl_flt avgt 5 468.328 ± 0.141 ns/op > ComputePI.compute_pi_flt_dbl avgt 5 435.430 ± 0.259 ns/op > ComputePI.compute_pi_int_dbl avgt 5 424.088 ±
0.050 ns/op > ComputePI.compute_pi_int_flt avgt 5 417.345 ± 0.207 ns/op > ComputePI.compute_pi_long_dbl avgt 5 425.751 ± 0.006 ns/op > ComputePI.compute_pi_long_flt avgt 5 430.199 ± 0.736 ns/op @sviswa7 thank you for finding the cause! I will test it locally. ------------- PR Review: https://git.openjdk.org/jdk/pull/16701#pullrequestreview-1735830320 From pchilanomate at openjdk.org Fri Nov 17 02:13:30 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 17 Nov 2023 02:13:30 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Thu, 16 Nov 2023 22:44:12 GMT, Serguei Spitsyn wrote: >> Sounds good. Is there a reason why not have the check inside JvmtiEnvBase::is_vthread_alive()? > > If it is a part of the `JvmtiEnvBase::is_vthread_alive()` then it is racy for mounted virtual threads. > It is not racy for unmounted virtual threads. So we should only see that a continuation is done for an unmounted vthread. The last place where we could see a mounted vthread is at notifyJvmtiEnd(), blocked in start_VTMS_transition(), but the continuation is not marked done yet. Also I realize the window for the problematic case I mentioned starts even earlier at notifyJvmtiEnd(), not notifyJvmtiUnmount(), because blocking due to JvmtiVTMSTransitionDisabler happens in start_VTMS_transition() not finish_VTMS_transition(). Once the vthread executed notifyJvmtiEnd() any JvmtiVTMSTransitionDisabler that happens afterwards will fall into this case. So maybe the first sentence of the comment should instead be: "The vthread could have already executed the last unmount but might not have changed state to TERMINATED yet."
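The window described here can be illustrated with a toy model. This is only a sketch under assumptions: `VThreadModel` and `execute_handshake` are hypothetical names, the three booleans stand in for state that the VM reads from the vthread and its continuation, and the mounted case is simplified to a direct call where the real code would handshake with the carrier thread.

```cpp
#include <functional>

// Toy snapshot of a virtual thread as seen by the handshake requester.
struct VThreadModel {
  bool mounted;            // currently has a carrier thread
  bool continuation_done;  // the continuation has finished (last unmount executed)
  bool terminated;         // state already changed to TERMINATED
};

// Returns true if the closure was executed for the target.
bool execute_handshake(const VThreadModel& vt, const std::function<void()>& closure) {
  if (vt.terminated) {
    return false;  // not alive, nothing to do
  }
  if (!vt.mounted) {
    // The vthread could have already executed the last unmount but might not
    // have changed state to TERMINATED yet, so also check the continuation.
    if (vt.continuation_done) {
      return false;
    }
    closure();  // unmounted: run the closure on the current thread directly
    return true;
  }
  closure();  // simplification of the mounted case
  return true;
}
```

In this model the `continuation_done` test closes the gap between "last unmount executed" and "state is TERMINATED", which is the case the reworded comment describes.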
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396592975 From kvn at openjdk.org Fri Nov 17 02:14:34 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 02:14:34 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 23:46:53 GMT, Sandhya Viswanathan wrote: > This PR fixes the perf regression seen on AVX for floating point conversions. > > In AVX the cvt instructions have three operands: cvtxx dst, src1, src2, where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1. > > The C2 JIT uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register. > Doing an xor dst, dst before the conversion instruction fixes the perf regression. > > Perf before the patch on UseAVX=3 platform: > ComputePI.compute_pi_dbl_flt avgt 5 471.875 ± 0.400 ns/op > ComputePI.compute_pi_flt_dbl avgt 5 1877.174 ± 0.557 ns/op > ComputePI.compute_pi_int_dbl avgt 5 655.222 ± 28.082 ns/op > ComputePI.compute_pi_int_flt avgt 5 737.178 ± 0.077 ns/op > ComputePI.compute_pi_long_dbl avgt 5 767.364 ± 0.027 ns/op > ComputePI.compute_pi_long_flt avgt 5 587.854 ± 10.068 ns/op > > Perf after the patch on UseAVX=3 platform: > Benchmark Mode Cnt Score Error Units > ComputePI.compute_pi_dbl_flt avgt 5 468.328 ± 0.141 ns/op > ComputePI.compute_pi_flt_dbl avgt 5 435.430 ± 0.259 ns/op > ComputePI.compute_pi_int_dbl avgt 5 424.088 ± 0.050 ns/op > ComputePI.compute_pi_int_flt avgt 5 417.345 ± 0.207 ns/op > ComputePI.compute_pi_long_dbl avgt 5 425.751 ± 0.006 ns/op > ComputePI.compute_pi_long_flt avgt 5 430.199 ± 0.736 ns/op I confirmed that this change solved performance issue on machines I tested (old Broadwell and Cascade Lake CPUs). I am submitting our regular testing for approval.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16701#issuecomment-1815636371 From jbhateja at openjdk.org Fri Nov 17 02:21:29 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 17 Nov 2023 02:21:29 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: <8VGbiOv5QCcp0n7p-Q3nglI2mpvg_L1qRyvbPRE8g2Q=.5a140fa6-fb41-4c4e-8c2a-f8d325d3d051@github.com> References: <8VGbiOv5QCcp0n7p-Q3nglI2mpvg_L1qRyvbPRE8g2Q=.5a140fa6-fb41-4c4e-8c2a-f8d325d3d051@github.com> Message-ID: <71uau-1DADjVuEYJ1xa0J4Y3aBpgOFCxsJBVaQVOCmE=.10f0fc5e-f050-48da-ba61-1bf1ef99125c@github.com> On Fri, 17 Nov 2023 02:16:58 GMT, Sandhya Viswanathan wrote: >> I confirmed that this change solved performance issue on machines I tested (old Broadwell and Cascade Lake CPUs). >> I am submitting our regular testing for approval. > > @vnkozlov Thanks a lot! Hi @sviswa7, thanks for addressing this. For the SSE versions, bits (MAXVL-1:32) of the corresponding destination register remain unchanged. For AVX, bits (127:32) of the XMM destination register are copied from the corresponding bits in the first source operand. Since all computation in the backend initially happens in logical registers, and the backend copies the logical register to the architectural register only at retirement, from a microarchitectural standpoint both cases result in the emission of an extra micro-op: in the first case for merging and in the second case for copying. That is why we inject vzeroupper to avoid merging penalties in transitions between AVX2/AVX512 and SSE. The fix looks OK to me otherwise.
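To make the operand semantics discussed in this thread concrete, here is a small portable model of the three-operand conversion as the PR description states it (low lane converted from src2, remaining lanes copied from src1). It is a sketch only: `Xmm`, `cvtsd2ss`, `convert_in_place` and `convert_with_xor` are illustrative names, the model covers data flow rather than timing, and it does not use real intrinsics.

```cpp
#include <array>

// Toy model of a 128-bit XMM register viewed as four float lanes.
using Xmm = std::array<float, 4>;

// Models "cvtsd2ss dst, src1, src2": lane 0 is the converted value,
// lanes 1..3 are copied from src1.
Xmm cvtsd2ss(const Xmm& src1, double src2) {
  return Xmm{static_cast<float>(src2), src1[1], src1[2], src1[3]};
}

// The "dst, dst, src2" flavor: the result still carries whatever dst held
// before, so the instruction has an input dependency on the old dst.
void convert_in_place(Xmm& dst, double src) {
  dst = cvtsd2ss(dst, src);
}

// The patched sequence: clear dst first so the result depends only on src.
void convert_with_xor(Xmm& dst, double src) {
  dst = Xmm{0.0f, 0.0f, 0.0f, 0.0f};  // stands in for "xor dst, dst"
  dst = cvtsd2ss(dst, src);
}
```

In the model, `convert_in_place` leaves stale upper lanes in the result while `convert_with_xor` does not; the thread attributes the measured slowdown to that dependency on the previous contents of dst.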
------------- PR Comment: https://git.openjdk.org/jdk/pull/16701#issuecomment-1815645570 From sviswanathan at openjdk.org Fri Nov 17 02:21:28 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 17 Nov 2023 02:21:28 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: References: Message-ID: <8VGbiOv5QCcp0n7p-Q3nglI2mpvg_L1qRyvbPRE8g2Q=.5a140fa6-fb41-4c4e-8c2a-f8d325d3d051@github.com> On Fri, 17 Nov 2023 02:11:29 GMT, Vladimir Kozlov wrote: >> This PR fixes the perf regression seen on AVX for floating point conversions. >> >> In AVX the cvt instructions have three operands: cvtxx dst, src1, src2, where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1. >> >> The C2 JIT uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register. >> Doing an xor dst, dst before the conversion instruction fixes the perf regression. >> >> Perf before the patch on UseAVX=3 platform: >> ComputePI.compute_pi_dbl_flt avgt 5 471.875 ± 0.400 ns/op >> ComputePI.compute_pi_flt_dbl avgt 5 1877.174 ± 0.557 ns/op >> ComputePI.compute_pi_int_dbl avgt 5 655.222 ± 28.082 ns/op >> ComputePI.compute_pi_int_flt avgt 5 737.178 ± 0.077 ns/op >> ComputePI.compute_pi_long_dbl avgt 5 767.364 ± 0.027 ns/op >> ComputePI.compute_pi_long_flt avgt 5 587.854 ± 10.068 ns/op >> >> Perf after the patch on UseAVX=3 platform: >> Benchmark Mode Cnt Score Error Units >> ComputePI.compute_pi_dbl_flt avgt 5 468.328 ± 0.141 ns/op >> ComputePI.compute_pi_flt_dbl avgt 5 435.430 ± 0.259 ns/op >> ComputePI.compute_pi_int_dbl avgt 5 424.088 ± 0.050 ns/op >> ComputePI.compute_pi_int_flt avgt 5 417.345 ± 0.207 ns/op >> ComputePI.compute_pi_long_dbl avgt 5 425.751 ± 0.006 ns/op >> ComputePI.compute_pi_long_flt avgt 5 430.199 ±
0.736 ns/op > > I confirmed that this change solved performance issue on machines I tested (old Broadwell and Cascade Lake CPUs). > I am submitting our regular testing for approval. @vnkozlov Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16701#issuecomment-1815643813 From jiangli at openjdk.org Fri Nov 17 02:53:36 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Fri, 17 Nov 2023 02:53:36 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 14 Nov 2023 20:57:09 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. > Thanks. The latest change to `JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample()` looks OK to me. Skipping a few allocations for JVMTI allocation sampler is better than resulting in a problematic `JvmtiThreadState` instance. > > My main question is if we can now change `if (state == nullptr || state->get_thread_oop() != thread_oop) ` to `if (state == nullptr)` in `JvmtiThreadState::state_for_while_locked()`. I suspect we would never run into a case of `state != nullptr && state->get_thread_oop() != thread_oop` with the latest change, even with virtual threads. This is backed up by testing with [00ace66](https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22) not triggering any failure. 
> > If we run into such a case, it could still be problematic as `JvmtiThreadState::state_for_while_locked()` would allocate a new `JvmtiThreadState` instance pointing to the same JavaThread, and it does not delete the existing instance. > > Could anyone with deep knowledge of JvmtiThreadState and virtual threads provide some feedback on this change and https://bugs.openjdk.org/browse/JDK-8319935? @AlanBateman, do you know who would be the best reviewer for this? @caoman and I discussed his suggestion to change the `if (state == nullptr || state->get_thread_oop() != thread_oop)` check in person today. Since it may affect vthread, my main concern is that our current testing may not cover that sufficiently. The suggestion could be handled in a separate enhancement bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1815667615 From qamai at openjdk.org Fri Nov 17 03:19:29 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 17 Nov 2023 03:19:29 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 23:46:53 GMT, Sandhya Viswanathan wrote: > This PR fixes the perf regression seen on AVX for floating point conversions. > > In AVX the cvt instructions have three operands: cvtxx dst, src1, src2, where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1. > > The C2 JIT uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register. > Doing an xor dst, dst before the conversion instruction fixes the perf regression. > > Perf before the patch on UseAVX=3 platform: > ComputePI.compute_pi_dbl_flt avgt 5 471.875 ± 0.400 ns/op > ComputePI.compute_pi_flt_dbl avgt 5 1877.174 ± 0.557 ns/op > ComputePI.compute_pi_int_dbl avgt 5 655.222 ± 28.082 ns/op > ComputePI.compute_pi_int_flt avgt 5 737.178 ±
0.077 ns/op > ComputePI.compute_pi_long_dbl avgt 5 767.364 ± 0.027 ns/op > ComputePI.compute_pi_long_flt avgt 5 587.854 ± 10.068 ns/op > > Perf after the patch on UseAVX=3 platform: > Benchmark Mode Cnt Score Error Units > ComputePI.compute_pi_dbl_flt avgt 5 468.328 ± 0.141 ns/op > ComputePI.compute_pi_flt_dbl avgt 5 435.430 ± 0.259 ns/op > ComputePI.compute_pi_int_dbl avgt 5 424.088 ± 0.050 ns/op > ComputePI.compute_pi_int_flt avgt 5 417.345 ± 0.207 ns/op > ComputePI.compute_pi_long_dbl avgt 5 425.751 ± 0.006 ns/op > ComputePI.compute_pi_long_flt avgt 5 430.199 ± 0.736 ns/op For `cvtss2sd` and `cvtsd2ss`, can we branch to have AVX use `vcvtss2sd(dst, src, src)`? This removes the redundant `xor` on newer machines. src/hotspot/cpu/x86/x86_64.ad line 11092: > 11090: %{ > 11091: match(Set dst (ConvD2F src)); > 11092: effect(TEMP dst); You don't need `TEMP dst`, if `dst` is an alias of `src` then a destructive `xor` is not emitted. ------------- PR Review: https://git.openjdk.org/jdk/pull/16701#pullrequestreview-1735952507 PR Review Comment: https://git.openjdk.org/jdk/pull/16701#discussion_r1396641669 From sspitsyn at openjdk.org Fri Nov 17 04:38:27 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 04:38:27 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v4] In-Reply-To: References: <6vZHqAagtrtDw-c9xNZNpaBCa1HrpYw22uhYtDTHwAw=.3a59dde0-ab33-43b8-a08f-d481e7acffef@github.com> <2F8ze1cLKzvTMEqwY8JJMZ9QbZUxqrMCv7nl6uFJLMI=.7c4dba0a-8c8d-4825-9670-75b76b9bf184@github.com> Message-ID: On Fri, 17 Nov 2023 02:10:50 GMT, Patricio Chilano Mateo wrote: >> If it is a part of the `JvmtiEnvBase::is_vthread_alive()` then it is racy for mounted virtual threads. >> It is not racy for unmounted virtual threads. > > So we should only see that a continuation is done for an unmounted vthread.
> The last place where we could see a mounted vthread is at notifyJvmtiEnd(), blocked in start_VTMS_transition(), but the continuation is not marked done yet.
> Also I realize the window for the problematic case I mentioned starts even earlier at notifyJvmtiEnd(), not notifyJvmtiUnmount(), because blocking due to JvmtiVTMSTransitionDisabler happens in start_VTMS_transition() not finish_VTMS_transition(). Once the vthread executed notifyJvmtiEnd() any JvmtiVTMSTransitionDisabler that happens afterwards will fall into this case. So maybe the first sentence of the comment should instead be: "The vthread could have already executed the last unmount but might not have changed state to TERMINATED yet."

Thank you. Now I see that `done` is set to `true` during an unmount transition.
So, I'm convinced to move the check for `jdk_internal_vm_Continuation::done(cont)` to `JvmtiEnvBase::is_vthread_alive()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1396695782

From jbhateja at openjdk.org Fri Nov 17 04:57:30 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 17 Nov 2023 04:57:30 GMT
Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used
In-Reply-To:
References:
Message-ID:

On Fri, 17 Nov 2023 03:13:42 GMT, Quan Anh Mai wrote:

>> This PR fixes the perf regression seen on AVX for floating point conversions.
>>
>> In AVX the cvt instructions have three operands cvtxx dst, src1, src2. Where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1.
>>
>> The C2 jit uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register.
>> Doing an xor dst, dst before the conversion instruction fixes the perf regression.
>>
>> Perf before the patch on UseAVX=3 platform:
>> ComputePI.compute_pi_dbl_flt   avgt    5   471.875 ±  0.400  ns/op
>> ComputePI.compute_pi_flt_dbl   avgt    5  1877.174 ±  0.557  ns/op
>> ComputePI.compute_pi_int_dbl   avgt    5   655.222 ± 28.082  ns/op
>> ComputePI.compute_pi_int_flt   avgt    5   737.178 ±  0.077  ns/op
>> ComputePI.compute_pi_long_dbl  avgt    5   767.364 ±  0.027  ns/op
>> ComputePI.compute_pi_long_flt  avgt    5   587.854 ± 10.068  ns/op
>>
>> Perf after the patch on UseAVX=3 platform:
>> Benchmark                      Mode  Cnt     Score    Error  Units
>> ComputePI.compute_pi_dbl_flt   avgt    5   468.328 ±  0.141  ns/op
>> ComputePI.compute_pi_flt_dbl   avgt    5   435.430 ±  0.259  ns/op
>> ComputePI.compute_pi_int_dbl   avgt    5   424.088 ±  0.050  ns/op
>> ComputePI.compute_pi_int_flt   avgt    5   417.345 ±  0.207  ns/op
>> ComputePI.compute_pi_long_dbl  avgt    5   425.751 ±  0.006  ns/op
>> ComputePI.compute_pi_long_flt  avgt    5   430.199 ±  0.736  ns/op
>
> src/hotspot/cpu/x86/x86_64.ad line 11092:
>
>> 11090: %{
>> 11091:   match(Set dst (ConvD2F src));
>> 11092:   effect(TEMP dst);
>
> You don't need `TEMP dst`, if `dst` is an alias of `src` then a destructive `xor` is not emitted.

Without the TEMP annotation, dst and src may be aliased if the live range of src does not survive beyond this instruction.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16701#discussion_r1396704179

From sspitsyn at openjdk.org Fri Nov 17 05:37:46 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 17 Nov 2023 05:37:46 GMT
Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v8]
In-Reply-To:
References:
Message-ID:

> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread.
> At the low level, the JVMTI code supporting platform and virtual threads still can be different.
> This implementation is based on the `JvmtiVTMSTransitionDisabler` class.
> > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. > > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: add jdk_internal_vm_Continuation::done(cont) check to JvmtiEnvBase::is_vthread_alive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/2df63547..e61d0703 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=06-07 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From sspitsyn at openjdk.org Fri Nov 17 05:46:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 05:46:37 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 In-Reply-To: References: Message-ID: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> On Thu, 16 Nov 2023 21:36:28 GMT, Chris Plummer wrote: >> This is a fix of a performance/scalability related issue. 
The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. >> However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. >> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. >> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. >> >> Additionally, the fix includes the following related changes: >> - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: >> >> JvmtiThreadState* state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> MutexLocker mu(JvmtiThreadState_lock); >> state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> JvmtiEventController::enter_interp_only_mode(); >> } >> } >> >> >> - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. >> - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation. 
>> >> Performance mesurements: >> - Without this fix the test provided by the bug submitter gives execution numbers: >> - no ClassLoad events enabled: 3251 ms >> - ClassLoad events are enabled: 40534 ms >> >> - With the fix: >> - no ClassLoad events enabled: 3270 ms >> - ClassLoad events are enabled: 3385 ms >> >> Testing: >> - Ran mach5 tiers 1-6, no regressions are noticed > > src/hotspot/share/prims/jvmtiEventController.cpp line 372: > >> 370: return; // EnterInterpOnlyModeClosure will be executed right after mount. >> 371: } >> 372: ThreadsListHandle tlh(current); > > Why was this added? This is explained in the PR description. Do you think, a just comment is needed or this has to be separated from this fix? > src/hotspot/share/prims/jvmtiThreadState.cpp line 531: > >> 529: >> 530: // JvmtiThreadState objects for virtual thread filtered events enabled globally >> 531: // must be created eagerly if the interp_only_mode is enabled. Otherwise, > > This sentence is hard to read. How about: > > "If interp_only_mode is enabled then we must eagerly create JvmtiThreadState objects for globally enabled virtual thread filtered events." Okay, thanks. The suggestion is taken. > src/hotspot/share/prims/jvmtiThreadState.cpp line 579: > >> 577: VTMS_mount_end(vthread); >> 578: if (JvmtiExport::can_support_virtual_threads() && >> 579: JvmtiExport::should_post_vthread_mount()) { > > It seems odd that "can_support" can be false when "should_post" is true. I would think that "should_post" would always be false when "can_support" is false, and therefore there would be no need to check "can_support". Right, thanks. It is why this check was missed in the first place. Will undo this change. > src/hotspot/share/prims/jvmtiThreadState.hpp line 234: > >> 232: inline void set_head_env_thread_state(JvmtiEnvThreadState* ets); >> 233: >> 234: static bool _seen_interp_only_mode; // needed for optimization > > Say what the flag represents, not why we have it. Thank you for looking at this PR! 
Okay, thanks. Will do. > src/hotspot/share/prims/jvmtiThreadState.hpp line 257: > >> 255: // JvmtiThreadState objects for virtual thread filtered events enabled globally >> 256: // must be created eagerly if the interp_only_mode is enabled. Otherwise, >> 257: // it is an important optimization to create JvmtiThreadState objects lazily. > > No need for this comment here. It is already at the call site, which is where it belongs. Instead the comment here should say what this API does (return true if any thread has entered interp_only_mode at any point during the JVMs execution). Thanks, good suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396725938 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396728778 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396728297 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396727211 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396729089 From cjplummer at openjdk.org Fri Nov 17 05:58:31 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 17 Nov 2023 05:58:31 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 In-Reply-To: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> References: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> Message-ID: On Fri, 17 Nov 2023 05:38:28 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEventController.cpp line 372: >> >>> 370: return; // EnterInterpOnlyModeClosure will be executed right after mount. >>> 371: } >>> 372: ThreadsListHandle tlh(current); >> >> Why was this added? > > This is explained in the PR description. > Do you think, a just comment is needed or this has to be separated from this fix? I see the PR comment, but I don't really understand it. 
Is this to force some sort of early initialization to avoid a race later on? It just seems odd to create the tlh, but never use it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396735115

From fyang at openjdk.org Fri Nov 17 07:16:32 2023
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 17 Nov 2023 07:16:32 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3]
In-Reply-To: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com>
References: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com>
Message-ID:

On Thu, 16 Nov 2023 09:38:08 GMT, Yuri Gaevsky wrote:

>> Hello All,
>>
>> Please review these changes to support _vectorizedHashCode intrinsic on
>> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
>> usage of any RVV instruction but provides manual unrolling of the appropriate
>> loop. The code with usage of RVV instruction could be added as follow-up of
>> the patch or independently.
>>
>> Thanks,
>> -Yuri Gaevsky
>>
>> P.S. My OCA has been accepted recently (ygaevsky).
>>
>> ### Correctness checks
>>
>> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>>
>> ### Performance results (the numbers for non-ints are similar)
>>
>> #### StarFive JH7110 board:
>>
>>
>> ArraysHashCode:                    without intrinsic        with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints      0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
>> multiints      1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
>> multiints      2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
>> multiints      3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
>> multiints      4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
>> multiints      5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
>> multiints      6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
>> multiints      7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
>> multiints      8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
>> multiints      9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
>> multiints     10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
>> multiints     50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
>> multiints    100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
>> multiints    200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
>> multiints    500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
>> multiints   1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
>> multiints  10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
>> multiints 100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
>> ---------------------------------------...
>
> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:
>
>   Addressed most of suggestions for code improvements from @Hamlin-Li

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1467:

> 1465: BasicType eltype)
> 1466: {
> 1467:   assert_different_registers(ary, cnt, result, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6);

We have two scratch registers t0 (x5) / t1 (x6) which should be considered for use in this assembler function. These two are reserved from the register allocator and are suitable for keeping some short-lived values in the procedure.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1396787813

From rehn at openjdk.org Fri Nov 17 07:21:50 2023
From: rehn at openjdk.org (Robbin Ehn)
Date: Fri, 17 Nov 2023 07:21:50 GMT
Subject: RFR: 8310656: RISC-V: __builtin___clear_cache can fail silently.
 [v3]
In-Reply-To:
References: <6JeLSyWD6twDLD83OPiG-_5lTgGVHn8dh-rKkc7scmM=.9b7be0e7-cb20-44c6-8d28-d72c00d41edf@github.com> <4eyogulkaSvi1d-xVbPCAp_mwRSD5sHyfysJj2Gat2A=.abfd20ed-c21d-4150-b25c-e4f9a5b71868@github.com>
Message-ID:

On Wed, 12 Jul 2023 08:45:41 GMT, Vladimir Kempik wrote:

> would you mind backporting it to 17u-dev as well ?

Sorry, missed this.
@luhenry it will not apply cleanly due to "8311145: Remove check_with_errno duplicates"

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14670#issuecomment-1815854799

From aboldtch at openjdk.org Fri Nov 17 07:22:58 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 17 Nov 2023 07:22:58 GMT
Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4]
In-Reply-To:
References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com>
Message-ID:

On Thu, 16 Nov 2023 19:05:38 GMT, Daniel D. Daugherty wrote:

>> I see what you mean. Because we need to put the hash code in the ObjectMonitor if it already is inflated and it may be the case that the ObjectMonitor has an anonymous owner we can still call `is_lock_owned` when retrieving the monitor for FastHashCode.
>>
>> Technically we could condition the inflate call on if the mark word lock bits and pick out the ObjectMonitor from the mark word. It would always be installed when reaching that point in the LM_LIGHTWEIGHT mode.
>
> Do we inflate when the VMThread is doing JVM/TI tagging?

LM_LEGACY and LM_MONITOR will. LM_LIGHTWEIGHT technically may, if deflation finishes between reading the mark word in FastHashCode and reading the mark word in `inflate`. It seems like a rare enough case that it does not need to be handled separately.

The following change would avoid inflation altogether.

       // An async deflation can race after the inflate() call and before we
       // can update the ObjectMonitor's header with the hash value below.
    +  if (LockingMode == LM_LIGHTWEIGHT) {
    +    assert(mark.has_monitor(), "must be");
    +    monitor = mark.monitor();
    +  } else {
    -  monitor = inflate(current, obj, inflate_cause_hash_code);
    +    monitor = inflate(current, obj, inflate_cause_hash_code);
    +  }
       // Load ObjectMonitor's header/dmw field and see if it has a hash.

Maybe I should change it to this, given that there has been confusion here.

My ideal solution would be to separate the implementations for the different locking modes altogether; all of these functions are littered with if (LockingMode == X).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396792753

From sspitsyn at openjdk.org Fri Nov 17 07:26:30 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 17 Nov 2023 07:26:30 GMT
Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10
In-Reply-To:
References: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com>
Message-ID:

On Fri, 17 Nov 2023 05:55:44 GMT, Chris Plummer wrote:

>> This is explained in the PR description.
>> Do you think, a just comment is needed or this has to be separated from this fix?
>
> I see the PR comment, but I don't really understand it. Is this to force some sort of early initialization to avoid a race later on? It just seems odd to create the tlh, but never use it.

The `tlh` is used to protect any JavaThreads that exist at this point from being terminated while it is set.
My understanding is that there is no need to iterate over all threads in the list to get this protection.
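
A rough standalone analogy of that last point, not HotSpot code: holding a handle to a snapshot of a thread list can be enough to keep its members alive, even if the holder never walks the list. The sketch below uses reference-counted snapshots, which is only an assumption chosen for illustration and is not how `ThreadsListHandle` is implemented. `Worker`, `Registry`, `ListGuard` and `alive_after_removal` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a JavaThread; not a HotSpot type.
struct Worker {
  explicit Worker(std::string n) : name(std::move(n)) {}
  std::string name;
};

using WorkerList = std::vector<std::shared_ptr<Worker>>;

// Hypothetical registry that publishes immutable snapshots of the live workers.
class Registry {
 public:
  void add(std::shared_ptr<Worker> w) {
    std::lock_guard<std::mutex> lock(_mutex);
    auto next = std::make_shared<WorkerList>(*_current);  // copy-on-write
    next->push_back(std::move(w));
    _current = std::move(next);
  }

  void remove(const Worker* w) {
    std::lock_guard<std::mutex> lock(_mutex);
    auto next = std::make_shared<WorkerList>();
    for (const auto& p : *_current) {
      if (p.get() != w) next->push_back(p);
    }
    _current = std::move(next);  // the old snapshot lives on while guards hold it
  }

  std::shared_ptr<const WorkerList> snapshot() const {
    std::lock_guard<std::mutex> lock(_mutex);
    return _current;
  }

 private:
  mutable std::mutex _mutex;
  std::shared_ptr<const WorkerList> _current = std::make_shared<WorkerList>();
};

// Scoped guard: merely holding it pins every worker in the snapshot.
class ListGuard {
 public:
  explicit ListGuard(const Registry& r) : _list(r.snapshot()) {}
  std::size_t length() const { return _list->size(); }

 private:
  std::shared_ptr<const WorkerList> _list;
};

// Returns whether the worker is still alive right after it was removed
// from the registry, with or without a guard held across the removal.
bool alive_after_removal(bool hold_guard) {
  Registry registry;
  auto w = std::make_shared<Worker>("worker-1");
  std::weak_ptr<Worker> observer = w;
  const Worker* raw = w.get();
  registry.add(std::move(w));  // the registry now owns the only strong reference

  std::unique_ptr<ListGuard> guard;
  if (hold_guard) guard = std::make_unique<ListGuard>(registry);  // never iterated
  registry.remove(raw);
  return !observer.expired();
}
```

With the guard held, the removed worker outlives its removal; without it, the worker is destroyed as soon as the registry drops the old snapshot. The guard is created and never iterated, which mirrors the "create the tlh, but never use it" pattern being asked about.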
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396795605 From sspitsyn at openjdk.org Fri Nov 17 07:31:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 07:31:33 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 In-Reply-To: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> References: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> Message-ID: On Fri, 17 Nov 2023 05:43:21 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiThreadState.cpp line 531: >> >>> 529: >>> 530: // JvmtiThreadState objects for virtual thread filtered events enabled globally >>> 531: // must be created eagerly if the interp_only_mode is enabled. Otherwise, >> >> This sentence is hard to read. How about: >> >> "If interp_only_mode is enabled then we must eagerly create JvmtiThreadState objects for globally enabled virtual thread filtered events." > > Okay, thanks. The suggestion is taken. Fixed. >> src/hotspot/share/prims/jvmtiThreadState.cpp line 579: >> >>> 577: VTMS_mount_end(vthread); >>> 578: if (JvmtiExport::can_support_virtual_threads() && >>> 579: JvmtiExport::should_post_vthread_mount()) { >> >> It seems odd that "can_support" can be false when "should_post" is true. I would think that "should_post" would always be false when "can_support" is false, and therefore there would be no need to check "can_support". > > Right, thanks. It is why this check was missed in the first place. Will undo this change. Fixed. >> src/hotspot/share/prims/jvmtiThreadState.hpp line 234: >> >>> 232: inline void set_head_env_thread_state(JvmtiEnvThreadState* ets); >>> 233: >>> 234: static bool _seen_interp_only_mode; // needed for optimization >> >> Say what the flag represents, not why we have it. > > Thank you for looking at this PR! > Okay, thanks. Will do. Fixed. 
>> src/hotspot/share/prims/jvmtiThreadState.hpp line 257: >> >>> 255: // JvmtiThreadState objects for virtual thread filtered events enabled globally >>> 256: // must be created eagerly if the interp_only_mode is enabled. Otherwise, >>> 257: // it is an important optimization to create JvmtiThreadState objects lazily. >> >> No need for this comment here. It is already at the call site, which is where it belongs. Instead the comment here should say what this API does (return true if any thread has entered interp_only_mode at any point during the JVMs execution). > > Thanks, good suggestion. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396798836 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396801201 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396798590 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1396799894 From sspitsyn at openjdk.org Fri Nov 17 07:34:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 07:34:44 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v2] In-Reply-To: References: Message-ID: <5EZl52o4_wJkz6SVSxWkPGVdc20b2TWgb2KEWT2Np8Q=.3d52eccc-2b72-4a42-9f5d-5c2fd54475bf@github.com> > This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. > However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. > It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. 
> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. > > Additionally, the fix includes the following related changes: > - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: > > JvmtiThreadState* state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > MutexLocker mu(JvmtiThreadState_lock); > state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > JvmtiEventController::enter_interp_only_mode(); > } > } > > > - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. > - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation. 
> > Performance mesurements: > - Without this fix the test provided by the bug submitter gives execution numbers: > - no ClassLoad events enabled: 3251 ms > - ClassLoad events are enabled: 40534 ms > > - With the fix: > - no ClassLoad events enabled: 3270 ms > - ClassLoad events are enabled: 3385 ms > > Testing: > - Ran mach5 tiers 1-6, no regressions are noticed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: (1) removed unneeded check for can_support_virtual_threads; (2) corrected some comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16686/files - new: https://git.openjdk.org/jdk/pull/16686/files/c5ba2cb0..2582ae3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=00-01 Stats: 26 lines in 2 files changed: 0 ins; 8 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/16686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16686/head:pull/16686 PR: https://git.openjdk.org/jdk/pull/16686 From aboldtch at openjdk.org Fri Nov 17 08:02:56 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 17 Nov 2023 08:02:56 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: References: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> Message-ID: <8xx2PGeKCQDE_G0h8NVj0ZsBbYbFy8lYkrB_jKq6X5I=.f1d1735f-9760-461f-a86e-c682306298e4@github.com> On Thu, 16 Nov 2023 18:42:02 GMT, Daniel D. Daugherty wrote: >> Now I think I see why you changed the `cas_set_mark()` call to return its value into >> `mark` instead of `old_mark` and why the original parameter to the CAS was changed >> from `mark` to `old_mark`. It's a bit different style then we typically use for CAS where >> the return value goes into an `old_something` variable. 
>> >> Since `mark` is now the loop control and `old_mark` is now `const` you had to juggle >> things with the `cas_set_mark()` call. Okay, I think I get the reason for this bit of juggling. >> (I still don't understand the rename from `locked_mark` to `new_mark` but that's okay.) > > Okay so this bit of code changes were done for a couple of reasons (of course this is my guess): > 1) add a retry here in case of collision to give this thread another chance at doing a lightweight lock instead of always dropping into inflate-enter loop below on a collision > 2) clean up the code a little bit with some use of `const` to make things a little more clear. > > I'm going to guess that the type of collision that we're trying to avoid is hash code installation. > > You may want to add a comment above `L516 while (mark.is_unlocked()) {` that says: > > // Allow for a retry here just in case the cas_set_mark() below collided with a hash code installation: > The change from mark.is_neutral() to mark.is_unlocked() [...]. Is there some reason to make that change? It made the code more readable for me. The is unlocked conveys some relevant meaning, is natural requires knowledge that an unlocked mark word is neutral. The code then reads as transition the mark word from unlocked to to fast locked. Instead of transition the mark word from neutral to fast locked. It may just be my unfamiliarity with the this code ands its history which makes me think the former is more easily understandable then the latter. > Are we trying to migrate away from `mark.is_neutral()` to `mark.is_unlocked()`? There probably is some history here I am unaware of. But to me only one should exist, or the neutral property's meaning needs to be explained in the markWord.hpp file so it is clear what makes it distinct from unlocked. If they are just aliases then to me unlocked seems more natural, but could replace all unlocked with neutral as well. 
> I still don't understand the rename from `locked_mark` to `new_mark`

I'll rename it. I believe it is coincidental. I removed the old code and implemented the new. `cas_set_mark` calls its parameters `old_mark` and `new_mark`, so I did as well.

> It's a bit different style then we typically use for CAS where the return value goes into an `old_something` variable.

As this is the case I will switch to that style, to reduce confusion.

```c++
const markWord locked_mark = mark.set_fast_locked();
const markWord old_mark = obj()->cas_set_mark(locked_mark, mark);
if (old_mark == mark) {
  // Successfully fast-locked, push object to lock-stack and return.
  lock_stack.push(obj());
  return;
}
mark = old_mark;
```

> You may want to add a comment above `L516 while (mark.is_unlocked()) {` that says:
>
> ```
> // Allow for a retry here just in case the cas_set_mark() below collided with a hash code installation:
> ```

I'll add a comment. But maybe it should instead say

    // Retry until a lock state change has been observed. cas_set_mark() may collide with non lock bits modifications.

This comment should age better when/if we start using more than just the lock bits and hash code bits.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396831513

From aboldtch at openjdk.org Fri Nov 17 08:16:00 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 17 Nov 2023 08:16:00 GMT
Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 16 Nov 2023 19:15:06 GMT, Daniel D. Daugherty wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Move is_lock_owned closer to its only use
>
> src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 68:
>
>> 66:     __ testptr(result, markWord::monitor_value);
>> 67:     __ jcc(Assembler::notZero, slowCase);
>> 68:   } else {
>
> Not needed for other platforms?
Or will that be done with other bugs or sub-tasks? I only found that x86 and 32bit arm does this in the shared runtime. I can create a jbs entry for arm. If it is the case that other platform do this for C1, then I cannot find where. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1396843718 From aph at openjdk.org Fri Nov 17 08:22:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 17 Nov 2023 08:22:42 GMT Subject: RFR: JDK-8319927: Log that IEEE rounding mode was corrupted by loading a library In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 16:06:18 GMT, Matthias Baesken wrote: > [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD , however in case of issues no logging is done, this should be improved. src/hotspot/os/bsd/os_bsd.cpp line 1013: > 1011: int rtn = fesetenv(&default_fenv); > 1012: assert(rtn == 0, "fesetenv must succeed"); > 1013: bool ieee_handling_after_issue = IEEE_subnormal_handling_OK(); This is a misleading name. It should be something explicit like `ieee_handling_succeeded`. Still, I suppose it's too late now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16618#discussion_r1396849608 From aboldtch at openjdk.org Fri Nov 17 08:29:21 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 17 Nov 2023 08:29:21 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v5] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. 
> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Use more familiar CAS variable names and pattern ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/eac6d691..6fbdc689 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=03-04 Stats: 8 lines in 1 file changed: 2 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aph at openjdk.org Fri Nov 17 08:43:32 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 17 Nov 2023 08:43:32 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 11:59:26 GMT, Axel Boldt-Christmas wrote: > > Hmm. Which hardware is this? This is stuff I need to be aware of. Please contact me off-line if it's hard to say in public. > > This has been observed with different versions of the Apple M1 > processors. Heh, go figure. This is even more remarkable, given that Apple's own compilers seem to always emit LSE CAS, at least by default. 
I suppose it's understandable that Apple didn't focus much on LSE performance, given that M1 and its derivatives aren't intended for multi-socket designs, and are based on a highly-integrated cellphone design. But no one told the compiler team. > To clarify, when I say contention I am referring to java monitor > contention, that is, multiple threads are trying to lock the same > object. > > The performance is particularly bad if the LSE CAS fails. This > pattern is something that is prevalent in the un-contended inflated > recursive lock. In the current implementation this is still an > issue, but as we are removing most of the common reason why a > un-contended lock gets inflated we should not see this as often. > > We have at some point also had some code which improves this (e.g. > https://github.com/xmas92/jdk/blob/3150426b261bfceacdceda1b2ebccd82b6e6fb41/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L162-L167 > ) But I did not want to also change the inflated lock / unlock paths > in this PR. Good idea. > We also have had tried different recursive lightweight unlock paths, > some where avoiding the LSE CAS has been more important. In the > current PR it is less important as we make decisions based on the > state of the lock stack first. This avoids most of the cases of > un-contended failing CASes that occur in the main line > implementation. However it still seemed to be more performant on > this hardware to use LL-SC pair. > > Here Where? I see a bunch of patches, but not results. > are some microbenchmarks running on an Apple M1 Pro chip. This > is an extended version of the LockUnlock.java JMH micros. (Patch > [3a7eb13](https://github.com/openjdk/jdk/commit/3a7eb137140971f6b21ffea5dbf512300b38371a)) > Extended because some of the tests never get compiled because C2 > bails out. (Clearly identified in the results as they are an order > of magnitude worse). 
It's really important not to complexify the AArch64 port for the sake of one manufacturer, no matter how important. If this stuff can be done by providing a hint to the CAS macros that ldx/stx is to be preferred in the Apple case, then it would be tolerable. My primary goal is protecting the AArch64 back end from well-intentioned maintainers. And I suspect that this uncontended case isn't that important to real-world workloads, but I suppose it may be in some cases.. (And, for avoidance of doubt, I suspect that the Neoverse designs are far more important for Java. But I have no evidence to prove that.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1815946091 From aboldtch at openjdk.org Fri Nov 17 08:54:30 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 17 Nov 2023 08:54:30 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 08:41:13 GMT, Andrew Haley wrote: > Where? I see a bunch of patches, but not results. They are inlined in my comment inside `
<details>` tags. Collapsible sections. Unsure how it looks on your screen but on mine there is a little tree-view style triangle that opens each section. ![image](https://github.com/openjdk/jdk/assets/1139284/642bb823-ba53-46dc-ad6c-df4c71540b18) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1815960620 From aph at openjdk.org Fri Nov 17 09:36:33 2023 From: aph at openjdk.org (Andrew Haley) Date: Fri, 17 Nov 2023 09:36:33 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 08:00:50 GMT, Axel Boldt-Christmas wrote: >> Implements the aarch64 port of JDK-8319796. >> >> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first, and if it is a recursive lightweight [un]lock, simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. >> >> For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. >> >> The C2 part operates under a few more assumptions: that the locking is structured and balanced. This means that some checks can be elided. >> >> First, this means that in C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. Conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The aarch64 C2 port tries to avoid stronger memory semantics wherever possible.
In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 > - 8319801: Recursive lightweight locking: aarch64 implementation > - Cleanup: C2 fast_lock/fast_unlock aarch64 > > Where? I see a bunch of patches, but not results. > > They are inlined in my comment inside `
<details>` tags. Collapsible sections. > > Unsure how it looks on your screen but on mine there is a little tree-view style triangle that opens each section. ![image](https://user-images.githubusercontent.com/1139284/283735408-642bb823-ba53-46dc-ad6c-df4c71540b18.png) Ha! I had no idea GitHub comments could do that. Those test results do make for interesting reading, with no clear general-purpose solution. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1816032022 From sspitsyn at openjdk.org Fri Nov 17 10:30:59 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 10:30:59 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v2] In-Reply-To: References: Message-ID: > This is an update for a performance/scalability enhancement. > > The `JvmtiVTMSTransitionDisabler` sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. > > Please also see another, related performance issue which contributes around 70% of the total performance overhead: > [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 > > Testing: > - Ran mach5 tiers 1-6 with no regressions noticed.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: make new fields volatile, use Atomic for access/update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16688/files - new: https://git.openjdk.org/jdk/pull/16688/files/bf093127..a81218fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16688&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16688&range=00-01 Stats: 15 lines in 2 files changed: 3 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/16688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16688/head:pull/16688 PR: https://git.openjdk.org/jdk/pull/16688 From duke at openjdk.org Fri Nov 17 11:27:32 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 11:27:32 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3] In-Reply-To: References: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> Message-ID: On Thu, 16 Nov 2023 17:00:45 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed most of suggestions for code improvements from @Hamlin-Li > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1492: > >> 1490: beqz(cnt, DONE); >> 1491: >> 1492: lw(pow31_2, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31() > > Now you don't need this `lw` anymore, as 961 will fit in an immediate, so `mv pow31_2, 961` should be fine. Agreed. 
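The review point above rests on simple arithmetic: 31^2 = 961 lies inside the 12-bit signed immediate range of RISC-V I-type instructions such as `addi`. A small sketch of that check follows; the helper name `fits_in_simm12` and the constant name are made up for this illustration and are not the HotSpot assembler's own helpers.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V I-type instructions (for example addi) encode a 12-bit signed
// immediate, so the representable range is [-2048, 2047].
constexpr bool fits_in_simm12(int64_t value) {
  return value >= -2048 && value <= 2047;
}

// 31^2, the hash code multiplier discussed in the review comment.
constexpr int32_t kPow31Squared = 31 * 31;

static_assert(kPow31Squared == 961, "31 * 31 is 961");
static_assert(fits_in_simm12(kPow31Squared),
              "961 fits in a 12-bit signed immediate");
static_assert(!fits_in_simm12(31 * 31 * 31),
              "31^3 = 29791 does not fit in a 12-bit signed immediate");
```

Higher powers of 31 fall outside that range, so this reasoning covers only the 961 case.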
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397121746 From jvernee at openjdk.org Fri Nov 17 12:01:12 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 17 Nov 2023 12:01:12 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v8] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. 
After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: - add too_many_traps check - Remove has_monitors fix - Merge branch 'master' into PruneDeadCatchBlocks - Only use ProfileExceptionHandlers - drop ProfileExceptionHandlers flag - track catch block enters in deoptimization code too - Add @requires vm.debug to test - remove leftover comment - Add smoke tests for -XX:+StressPrunedExceptionHandlers and -XX:-ProfileExceptionHandlers - Add missing spaces to IRNode - ... 
and 30 more: https://git.openjdk.org/jdk/compare/7f5b3aef...bee05534 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/86700da6..bee05534 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=06-07 Stats: 666111 lines in 2543 files changed: 107300 ins; 485759 del; 73052 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From jvernee at openjdk.org Fri Nov 17 12:18:41 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 17 Nov 2023 12:18:41 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v8] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 12:01:12 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. 
>> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 40 additional commits since the last revision: > > - add too_many_traps check > - Remove has_monitors fix > - Merge branch 'master' into PruneDeadCatchBlocks > - Only use ProfileExceptionHandlers > - drop ProfileExceptionHandlers flag > - track catch block enters in deoptimization code too > - Add @requires vm.debug to test > - remove leftover comment > - Add smoke tests for -XX:+StressPrunedExceptionHandlers and -XX:-ProfileExceptionHandlers > - Add missing spaces to IRNode > - ... and 30 more: https://git.openjdk.org/jdk/compare/213535ea...bee05534 I've removed the fix for the `has_monitors` issue [1] and filed: https://bugs.openjdk.org/browse/JDK-8320310 I've also added checks using `too_many_traps` as suggested [2] [1]: https://github.com/openjdk/jdk/pull/16416/commits/0a5e247d1ed8402a9923cc726f30f80e08754d32 [2]: https://github.com/openjdk/jdk/pull/16416/commits/bee05534777dc2caf10362f66fea90a06705a144 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1816303264 From duke at openjdk.org Fri Nov 17 13:09:00 2023 From: duke at openjdk.org (Daniel Lundén) Date: Fri, 17 Nov 2023 13:09:00 GMT Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4] In-Reply-To: References: Message-ID: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> > This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. > > Changes: > - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24. The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). > - Completely remove `CounterDecayMinIntervalLength`.
> > ### Testing > Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. > - `tier1` > - HotSpot parts of `tier2` and `tier3` Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Obsolete UseCounterDecay ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16673/files - new: https://git.openjdk.org/jdk/pull/16673/files/61e0a104..611ac09a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16673&range=02-03 Stats: 5 lines in 2 files changed: 1 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16673.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16673/head:pull/16673 PR: https://git.openjdk.org/jdk/pull/16673 From azafari at openjdk.org Fri Nov 17 13:22:36 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 17 Nov 2023 13:22:36 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:38:25 GMT, Afshin Zafari wrote: >>> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). >> >> Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take >> the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. > > @kimbarrett , @dholmes-ora , @merykitty > Is there any comment on this PR? > @afshin-zafari I will leave it to others to (re-) review the latest changes. I don't grok this template stuff enough to pass judgement. Thank you very much @dholmes-ora, for your comments and discussions.
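To make the "unary-predicate suggestion" quoted above concrete, here is a self-contained sketch of what such a lookup could look like. `MiniArray` and `find_if` are hypothetical stand-ins invented for this illustration; this is not the `GrowableArray` code from the PR.

```cpp
#include <cassert>

// Hypothetical fixed-capacity array standing in for GrowableArray<E>.
template <typename E, int N>
class MiniArray {
  E   _data[N];
  int _len = 0;

 public:
  void append(const E& e) {
    assert(_len < N);
    _data[_len++] = e;
  }

  // Unary-predicate lookup: the caller passes any callable taking one
  // element. Returns the index of the first match, or -1 if none matches.
  template <typename F>
  int find_if(F f) const {
    for (int i = 0; i < _len; i++) {
      if (f(_data[i])) {
        return i;
      }
    }
    return -1;
  }
};
```

With this shape, the value being searched for no longer needs to be a separate parameter: a call site can capture it in a lambda, for example `a.find_if([&](const int& e) { return e == token; })`.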
------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1816417460 From azafari at openjdk.org Fri Nov 17 13:22:38 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 17 Nov 2023 13:22:38 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v8] In-Reply-To: <648SrHxCX6_kRX7cmxyGurxOWecLTWUw0_C79J_okbo=.473eaa8c-7dfc-4aa0-839d-bb580cc9d312@github.com> References: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> <648SrHxCX6_kRX7cmxyGurxOWecLTWUw0_C79J_okbo=.473eaa8c-7dfc-4aa0-839d-bb580cc9d312@github.com> Message-ID: On Thu, 16 Nov 2023 06:46:48 GMT, Quan Anh Mai wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> function pointer is replaced with template Functor. > > src/hotspot/share/utilities/growableArray.hpp line 213: > >> 211: >> 212: template >> 213: int find(T* token, F f) const { > > Should be > > template > int find(F f) const { > for (int i = 0; i < _len; i++) { > if (f(_data[i]) { > return i; > } > } > return -1; > } We need `token` to find it in the array, don't we? All the invocations pass such a function with two parameters. The change here needs all invocations to be changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1397293350 From stefank at openjdk.org Fri Nov 17 13:34:36 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 17 Nov 2023 13:34:36 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 13:20:22 GMT, Afshin Zafari wrote: >> @kimbarrett , @dholmes-ora , @merykitty >> Is there any comment on this PR? > >> @afshin-zafari I will leave it to other to (re-) review the latest changes. I don't grok this template stuff enough to pass judgement. 
> > Thank you very much @dholmes-ora, for your comments and discussions. @afshin-zafari I think you misunderstand the feedback given. The suggestions from many of us has been that you should change the functions to accept lambdas / template-typed functions instead. https://github.com/openjdk/jdk/pull/15418#discussion_r1305389552 https://github.com/openjdk/jdk/pull/15418#discussion_r1375386257 https://github.com/openjdk/jdk/pull/15418#discussion_r1376940244 https://github.com/openjdk/jdk/pull/15418#discussion_r1395219935 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1816438451 From jvernee at openjdk.org Fri Nov 17 13:41:58 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 17 Nov 2023 13:41:58 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v9] In-Reply-To: References: Message-ID: <0A-M6LwxHmiYUfunlz_qgeFiPJoWcmzElMOD6RtxWmc=.da64f93c-9db9-4d19-aaa2-c204857f3595@github.com> > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. 
So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - fix linux compile - Revert "add too_many_traps check" This reverts commit bee05534777dc2caf10362f66fea90a06705a144. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/bee05534..46c94342 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=07-08 Stats: 19 lines in 2 files changed: 0 ins; 3 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From mbaesken at openjdk.org Fri Nov 17 14:02:39 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 17 Nov 2023 14:02:39 GMT Subject: RFR: JDK-8320300: Adjust hs_err output in malloc/mmap error cases Message-ID: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> Some of the error output could be slightly improved. Currently it says for example: There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (mmap) failed to map 65536 bytes for Failed to commit metaspace. Possible reasons: . . . The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap The output 'bytes for Failed to commit metaspace.' should be rephrased. The reason should be more clear that it really IS the case for the current JVM that CompressedOops is set (and that it is not just some general advice) . 
------------- Commit messages: - JDK-8320300 Changes: https://git.openjdk.org/jdk/pull/16707/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16707&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320300 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16707.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16707/head:pull/16707 PR: https://git.openjdk.org/jdk/pull/16707 From clanger at openjdk.org Fri Nov 17 15:44:38 2023 From: clanger at openjdk.org (Christoph Langer) Date: Fri, 17 Nov 2023 15:44:38 GMT Subject: RFR: JDK-8320300: Adjust hs_err output in malloc/mmap error cases In-Reply-To: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> References: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> Message-ID: On Fri, 17 Nov 2023 13:53:53 GMT, Matthias Baesken wrote: > Some of the error output could be slightly improved. Currently it says for example: > > There is insufficient memory for the Java Runtime Environment to continue. > Native memory allocation (mmap) failed to map 65536 bytes for Failed to commit metaspace. > Possible reasons: > . . . > The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap > > The output 'bytes for Failed to commit metaspace.' should be rephrased. > The reason should be more clear that it really IS the case for the current JVM that CompressedOops is set (and that it is not just some general advice) . Makes sense. ------------- Marked as reviewed by clanger (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16707#pullrequestreview-1737309784 From jvernee at openjdk.org Fri Nov 17 15:46:37 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 17 Nov 2023 15:46:37 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v5] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 17:42:52 GMT, Vladimir Ivanov wrote: > > Missing profiling would be bad, as in that case we'd always try to prune the exception handler. i.e. it's not just a missed optimization. > > Yes, pathological recompilation is another scenario to consider. You can sprinkle `Compile::too_many_traps` checks (both as asserts and product checks) to ensure profiling information is up-to-date. I also realize that what I said here is not quite true, as we mark the handler as entered in the deopt code. So, if we deopt once, we won't get an uncommon trap again. (there's a test case for that as well). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1816659130 From kvn at openjdk.org Fri Nov 17 15:55:32 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 15:55:32 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: References: Message-ID: <0RI4X4u01qQdEIJgRscLQKBKLMKI7z2cDoB4sgszR_g=.6c0752b2-b254-4a75-aae0-9216b728aeb0@github.com> On Thu, 16 Nov 2023 23:46:53 GMT, Sandhya Viswanathan wrote: > This PR fixes the perf regression seen on AVX for floating point conversions. > > In AVX the cvt instructions have three operands cvtxx dst, src1, src2. Where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1. > > The C2 jit uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register. > Doing an xor dst, dst before the conversion instruction fixes the perf regression. 
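The operand behaviour described above can be modelled in a few lines of portable C++. This is only a toy model of the register semantics as stated in the PR description (lower lane converted from src2, upper bits copied from src1); the `Xmm` struct and both function names are invented for the illustration and it uses no real intrinsics.

```cpp
#include <cassert>

// Toy model of a 128-bit XMM register viewed as two double lanes.
struct Xmm {
  double lane[2];
};

// Models the three-operand form "cvt dst, src1, src2" for an int -> double
// conversion: lane 0 is the converted src2, lane 1 is carried over from src1.
Xmm cvt_int_to_double(const Xmm& src1, int src2) {
  return Xmm{{static_cast<double>(src2), src1.lane[1]}};
}

// Models "xor dst, dst": the register is cleared, so a following conversion
// that uses it as src1 no longer carries over any earlier contents.
Xmm xor_self() {
  return Xmm{{0.0, 0.0}};
}
```

When dst is also passed as src1, the result carries whatever the register held before; clearing it first makes the result a function of src2 alone.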
>
> Perf before the patch on UseAVX=3 platform:
> ComputePI.compute_pi_dbl_flt   avgt  5   471.875 ±  0.400  ns/op
> ComputePI.compute_pi_flt_dbl   avgt  5  1877.174 ±  0.557  ns/op
> ComputePI.compute_pi_int_dbl   avgt  5   655.222 ± 28.082  ns/op
> ComputePI.compute_pi_int_flt   avgt  5   737.178 ±  0.077  ns/op
> ComputePI.compute_pi_long_dbl  avgt  5   767.364 ±  0.027  ns/op
> ComputePI.compute_pi_long_flt  avgt  5   587.854 ± 10.068  ns/op
>
> Perf after the patch on UseAVX=3 platform:
> Benchmark                      Mode  Cnt    Score    Error  Units
> ComputePI.compute_pi_dbl_flt   avgt  5   468.328 ±  0.141  ns/op
> ComputePI.compute_pi_flt_dbl   avgt  5   435.430 ±  0.259  ns/op
> ComputePI.compute_pi_int_dbl   avgt  5   424.088 ±  0.050  ns/op
> ComputePI.compute_pi_int_flt   avgt  5   417.345 ±  0.207  ns/op
> ComputePI.compute_pi_long_dbl  avgt  5   425.751 ±  0.006  ns/op
> ComputePI.compute_pi_long_flt  avgt  5   430.199 ±  0.736  ns/op

My tier1-4,xcomp,stress passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16701#pullrequestreview-1737334612
From kvn at openjdk.org Fri Nov 17 16:16:47 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 16:16:47 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared Message-ID: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Currently all platforms declare their own address variable for the method_entry_barrier stub. Some even have a slightly different name: nmethod_entry_barrier. For the Leyden project one address is preferable. In the aarch64 code, `movptr` is changed to a `lea` instruction to get relocation info as on x86. Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks!
------------- Commit messages: - 8320272: Make method_entry_barrier address shared Changes: https://git.openjdk.org/jdk/pull/16708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16708&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320272 Stats: 63 lines in 32 files changed: 5 ins; 37 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16708/head:pull/16708 PR: https://git.openjdk.org/jdk/pull/16708 From luhenry at openjdk.org Fri Nov 17 16:17:52 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 17 Nov 2023 16:17:52 GMT Subject: RFR: 8310656: RISC-V: __builtin___clear_cache can fail silently. [v3] In-Reply-To: <4eyogulkaSvi1d-xVbPCAp_mwRSD5sHyfysJj2Gat2A=.abfd20ed-c21d-4150-b25c-e4f9a5b71868@github.com> References: <6JeLSyWD6twDLD83OPiG-_5lTgGVHn8dh-rKkc7scmM=.9b7be0e7-cb20-44c6-8d28-d72c00d41edf@github.com> <4eyogulkaSvi1d-xVbPCAp_mwRSD5sHyfysJj2Gat2A=.abfd20ed-c21d-4150-b25c-e4f9a5b71868@github.com> Message-ID: On Sat, 1 Jul 2023 11:11:15 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> We recently had a bug where user were missing permissions to use this syscall. >> Which caused crashing on, according to hs_err on things like "addi x11, x24, 0" with SIGILL. >> If it fails it is even possible to execute valid but 'old' instruction which may not lead to a crash, instead the program misbehaves. >> >> To avoid this mess I suggest that we first test the syscall during vm init and we use it directly. >> This way we can make sure it never fails. >> >> Tested failing syscall with qemu, tested t1 in qemu, t1 on jh7110 in-progress. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - merge update and nits > - Merge branch 'master' into 8310656 > - added back data barrier > > Signed-off-by: Robbin Ehn > - 8310656: RISC-V: __builtin___clear_cache can fail silently. Backport to jdk17u-dev in progress at https://github.com/openjdk/jdk17u-dev/pull/1968 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14670#issuecomment-1816708043 From kvn at openjdk.org Fri Nov 17 18:30:04 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 18:30:04 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: <9usHIyM1snMGi49cBVnd63nvJESA1PqIDkCCwaj7d6U=.fe26d158-8c05-4335-8d16-167b8652238a@github.com> References: <9usHIyM1snMGi49cBVnd63nvJESA1PqIDkCCwaj7d6U=.fe26d158-8c05-4335-8d16-167b8652238a@github.com> Message-ID: On Wed, 15 Nov 2023 16:45:09 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> S390 port > > I believe this PR has reached a stage where it can be safely committed. Thank you to all the reviewers for your excellent feedback and thank you to all porters for your contributions! @matias9927 this broke arm (32-bit) build: /home/runner/work/jdk/jdk/src/hotspot/cpu/arm/interp_masm_arm.cpp:236:17: error: 'ConstantPoolCacheEntry' was not declared in this scope; did you mean 'ConstantPoolCache'? 236 | assert(sizeof(ConstantPoolCacheEntry) == 4*wordSize, "adjust code below"); | ^~~~~~~~~~~~~~~~~~~~~~ I don't see GHA testing is setup for your repo. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1816895612 From kvn at openjdk.org Fri Nov 17 18:30:30 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 18:30:30 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: On Fri, 17 Nov 2023 16:10:25 GMT, Vladimir Kozlov wrote: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! Arm (32-bit) cross build is broken by recent changes [8301997](https://github.com/openjdk/jdk/commit/ffa35d8cf181cfbcb54497e997dbd18a9b62b97e) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16708#issuecomment-1816897098 From duke at openjdk.org Fri Nov 17 18:36:59 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 18:36:59 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v4] In-Reply-To: References: Message-ID: > Hello All, > > Please review these changes to support _vectorizedHashCode intrinsic on > RISC-V platform. The patch adds the "scalar" code for the intrinsic without > usage of any RVV instruction but provides manual unrolling of the appropriate > loop. The code with usage of RVV instruction could be added as follow-up of > the patch or independently. > > Thanks, > -Yuri Gaevsky > > P.S. My OCA has been accepted recently (ygaevsky). > > ### Correctness checks > > Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux. 
>
> ### Performance results (the numbers for non-ints are similar)
>
> #### StarFive JH7110 board:
>
>
> ArraysHashCode:                     without intrinsic        with intrinsic
> -------------------------------------------------------------------------------
> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
> -------------------------------------------------------------------------------
> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
> -------------------------------------------------------------------------------
>
> #### T-Head RVB-ICE board:
>
>
> ArraysHashCode: ...

Yuri Gaevsky has updated the pull request incrementally with eight additional commits since the last revision:

 - Temporary registers renaming.
 - Get tmp3 back and use it to hold 31^^3 constant.
 - Use t0/t1 as tmp1/tmp2.
 - Removed tmp3 register.
 - Replaced tmp3 with tmp1 in wide loop.
 - Replaced lw() of 31^^2 from memory with mv(); removed 31^^2..31^^0 constants as not needed anymore.
- Renamed chunk -> chunks - Removed usage of cnt in loops for branch decisions. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16629/files - new: https://git.openjdk.org/jdk/pull/16629/files/86bcccee..eb4b0f87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=02-03 Stats: 59 lines in 4 files changed: 11 ins; 11 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/16629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629 PR: https://git.openjdk.org/jdk/pull/16629 From duke at openjdk.org Fri Nov 17 18:37:02 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 18:37:02 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3] In-Reply-To: References: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> Message-ID: On Fri, 17 Nov 2023 07:13:16 GMT, Fei Yang wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed most of suggestions for code improvements from @Hamlin-Li > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1467: > >> 1465: BasicType eltype) >> 1466: { >> 1467: assert_different_registers(ary, cnt, result, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6); > > We have two scratch registers t0 (x5) / t1 (x6) which should be considered for use in this assembler function. > These two are reserved from the register allocator and are suitable for keeping some short-lived values in the procedure. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397724215 From duke at openjdk.org Fri Nov 17 18:37:04 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 18:37:04 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> Message-ID: On Wed, 15 Nov 2023 15:52:42 GMT, Yuri Gaevsky wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1477: >> >>> 1475: case T_SHORT: BLOCK_COMMENT("arrays_hashcode(short) {"); break; >>> 1476: case T_INT: BLOCK_COMMENT("arrays_hashcode(int) {"); break; >>> 1477: default: BLOCK_COMMENT("arrays_hashcode {"); break; >> >> In `C2_MacroAssembler::arrays_hashcode_elsize`, default action is `ShouldNotReachHere();`, should it be consistent here? > > Sure, thanks for catching! Done. >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1540: >> >>> 1538: addw(result, result, tmp1); // result = result + ary[i] >>> 1539: subw(cnt, cnt, 1); >>> 1540: add(ary, ary, elsize); >> >> Similar comment for cnt and ary as chunk and ary above. > > As above, please advice how to do that. IIUC, that's possible with INDEX-REG addressing which is absent in RISC-V. :-( Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397724570 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397724873 From duke at openjdk.org Fri Nov 17 18:37:06 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 18:37:06 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3] In-Reply-To: References: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> Message-ID: On Fri, 17 Nov 2023 11:24:30 GMT, Yuri Gaevsky wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1492: >> >>> 1490: beqz(cnt, DONE); >>> 1491: >>> 1492: lw(pow31_2, ExternalAddress(StubRoutines::riscv::arrays_hashcode_powers_of_31() >> >> Now you don't need this `lw` anymore, as 961 will fit in an immediate, so `mv pow31_2, 961` should be fine. > > Agreed. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397723846 From duke at openjdk.org Fri Nov 17 18:37:10 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 18:37:10 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v3] In-Reply-To: References: <0T5rzQVycnjhsaQN4NIY8XMM-a-JwDAojLc7dENbetI=.424ac5f2-2748-4e6b-a70d-34ade9cd8e81@github.com> Message-ID: On Thu, 16 Nov 2023 17:11:55 GMT, Hamlin Li wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed most of suggestions for code improvements from @Hamlin-Li > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1512: > >> 1510: + 0 * sizeof(jint))); // [31^^3:31^^4] >> 1511: >> 1512: bind(WIDE_LOOP); > > Seems in this loop, tmp3 and tmp1 can share one register. Done. > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1515: > >> 1513: mulw(result, result, pow31_3_4); // 31^^4 * h >> 1514: DO_ELEMENT_LOAD(tmp1, 0); >> 1515: srli(tmp2, pow31_3_4, 32); > > tmp2 can be calculated outside of the loop. Done. 
> src/hotspot/cpu/riscv/stubRoutines_riscv.cpp line 62: > >> 60: 923521, // 0x000E1781 >> 61: 29791, // 0x0000745F >> 62: 961, // 0x000003C1 > > based on the comment above about `pow31_2 `, line 62-64 can be removed now. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397723643 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397723313 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397723142 From duke at openjdk.org Fri Nov 17 18:37:12 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 18:37:12 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> <_96DvqVXj75b9Mmz3pewifXTiv6wPr3-IsWBCaxAbs0=.c4227f48-a537-438c-9c36-78c2fa3b79c8@github.com> Message-ID: On Thu, 16 Nov 2023 09:39:36 GMT, Yuri Gaevsky wrote: >> chunk is only used to tell if the wide loop is done, which can be done by ary too. >> >> And as subw of chunk and addi of ary is in a loop which could be a long one, so better to reduce the instructions in the loop. > > Oh, thanks: look like I understand now what can be done here and there. Done. 
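[Editor's illustration, not part of the original message.] The wide loop discussed in this review processes several elements per iteration by multiplying the running hash by a power of 31. A minimal Java sketch of that arithmetic follows, assuming a four-element step; the class and method names (`UnrolledHashSketch`, `hashUnrolled`) are hypothetical and this is not the RISC-V assembler code from the PR. The constants 961, 29791 and 923521 are the powers of 31 quoted from `stubRoutines_riscv.cpp` above.

```java
import java.util.Arrays;

public class UnrolledHashSketch {
    // Powers of 31 used by the four-element step (values quoted in the review).
    static final int POW31_2 = 961;     // 31^2
    static final int POW31_3 = 29791;   // 31^3
    static final int POW31_4 = 923521;  // 31^4

    // Hypothetical helper: computes the same value as Arrays.hashCode(int[])
    // for a non-null array, but consumes four elements per wide-loop iteration.
    static int hashUnrolled(int[] a) {
        int h = 1;
        int i = 0;
        // Wide loop: h = 31^4*h + 31^3*a[i] + 31^2*a[i+1] + 31*a[i+2] + a[i+3]
        for (; i + 4 <= a.length; i += 4) {
            h = POW31_4 * h
              + POW31_3 * a[i]
              + POW31_2 * a[i + 1]
              + 31 * a[i + 2]
              + a[i + 3];
        }
        // Tail loop: remaining 0..3 elements, one at a time.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        // Compare against the library implementation.
        System.out.println(hashUnrolled(data) == Arrays.hashCode(data));
    }
}
```

Because `int` arithmetic in Java wraps modulo 2^32, expanding four steps of `h = 31 * h + a[i]` into one expression gives the same result as the element-by-element loop.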
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397725200 From cjplummer at openjdk.org Fri Nov 17 18:55:29 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 17 Nov 2023 18:55:29 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v2] In-Reply-To: References: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> Message-ID: <_7pFLcmNRjtnifuhYSMmawqAGiqURduxz9ce1splCNc=.4f8d5376-cc8f-4798-a236-abbce62f5027@github.com> On Fri, 17 Nov 2023 07:23:46 GMT, Serguei Spitsyn wrote: >> I see the PR comment, but I don't really understand it. Is this to force some sort of early initialization to avoid a race later on? It just seems odd to create the tlh, but never use it. > > The `tlh` is used to protect any existing at this point JavaThread's from being terminated while it is set. > My understanding is that there is no need to iterate over all threads in the list to get this protection. Ok. A comment indicating that purpose would be useful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1397741421 From matsaave at openjdk.org Fri Nov 17 18:59:02 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Nov 2023 18:59:02 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: <9usHIyM1snMGi49cBVnd63nvJESA1PqIDkCCwaj7d6U=.fe26d158-8c05-4335-8d16-167b8652238a@github.com> References: <9usHIyM1snMGi49cBVnd63nvJESA1PqIDkCCwaj7d6U=.fe26d158-8c05-4335-8d16-167b8652238a@github.com> Message-ID: On Wed, 15 Nov 2023 16:45:09 GMT, Matias Saavedra Silva wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> S390 port > > I believe this PR has reached a stage where it can be safely committed.
Thank you to all the reviewers for your excellent feedback and thank you to all porters for your contributions! > @matias9927 this broke arm (32-bit) build: > > ``` > /home/runner/work/jdk/jdk/src/hotspot/cpu/arm/interp_masm_arm.cpp:236:17: error: 'ConstantPoolCacheEntry' was not declared in this scope; did you mean 'ConstantPoolCache'? > 236 | assert(sizeof(ConstantPoolCacheEntry) == 4*wordSize, "adjust code below"); > | ^~~~~~~~~~~~~~~~~~~~~~ > ``` > > I don't see GHA testing is setup for your repo. This patch includes changes to the interpreters which were provided by porters. Oracle is responsible for x86-64 and aarch64 so completed those and made an effort to inform porters of the upcoming change. The ARM32 port has not yet been completed and thus is not part of this patch. The ARM32 ports have previously been handled by @bulasevich and @voitylov and they have been contacted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1816932564 From avoitylov at openjdk.org Fri Nov 17 19:21:01 2023 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Fri, 17 Nov 2023 19:21:01 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 16:44:15 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambigious fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive an extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. 
>> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > S390 port Noted, we'll follow up with the arm32 fix a little later. Thanks Matias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1816957911 From duke at openjdk.org Fri Nov 17 19:57:30 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 19:57:30 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Thu, 16 Nov 2023 17:24:26 GMT, Hamlin Li wrote: > What specific tests were run for this intrinsic implementation to verify the correctness? 
- jtreg tier1
- hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java with -Xcomp

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397805242

From duke at openjdk.org Fri Nov 17 20:04:42 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 17 Nov 2023 20:04:42 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore
Message-ID:

Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to a 30% gain.

=============== BEFORE ===============
Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op

Benchmark                Mode  Cnt    Score   Error  Units
MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op

=============== AFTER ===============
Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op

Benchmark                Mode  Cnt    Score   Error  Units
MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

-------------

Commit messages:
 - emulate vblend on ecores

Changes: https://git.openjdk.org/jdk/pull/16716/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8320347
  Stats: 301 lines in 8 files changed: 265 ins; 0 del; 36 mod
  Patch: https://git.openjdk.org/jdk/pull/16716.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716

PR: https://git.openjdk.org/jdk/pull/16716

From sviswanathan at openjdk.org Fri Nov 17 20:07:30 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 17 Nov 2023 20:07:30 GMT
Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used
In-Reply-To:
References:
Message-ID:

On Fri, 17 Nov 2023 02:11:29 GMT, Vladimir Kozlov wrote:

>> This PR fixes the perf regression seen on AVX for floating point conversions.
>>
>> In AVX the cvt instructions have three operands cvtxx dst, src1, src2. Where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1.
>>
>> The C2 jit uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register.
>> Doing an xor dst, dst before the conversion instruction fixes the perf regression.
>>
>> Perf before the patch on UseAVX=3 platform:
>> ComputePI.compute_pi_dbl_flt   avgt    5   471.875 ±  0.400  ns/op
>> ComputePI.compute_pi_flt_dbl   avgt    5  1877.174 ±  0.557  ns/op
>> ComputePI.compute_pi_int_dbl   avgt    5   655.222 ± 28.082  ns/op
>> ComputePI.compute_pi_int_flt   avgt    5   737.178 ±  0.077  ns/op
>> ComputePI.compute_pi_long_dbl  avgt    5   767.364 ±  0.027  ns/op
>> ComputePI.compute_pi_long_flt  avgt    5   587.854 ± 10.068  ns/op
>>
>> Perf after the patch on UseAVX=3 platform:
>> Benchmark                      Mode  Cnt     Score    Error  Units
>> ComputePI.compute_pi_dbl_flt   avgt    5   468.328 ±  0.141  ns/op
>> ComputePI.compute_pi_flt_dbl   avgt    5   435.430 ±  0.259  ns/op
>> ComputePI.compute_pi_int_dbl   avgt    5   424.088 ±  0.050  ns/op
>> ComputePI.compute_pi_int_flt   avgt    5   417.345 ±  0.207  ns/op
>> ComputePI.compute_pi_long_dbl  avgt    5   425.751 ±  0.006  ns/op
>> ComputePI.compute_pi_long_flt  avgt    5   430.199 ±  0.736  ns/op
>
> I confirmed that this change solved the performance issue on the machines I tested (old Broadwell and Cascade Lake CPUs).
> I am submitting our regular testing for approval.

Thanks a lot for the reviews @vnkozlov @jatin-bhateja @merykitty.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16701#issuecomment-1817025958

From sviswanathan at openjdk.org Fri Nov 17 20:10:34 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 17 Nov 2023 20:10:34 GMT
Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used
In-Reply-To:
References:
Message-ID: <-mEmvDKid1f79h0i89Cayztf-Bc1SE0BZh6BozSxafw=.0d551aad-d32b-4d33-b28e-48ebade2071f@github.com>

On Fri, 17 Nov 2023 04:53:57 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 11092:
>>
>>> 11090: %{
>>> 11091:   match(Set dst (ConvD2F src));
>>> 11092:   effect(TEMP dst);
>>
>> You don't need `TEMP dst`, if `dst` is an alias of `src` then a destructive `xor` is not emitted.
>
> Without the TEMP annotation, dst and src may be aliased if the src live range does not survive beyond this instruction.

I had checked before submitting the PR: the cvt xmm0, xmm0, xmm0 form was slower than xorps xmm1, xmm1 followed by cvt xmm1, xmm1, xmm0.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16701#discussion_r1397833825

From sviswanathan at openjdk.org Fri Nov 17 20:13:06 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 17 Nov 2023 20:13:06 GMT
Subject: Integrated: 8318562: Computational test more than 2x slower when AVX instructions are used
In-Reply-To:
References:
Message-ID: <41eJFOPjt_zxa0HeZW-IWO4pYz_aknaat5omNT2_KIQ=.1727856c-462c-471c-a6fc-8452b5d4f819@github.com>

On Thu, 16 Nov 2023 23:46:53 GMT, Sandhya Viswanathan wrote:

> This PR fixes the perf regression seen on AVX for floating point conversions.
>
> In AVX the cvt instructions have three operands cvtxx dst, src1, src2. Where src2 is the one being converted. The dst gets the lower bits as the converted value and upper bits (up to 128) from src1.
>
> The C2 jit uses the cvtxx dst, dst, src2 flavor. Here the problem was due to uninitialized upper bits of the dst XMM register.
> Doing an xor dst, dst before the conversion instruction fixes the perf regression.
>
> Perf before the patch on UseAVX=3 platform:
> ComputePI.compute_pi_dbl_flt   avgt    5   471.875 ±  0.400  ns/op
> ComputePI.compute_pi_flt_dbl   avgt    5  1877.174 ±  0.557  ns/op
> ComputePI.compute_pi_int_dbl   avgt    5   655.222 ± 28.082  ns/op
> ComputePI.compute_pi_int_flt   avgt    5   737.178 ±  0.077  ns/op
> ComputePI.compute_pi_long_dbl  avgt    5   767.364 ±  0.027  ns/op
> ComputePI.compute_pi_long_flt  avgt    5   587.854 ± 10.068  ns/op
>
> Perf after the patch on UseAVX=3 platform:
> Benchmark                      Mode  Cnt     Score    Error  Units
> ComputePI.compute_pi_dbl_flt   avgt    5   468.328 ±  0.141  ns/op
> ComputePI.compute_pi_flt_dbl   avgt    5   435.430 ±  0.259  ns/op
> ComputePI.compute_pi_int_dbl   avgt    5   424.088 ±  0.050  ns/op
> ComputePI.compute_pi_int_flt   avgt    5   417.345 ±  0.207  ns/op
> ComputePI.compute_pi_long_dbl  avgt    5   425.751 ±  0.006  ns/op
> ComputePI.compute_pi_long_flt  avgt    5   430.199 ±  0.736  ns/op

This pull request has now been integrated.
Changeset: 0881f2b0
Author:    Sandhya Viswanathan
URL:       https://git.openjdk.org/jdk/commit/0881f2b0c43870ed10b1166d04cef9832e58629e
Stats:     247 lines in 4 files changed: 245 ins; 0 del; 2 mod

8318562: Computational test more than 2x slower when AVX instructions are used

Reviewed-by: kvn

-------------

PR: https://git.openjdk.org/jdk/pull/16701

From duke at openjdk.org Fri Nov 17 20:22:09 2023
From: duke at openjdk.org (Yuri Gaevsky)
Date: Fri, 17 Nov 2023 20:22:09 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v5]
In-Reply-To:
References:
Message-ID:

> Hello All,
>
> Please review these changes to support _vectorizedHashCode intrinsic on
> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
> usage of any RVV instruction but provides manual unrolling of the appropriate
> loop. The code with usage of RVV instruction could be added as follow-up of
> the patch or independently.
>
> Thanks,
> -Yuri Gaevsky
>
> P.S. My OCA has been accepted recently (ygaevsky).
>
> ### Correctness checks
>
> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>
> ### Performance results (the numbers for non-ints are similar)
>
> #### StarFive JH7110 board:
>
>
> ArraysHashCode:                     without intrinsic        with intrinsic
> -------------------------------------------------------------------------------
> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
> -------------------------------------------------------------------------------
> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
> -------------------------------------------------------------------------------
>
> #### T-Head RVB-ICE board:
>
>
> ArraysHashCode: ...

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  Added comments clarifying what is intrinsified.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16629/files
  - new: https://git.openjdk.org/jdk/pull/16629/files/eb4b0f87..70768898

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=03-04

  Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16629.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629

PR: https://git.openjdk.org/jdk/pull/16629

From duke at openjdk.org Fri Nov 17 20:22:09 2023
From: duke at openjdk.org (Yuri Gaevsky)
Date: Fri, 17 Nov 2023 20:22:09 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2]
In-Reply-To:
References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com>
 <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com>
Message-ID:

On Fri, 17 Nov 2023 19:54:18 GMT, Yuri Gaevsky wrote:

>> What specific tests were run for this intrinsic implementation to verify the correctness?
>> BTW, can you add some comments about what java method or bytecode this intrinsic is for? > >> What specific tests were run for this intrinsic implementation to verify the correctness? > - jtreg tier1 > - hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java with -Xcomp > BTW, can you add some comments about what java method or bytecode this intrinsic is for? Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397844760 From sspitsyn at openjdk.org Fri Nov 17 20:29:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 20:29:11 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: Message-ID: > This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. > However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. > It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. > It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. 
> > Additionally, the fix includes the following related changes: > - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: > > JvmtiThreadState* state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > MutexLocker mu(JvmtiThreadState_lock); > state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > JvmtiEventController::enter_interp_only_mode(); > } > } > > > - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. > - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation. > > Performance measurements: > - Without this fix the test provided by the bug submitter gives execution numbers: > - no ClassLoad events enabled: 3251 ms > - ClassLoad events are enabled: 40534 ms > > - With the fix: > - no ClassLoad events enabled: 3270 ms > - ClassLoad events are enabled: 3385 ms > > Testing: > - Ran mach5 tiers 1-6, no regressions are noticed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: add comment for new ThreadsListHandle use ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16686/files - new: https://git.openjdk.org/jdk/pull/16686/files/2582ae3d..de36957a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16686/head:pull/16686 PR: 
https://git.openjdk.org/jdk/pull/16686 From sspitsyn at openjdk.org Fri Nov 17 20:29:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 17 Nov 2023 20:29:11 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: <_7pFLcmNRjtnifuhYSMmawqAGiqURduxz9ce1splCNc=.4f8d5376-cc8f-4798-a236-abbce62f5027@github.com> References: <65piquQpnXvDvmbpt-U_EtxYEe7zu8yRCp39ZDA6rZA=.336ea1e4-a339-4a38-9c28-7a2cd1fd2c31@github.com> <_7pFLcmNRjtnifuhYSMmawqAGiqURduxz9ce1splCNc=.4f8d5376-cc8f-4798-a236-abbce62f5027@github.com> Message-ID: On Fri, 17 Nov 2023 18:52:20 GMT, Chris Plummer wrote: >> The `tlh` is used to protect any existing at this point JavaThread's from being terminated while it is set. >> My understanding is that there is no need to iterate over all threads in the list to get this protection. > > Ok. A comment indicating that purpose would be useful. Okay, thanks! I've added a comment: // Protects any existing JavaThread's from being terminated while it is set. // The FJP carrier thread compensating mechanism can create carrier threads concurrently. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1397847729 From coleenp at openjdk.org Fri Nov 17 20:54:39 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 17 Nov 2023 20:54:39 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: <2eh3c3m3pnjOn8UdDgQ1hqqzwESC1AIin_L7vZzfq2A=.6b96f8c7-cc86-494b-a187-78bb19a69acf@github.com> On Mon, 13 Nov 2023 01:12:11 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation, rename test helper > > src/hotspot/share/oops/symbolHandle.hpp line 48: > >> 46: class SymbolHandleBase : public StackObj { >> 47: static Symbol* volatile _cleanup_delay_queue[]; >> 48: static volatile uint _cleanup_delay_index; > > Putting the delay queue implementation in SymbolHandleBase<> makes unnecessary and unused > data and possibly copies of the code. It is only used in the case where the template parameter is true. > Better would be to put the cleanup delay queue in a separate, non-templated, class. The entire > implementation of the queue could then be in the .cpp file. (I suggest the overhead of an out-of-line > call to add to the queue is in the noise, given that adding to the queue performs 3 atomic RMW > operations.) So something like this: > > > // In symbolHandle.hpp. > class TempSymbolCleanupDelayer : AllStatic { > // Or make these file-scoped statics in the .cpp file. > static const uint QueueSize = 128; > static Symbol* volatile _queue[]; > static volatile uint _index; > > public: > static void delay_cleanup(Symbol* s); > }; > > // In symbolHandle.cpp. 
> Symbol* volatile TempSymbolCleanupDelayer::_queue[QueueSize] = {}; > volatile uint TempSymbolCleanupDelayer::_index = 0; > > void TempSymbolCleanupDelayer::delay_cleanup(Symbol* sym) { > assert(sym != nullptr, "precondition"); > sym->increment_refcount(); > uint i = Atomic::add(&_index, 1u) % QueueSize; > Symbol* old = Atomic::xchg(&_queue[i], sym); > Symbol::maybe_decrement_refcount(old); > } > > > Code is completely untested. It incorporates suggestions I'm making elsewhere in this review too. This seems like a reasonable suggestion. It would be better as a singleton in the cpp file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1397876006 From matsaave at openjdk.org Fri Nov 17 21:05:56 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Nov 2023 21:05:56 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 19:18:13 GMT, Aleksei Voitylov wrote: > Noted, we'll follow up with the arm32 fix a little later. Thanks Matias! Thanks for the confirmation @voitylov, I look forward to the port! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1817101667 From qamai at openjdk.org Fri Nov 17 21:06:44 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 17 Nov 2023 21:06:44 GMT Subject: RFR: 8318562: Computational test more than 2x slower when AVX instructions are used In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 20:05:03 GMT, Sandhya Viswanathan wrote: >> I confirmed that this change solved performance issue on machines I tested (old Broadwell and Cascade Lake CPUs). >> I am submitting our regular testing for approval. > > Thanks a lot for the reviews @vnkozlov @jatin-bhateja @merykitty. @sviswa7 You mean `cvt xmm1, xmm0, xmm0` is slower than `xorps xmm1, xmm1; cvt xmm1, xmm1, xmm0`, right? 
Since `cvt xmm0, xmm0, xmm0` has self dependency so standalone its throughput will be lower than `xorps xmm1, xmm1; cvt xmm1, xmm1, xmm0` which does not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16701#issuecomment-1817102967 From dcubed at openjdk.org Fri Nov 17 21:21:33 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 17 Nov 2023 21:21:33 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 20:29:11 GMT, Serguei Spitsyn wrote: >> This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. >> However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. >> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. >> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. 
>> >> Additionally, the fix includes the following related changes: >> - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: >> >> JvmtiThreadState* state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> MutexLocker mu(JvmtiThreadState_lock); >> state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> JvmtiEventController::enter_interp_only_mode(); >> } >> } >> >> >> - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. >> - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation. >> >> Performance measurements: >> - Without this fix the test provided by the bug submitter gives execution numbers: >> - no ClassLoad events enabled: 3251 ms >> - ClassLoad events are enabled: 40534 ms >> >> - With the fix: >> - no ClassLoad events enabled: 3270 ms >> - ClassLoad events are enabled: 3385 ms >> >> Testing: >> - Ran mach5 tiers 1-6, no regressions are noticed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: add comment for new ThreadsListHandle use src/hotspot/share/prims/jvmtiEventController.cpp line 374: > 372: // Protects any existing JavaThread's from being terminated while it is set. > 373: // The FJP carrier thread compensating mechanism can create carrier threads concurrently. > 374: ThreadsListHandle tlh(current); A ThreadsListHandle does not prevent a JavaThread from being terminated. It prevents a JavaThread from exiting and being freed. 
The JavaThread is able to set the terminated state on itself, but will not be able to complete exiting while it is on a ThreadsListHandle. There is a subtle difference. There's a `target` JavaThread that is fetched from a `JvmtiThreadState` object and that `target` JavaThread is only protected by this `tlh` if `target` is included in the ThreadsList that was captured by this `tlh`. In all likelihood, there should be a ThreadsListHandle farther up the stack that's protecting the JavaThread from which the `JvmtiThreadState` object was extracted and passed to this function. As for carrier threads, if they are created _after_ this `tlh` was created, then this `tlh` cannot protect them because they won't be on this `tlh`'s ThreadsList. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1397897633 From duke at openjdk.org Fri Nov 17 21:42:30 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Fri, 17 Nov 2023 21:42:30 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Thu, 16 Nov 2023 17:18:07 GMT, Hamlin Li wrote: >> I've just "borrowed" those definitions from other intrinsics around. Do you think we can improve this with iRegP/iRegI? > > Seems to me it's not necessary to specify the registers. Can you try it? Sure, let me check . 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1397912940 From dlong at openjdk.org Fri Nov 17 22:11:30 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 17 Nov 2023 22:11:30 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: <7FwjBlfdMsgPdHcdOer9mZsQDQ0PamT8qLzCzdq14z4=.ef428fec-764a-4a28-94f2-4ccbd9c6b3d4@github.com> On Fri, 17 Nov 2023 16:10:25 GMT, Vladimir Kozlov wrote: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! This seems fine, but you could explain a little more why this is useful for Leyden? I would think having StubRoutines::method_entry_barrier() would be enough, and that it could reference the existing platform-specific name, minimizing changes. I don't understand why the storage needs to be shared in StubRoutines::_method_entry_barrier, for example. 
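The two options discussed here (a shared accessor over platform-specific storage, versus shared storage) might be contrasted roughly as below. This is only a sketch under stated assumptions: `address` is a stand-in typedef, and the `platform` namespace, `AccessorOnly` and `SharedStorage` are hypothetical names for illustration, not HotSpot's actual declarations.

```cpp
#include <cstdint>

using address = std::uint8_t*;  // stand-in for HotSpot's `address` typedef

// Variant A: storage stays platform-specific; only the accessor is shared.
namespace platform {            // hypothetical namespace for one port
inline address nmethod_entry_barrier_stub = nullptr;
}

struct AccessorOnly {
  static address method_entry_barrier() {
    return platform::nmethod_entry_barrier_stub;
  }
};

// Variant B: one shared field that every port's stub generator fills in.
struct SharedStorage {
  static inline address _method_entry_barrier = nullptr;
  static address method_entry_barrier() { return _method_entry_barrier; }
};
```

In variant A each port would still need its own accessor body; in variant B the accessor is written once and the ports only assign the field.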
------------- PR Comment: https://git.openjdk.org/jdk/pull/16708#issuecomment-1817176789 From matsaave at openjdk.org Fri Nov 17 22:19:32 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 17 Nov 2023 22:19:32 GMT Subject: RFR: 8320147: Remove DumpSharedSpaces In-Reply-To: References: Message-ID: <8xpb9Q5XfbFz257ttQ14ZMbp005EKJD5LWh4eWO7WFg=.2ed75b52-ac93-4097-af23-a7dfa0e2b27c@github.com> On Thu, 16 Nov 2023 22:36:17 GMT, Ioi Lam wrote: > One more PR for cleanup with cdsConfig.hpp: > > Replace the global variable `DumpSharedSpaces` with `CDSConfig::is_dumping_static_archive()`. > > Note: some mis-uses of `DumpSharedSpaces` need to be replaced with `CDSConfig::is_dumping_heap()` or `CDSConfig::is_dumping_full_module_graph()` LGTM ------------- Marked as reviewed by matsaave (Committer). PR Review: https://git.openjdk.org/jdk/pull/16700#pullrequestreview-1738026387 From duke at openjdk.org Fri Nov 17 22:36:39 2023 From: duke at openjdk.org (duke) Date: Fri, 17 Nov 2023 22:36:39 GMT Subject: Withdrawn: 8314029: Add file name parameter to Compiler.perfmap In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 20:43:56 GMT, Yi-Fan Tsai wrote: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. > > The help message of Compiler.perfmap: > > Compiler.perfmap > Write map file for Linux perf tool. > > Impact: Low > > Syntax : Compiler.perfmap [options] > > Options: (options must be specified using the or = syntax) > filename : [optional] Name of the map file (STRING, no default value) This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/15871 From kvn at openjdk.org Fri Nov 17 22:39:28 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 17 Nov 2023 22:39:28 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: <7FwjBlfdMsgPdHcdOer9mZsQDQ0PamT8qLzCzdq14z4=.ef428fec-764a-4a28-94f2-4ccbd9c6b3d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> <7FwjBlfdMsgPdHcdOer9mZsQDQ0PamT8qLzCzdq14z4=.ef428fec-764a-4a28-94f2-4ccbd9c6b3d4@github.com> Message-ID: On Fri, 17 Nov 2023 22:09:09 GMT, Dean Long wrote: > This seems fine, but you could explain a little more why this is useful for Leyden? I would think having StubRoutines::method_entry_barrier() would be enough, and that it could reference the existing platform-specific name, minimizing changes. I don't understand why the storage needs to be shared in StubRoutines::_method_entry_barrier, for example. Thank you for looking, Dean. Yes, your suggestion would work too. Leyden code calls StubRoutines::method_entry_barrier() to get address: [SCCache.cpp#L3337](https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCache.cpp#L3337) But we would need StubRoutines::method_entry_barrier() implementation for each platform in such case. And having duplication and different names does not feel right for me ;^) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16708#issuecomment-1817207215 From coleenp at openjdk.org Fri Nov 17 22:56:31 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 17 Nov 2023 22:56:31 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:56:09 GMT, Jaroslav Bachorik wrote: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. 
Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. 
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ src/hotspot/share/oops/instanceKlass.cpp line 541: > 539: // The previous version will point to them so they're not totally dangling > 540: assert (!method->on_stack(), "shouldn't be called with methods on stack"); > 541: // Do the pointer maintenance before releasing the metadata, but not for incomplete methods I'm confused by what you mean by method holder, which I think of as methodHandle. Or InstanceKlass is the holder of the methods. Maybe this should be more explicit that it's talking about clearing any associated jmethodIDs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1397970676 From jbachorik at openjdk.org Sat Nov 18 00:26:29 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Sat, 18 Nov 2023 00:26:29 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 22:53:58 GMT, Coleen Phillimore wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. 
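The rule from the PR description (clear a `jmethodID` only when it still points at the `Method` being deallocated) could be sketched as follows. This is a simplified model, not the patch itself: `Method`, `MethodSlot`, `IdCache` and `clear_id_if_matching` are hypothetical stand-ins, and the cache is assumed to be indexed by a per-method id number.

```cpp
#include <cstddef>
#include <vector>

struct Method { int idnum; };   // hypothetical stand-in, not HotSpot's Method
using MethodSlot = Method*;     // a jmethodID is modelled as a MethodSlot*

// Cache owned by the current class version, indexed by method idnum.
// It holds slots for both old and current method versions.
struct IdCache {
  std::vector<MethodSlot*> slots;

  MethodSlot* lookup(int idnum) const {
    if (idnum < 0 || static_cast<std::size_t>(idnum) >= slots.size()) return nullptr;
    return slots[idnum];
  }
};

// Call just before `m` is freed. Returns true if a slot was cleared.
inline bool clear_id_if_matching(const IdCache& cache, const Method* m) {
  MethodSlot* slot = cache.lookup(m->idnum);
  if (slot == nullptr || *slot != m) {
    return false;               // no id, or the id refers to another version
  }
  *slot = nullptr;              // in this model the id now resolves to "no method"
  return true;
}
```

The comparison `*slot != m` is the point of the exercise: an id that was redirected to a newer version must be left alone.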
>> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > src/hotspot/share/oops/instanceKlass.cpp line 541: > >> 539: // The previous version will point to them so they're not totally dangling >> 540: assert (!method->on_stack(), "shouldn't be called with methods on stack"); >> 541: // Do the pointer maintenance before releasing the metadata, but not for incomplete methods > > I'm confused by what you mean by method holder, which I think of as methodHandle. Or InstanceKlass is the holder of the methods. Maybe this should be more explicit that it's talking about clearing any associated jmethodIDs. 
The method holder is an `InstanceKlass` object which can be retrieved as `method->method_holder()` (I apologize if I am not using completely correct terms - this is what I grokked from the sources). And incomplete methods created by the `ClassParser` from the class data stream will not have the link to that `InstanceKlass` set up if the `ClassParser` already has its `_klass` field set to a non-null value. If we are talking about clearing any jmethodIDs associated with an `InstanceKlass` instance it is not really possible for old method versions because only the current `InstanceKlass` version has the jmethodID cache associated with it and it contains jmethodIDs pointing to both the old and current methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1398009493 From never at openjdk.org Sat Nov 18 00:27:29 2023 From: never at openjdk.org (Tom Rodriguez) Date: Sat, 18 Nov 2023 00:27:29 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: On Fri, 17 Nov 2023 16:10:25 GMT, Vladimir Kozlov wrote: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! 
src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 150: > 148: thread_disarmed_guard_value_offset = in_bytes(bs_nm->thread_disarmed_guard_value_offset()); > 149: AMD64_ONLY(nmethod_entry_barrier = StubRoutines::method_entry_barrier()); > 150: AARCH64_ONLY(nmethod_entry_barrier = StubRoutines::method_entry_barrier()); Now that there's a single name, you can remove the two per-arch definitions in favor of a single assignment statement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16708#discussion_r1398009834 From dlong at openjdk.org Sat Nov 18 00:31:27 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 18 Nov 2023 00:31:27 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: On Fri, 17 Nov 2023 16:10:25 GMT, Vladimir Kozlov wrote: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16708#pullrequestreview-1738124382 From amenkov at openjdk.org Sat Nov 18 01:40:35 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 18 Nov 2023 01:40:35 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v8] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 05:37:46 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. 
There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. >> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: add jdk_internal_vm_Continuation::done(cont) check to JvmtiEnvBase::is_vthread_alive src/hotspot/share/prims/jvmtiEnvBase.cpp line 631: > 629: return !jdk_internal_vm_Continuation::done(cont) && > 630: java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::NEW && > 631: java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED; AFAIU `jdk_internal_vm_Continuation::done(cont)` is correct check that vthread is terminated and works for both mounted and unmounted vthreads. 
Then `java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED` check is not needed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1398029697 From amenkov at openjdk.org Sat Nov 18 02:32:30 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Sat, 18 Nov 2023 02:32:30 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v8] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 05:37:46 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
>> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: add jdk_internal_vm_Continuation::done(cont) check to JvmtiEnvBase::is_vthread_alive src/hotspot/share/prims/jvmtiEnvBase.cpp line 1989: > 1987: } else { > 1988: Handshake::execute(hs_cl, tlh, target_jt); // delegate to Handshake implementation > 1989: } Every implementation of JvmtiUnitedHandshakeClosure has to check if the target thread is virtual and call do_vthread manually. 
I'd suggest to handle this by proxy class, something like Suggestion: class Adapter : public HandshakeClosure { JvmtiUnitedHandshakeClosure* _hs_cl; Handle _target_h; public: Adapter(JvmtiUnitedHandshakeClosure* hs_cl, Handle target_h) : HandshakeClosure(hs_cl->name()), _hs_cl(hs_cl), _target_h(target_h) {} virtual void do_thread(Thread* thread) { if (java_lang_VirtualThread::is_instance(_target_h())) { // virtual thread _hs_cl->do_vthread(_target_h); } else { _hs_cl->do_thread(target); } } } adapter(hs_cl, target_h); if (self) { // target thread is current adapter.do_thread(target_jt); // execute handshake closure callback on current thread directly } else { Handshake::execute(&adapter, tlh, target_jt); // delegate to Handshake implementation } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1398042638 From sspitsyn at openjdk.org Sat Nov 18 12:24:57 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 18 Nov 2023 12:24:57 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 21:19:05 GMT, Daniel D. Daugherty wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: add comment for new ThreadsListHandle use > > src/hotspot/share/prims/jvmtiEventController.cpp line 374: > >> 372: // Protects any existing JavaThread's from being terminated while it is set. >> 373: // The FJP carrier thread compensating mechanism can create carrier threads concurrently. >> 374: ThreadsListHandle tlh(current); > > A ThreadsListHandle does not prevent a JavaThread from being terminated. It > prevents a JavaThread from exiting and being freed. The JavaThread is able to > set the terminated state on itself, but will not be able to complete exiting while > it is on a ThreadsListHandle. There is a subtle difference. 
> > There's a `target` JavaThread that is fetched from a `JvmtiThreadState` object > and that `target` JavaThread is only protected by this `tlh` if `target` is included > in the ThreadsList that was captured by this `tlh`. In all likelihood, there should be > a ThreadsListHandle farther up the stack that's protecting the JavaThread from > which the `JvmtiThreadState` object was extracted and passed to this function. > > As for carrier threads, if they are created _after_ this `tlh` was created, then this > `tlh` cannot protect them because they won't be on this `tlh`'s ThreadsList. Thank you for the comment, Dan! Agreed, the comment needs to be corrected in two aspects. I tried to simplify it but failed to do it correctly. It is interesting that there is a `ThreadsListHandle` farther up the stack but it does not help sometimes. It is observed that a `JavaThread` (of a carrier thread) referenced from the `JvmtiThreadState` object can be just created, so we need a `ThreadsListHandle` to avoid possible asserts. With this fix in place I do not see the asserts fired anymore. 
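Dan's point about which threads a handle can cover may be easier to see in a toy model. The sketch below is not HotSpot code: `ThreadModel` and `ThreadsSnapshot` are hypothetical names, and the only property modelled is that a snapshot taken at construction time knows nothing about threads created afterwards.

```cpp
#include <algorithm>
#include <vector>

struct ThreadModel { int id; };  // hypothetical stand-in for JavaThread

// Captures the set of live threads at construction time, like a snapshot.
class ThreadsSnapshot {
  std::vector<const ThreadModel*> _captured;
public:
  explicit ThreadsSnapshot(const std::vector<const ThreadModel*>& live)
    : _captured(live) {}

  // Only threads present at capture time count as protected.
  bool includes(const ThreadModel* t) const {
    return std::find(_captured.begin(), _captured.end(), t) != _captured.end();
  }
};
```

A thread added to the live list after the snapshot was taken is not included, which mirrors the remark about carrier threads created after the `tlh`.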
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1398200015 From sspitsyn at openjdk.org Sat Nov 18 14:35:30 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 18 Nov 2023 14:35:30 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v8] In-Reply-To: References: Message-ID: On Sat, 18 Nov 2023 02:29:26 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: add jdk_internal_vm_Continuation::done(cont) check to JvmtiEnvBase::is_vthread_alive > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1989: > >> 1987: } else { >> 1988: Handshake::execute(hs_cl, tlh, target_jt); // delegate to Handshake implementation >> 1989: } > > Every implementation of JvmtiUnitedHandshakeClosure has to check if the target thread is virtual and call do_vthread manually. > I'd suggest to handle this by proxy class, something like > Suggestion: > > class Adapter : public HandshakeClosure { > JvmtiUnitedHandshakeClosure* _hs_cl; > Handle _target_h; > public: > Adapter(JvmtiUnitedHandshakeClosure* hs_cl, Handle target_h) > : HandshakeClosure(hs_cl->name()), _hs_cl(hs_cl), _target_h(target_h) {} > virtual void do_thread(Thread* thread) { > if (java_lang_VirtualThread::is_instance(_target_h())) { // virtual thread > _hs_cl->do_vthread(_target_h); > } else { > _hs_cl->do_thread(target); > } > } > } adapter(hs_cl, target_h); > > if (self) { // target thread is current > adapter.do_thread(target_jt); // execute handshake closure callback on current thread directly > } else { > Handshake::execute(&adapter, tlh, target_jt); // delegate to Handshake implementation > } Thank you for the suggestion! Agreed, this should help to get rid of this duplication/ugliness. 
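For readers following the adapter suggestion quoted above, here is a minimal, self-contained sketch of the same proxy idea. It is not the HotSpot code: `ToyThread`, `ToyTarget`, `ToyHandshakeClosure`, `ToyUnifiedClosure`, `ToyAdapter`, `RecordingClosure` and `dispatch` are all hypothetical stand-ins invented for illustration. The quoted suggestion passes `target` in the non-virtual branch, which is not declared in that snippet; the sketch assumes the `thread` parameter was meant.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins only; none of these are HotSpot types.
struct ToyThread {};
struct ToyTarget { bool is_virtual; };

// Single-callback interface, playing the role of a handshake closure.
class ToyHandshakeClosure {
 public:
  virtual ~ToyHandshakeClosure() = default;
  virtual void do_thread(ToyThread* thread) = 0;
};

// Two-callback interface: one for platform threads, one for virtual threads.
class ToyUnifiedClosure {
 public:
  virtual ~ToyUnifiedClosure() = default;
  virtual void do_thread(ToyThread* thread) = 0;
  virtual void do_vthread(const ToyTarget& target) = 0;
};

// The adapter makes the virtual/platform decision once, so the individual
// closure implementations no longer need to repeat the check.
class ToyAdapter : public ToyHandshakeClosure {
  ToyUnifiedClosure* _cl;
  ToyTarget _target;
 public:
  ToyAdapter(ToyUnifiedClosure* cl, ToyTarget target) : _cl(cl), _target(target) {}
  void do_thread(ToyThread* thread) override {
    if (_target.is_virtual) {
      _cl->do_vthread(_target);   // virtual thread target
    } else {
      _cl->do_thread(thread);     // platform thread target
    }
  }
};

// Example closure that records which callback was invoked.
class RecordingClosure : public ToyUnifiedClosure {
 public:
  std::string last;
  void do_thread(ToyThread*) override { last = "platform"; }
  void do_vthread(const ToyTarget&) override { last = "virtual"; }
};

// Runs one closure through the adapter and reports the callback taken.
std::string dispatch(bool is_virtual) {
  RecordingClosure cl;
  ToyAdapter adapter(&cl, ToyTarget{is_virtual});
  ToyThread carrier;
  adapter.do_thread(&carrier);
  return cl.last;
}
```

With this shape the caller can either invoke `adapter.do_thread(...)` directly (the "target is current thread" case) or hand the adapter to whatever executes the handshake, and the closure author only writes the two callbacks.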
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1398218934 From sspitsyn at openjdk.org Sat Nov 18 14:38:30 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 18 Nov 2023 14:38:30 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v8] In-Reply-To: References: Message-ID: On Sat, 18 Nov 2023 01:37:21 GMT, Alex Menkov wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: add jdk_internal_vm_Continuation::done(cont) check to JvmtiEnvBase::is_vthread_alive > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 631: > >> 629: return !jdk_internal_vm_Continuation::done(cont) && >> 630: java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::NEW && >> 631: java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED; > > AFAIU `jdk_internal_vm_Continuation::done(cont)` is correct check that vthread is terminated and works for both mounted and unmounted vthreads. > Then `java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED` check is not needed Good suggestion, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1398219250 From sspitsyn at openjdk.org Sun Nov 19 00:05:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 19 Nov 2023 00:05:49 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v9] In-Reply-To: References: Message-ID: <7oZKrqEQvbpRfz6jDNyRMU_EznRR_3W6zDmO5U-7mdU=.877f1d6b-eee4-4fa0-a088-80c685e7d320@github.com> > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. 
> At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. > > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: added AdapterClosure to get rid of duplication ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/e61d0703..fefeb7f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=07-08 Stats: 47 lines in 2 files changed: 20 ins; 22 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From dcubed at openjdk.org Sun Nov 19 03:50:34 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Sun, 19 Nov 2023 03:50:34 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: Message-ID: On Sat, 18 Nov 2023 12:22:10 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEventController.cpp line 374: >> >>> 372: // Protects any existing JavaThread's from being terminated while it is set. >>> 373: // The FJP carrier thread compensating mechanism can create carrier threads concurrently. >>> 374: ThreadsListHandle tlh(current); >> >> A ThreadsListHandle does not prevent a JavaThread from being terminated. It >> prevents a JavaThread from exiting and being freed. The JavaThread is able to >> set the terminated state on itself, but will not be able to complete exiting while >> it is on a ThreadsListHandle. There is a subtle difference. >> >> There's a `target` JavaThread that is fetched from a `JvmtiThreadState` object >> and that `target` JavaThread is only protected by this `tlh` if `target` is included >> in the ThreadsList that was captured by this `tlh`. In all likelihood, there should be >> a ThreadsListHandle farther up the stack that's protecting the JavaThread from >> which the `JvmtiThreadState` object was extracted and passed to this function. >> >> As for carrier threads, if they are created _after_ this `tlh` was created, then this >> `tlh` cannot protect them because they won't be on this `tlh`'s ThreadsList. > > Thank you for the comment, Dan! > Agreed, the comment needs to be corrected in two aspects. > I tried to simplify it but failed to do it correctly. > It is interesting that there is a `ThreadsListHandle` farther up the stack but it does not help sometimes. > It is observed that a `JavaThread` (of a carrier thread) referenced from the `JvmtiThreadState` object can be just created, so we need a `ThreadsListHandle` to avoid possible asserts. With this fix in place I do not see the asserts fired anymore. 
@sspitsyn - Please point me at the code where these JavaThreads are newly created? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1398317396 From sspitsyn at openjdk.org Sun Nov 19 05:12:50 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 19 Nov 2023 05:12:50 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v10] In-Reply-To: References: Message-ID: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
> > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: remove java_lang_VirtualThread::NEW check from is_vthread_alive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16460/files - new: https://git.openjdk.org/jdk/pull/16460/files/fefeb7f1..9db5a300 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16460&range=08-09 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16460/head:pull/16460 PR: https://git.openjdk.org/jdk/pull/16460 From sspitsyn at openjdk.org Sun Nov 19 05:12:51 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 19 Nov 2023 05:12:51 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v8] In-Reply-To: References: Message-ID: <9fmaSnjImG3zuawScS_8mp_0psnV3t0SOKqQuKS0nnA=.bc8bc845-2b4c-4dc9-8743-cad88ee508c4@github.com> On Sat, 18 Nov 2023 14:35:58 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 631: >> >>> 629: return !jdk_internal_vm_Continuation::done(cont) && >>> 630: java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::NEW && >>> 631: java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED; >> >> AFAIU `jdk_internal_vm_Continuation::done(cont)` is correct check that vthread is terminated and works for both mounted 
and unmounted vthreads. >> Then `java_lang_VirtualThread::state(vt) != java_lang_VirtualThread::TERMINATED` check is not needed > > Good suggestion, thanks! Added the fix suggested by Alex. >> src/hotspot/share/prims/jvmtiEnvBase.cpp line 1989: >> >>> 1987: } else { >>> 1988: Handshake::execute(hs_cl, tlh, target_jt); // delegate to Handshake implementation >>> 1989: } >> >> Every implementation of JvmtiUnitedHandshakeClosure has to check if the target thread is virtual and call do_vthread manually. >> I'd suggest to handle this by proxy class, something like >> Suggestion: >> >> class Adapter : public HandshakeClosure { >> JvmtiUnitedHandshakeClosure* _hs_cl; >> Handle _target_h; >> public: >> Adapter(JvmtiUnitedHandshakeClosure* hs_cl, Handle target_h) >> : HandshakeClosure(hs_cl->name()), _hs_cl(hs_cl), _target_h(target_h) {} >> virtual void do_thread(Thread* thread) { >> if (java_lang_VirtualThread::is_instance(_target_h())) { // virtual thread >> _hs_cl->do_vthread(_target_h); >> } else { >> _hs_cl->do_thread(target); >> } >> } >> } adapter(hs_cl, target_h); >> >> if (self) { // target thread is current >> adapter.do_thread(target_jt); // execute handshake closure callback on current thread directly >> } else { >> Handshake::execute(&adapter, tlh, target_jt); // delegate to Handshake implementation >> } > > Thank you for the suggestion! Agreed, this should help to get rid of this duplication/ugliness. Added the fix suggested by Alex. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1398324679 PR Review Comment: https://git.openjdk.org/jdk/pull/16460#discussion_r1398324619 From sspitsyn at openjdk.org Sun Nov 19 09:26:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 19 Nov 2023 09:26:28 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: Message-ID: <1P_TddTTM1eH75Do2Xq-wBrxXSdh7GzJJlgEBH_dSNo=.94392ab2-12a1-4d1b-9131-6164bbb76e7d@github.com> On Sun, 19 Nov 2023 03:47:45 GMT, Daniel D. Daugherty wrote: >> Thank you for the comment, Dan! >> Agreed, the comment needs to be corrected in two aspects. >> I tried to simplify it but failed to do it correctly. >> It is interesting that there is a `ThreadsListHandle` farther up the stack but it does not help sometimes. >> It is observed that a `JavaThread` (of a carrier thread) referenced from the `JvmtiThreadState` object can be just created, so we need a `ThreadsListHandle` to avoid possible asserts. With this fix in place I do not see the asserts fired anymore. > > @sspitsyn - Please point me at the code where these JavaThreads are newly created? @dcubed-ojdk I don't know the FJP implementation well enough to point at the code where it happens. However, I observe that a new `JavaThread` is being created between two points of the execution path. - The first point is in `JvmtiEventControllerPrivate::recompute_enabled()` at the line where a `ThreadsListHandle` is set. I've added a trap checking if any `JavaThread` pointed to by `state->get_thread()` is not protected by the `tlh`. I can see this trap is not fired (I can't say it has never been fired). - The second point is in `JvmtiEventControllerPrivate::enter_interp_only_mode()`. If a `ThreadsListHandle` is NOT set then I can observe a `JavaThread` referenced by `state->get_thread()` which is not protected by any TLH.
If a TLH is added into `JvmtiEventControllerPrivate::enter_interp_only_mode()` then this `JavaThread` is observed as protected by the TLH. I can provide a full stack trace for this `JavaThread` consisting of two parts: carrier thread and virtual thread frames. The name of the carrier thread is `ForkJoinPool-1-worker-1`. The virtual thread is a tested virtual thread. The thread dump looks as below: DBG: enter_interp_only_mode: target: 0x7f93f8043d00 virt: 1 carrier: ForkJoinPool-1-worker-1 DBG: ##### NATIVE stacktrace of JavaThread: 0x7f93f8043d00 ##### Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) j fieldacc02.check(Ljava/lang/Object;)I+0 j fieldacc02.lambda$testVirtualThread$0()V+12 j fieldacc02$$Lambda+0x00007f943b001428.run()V+0 j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 22-internal j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 22-internal j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 22-internal j jdk.internal.vm.Continuation.enter0()V+4 java.base at 22-internal j jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 22-internal J 124 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 22-internal (0 bytes) @ 0x00007f94c7cdf744 [0x00007f94c7cdf5e0+0x0000000000000164] j jdk.internal.vm.Continuation.run()V+122 java.base at 22-internal j java.lang.VirtualThread.runContinuation()V+70 java.base at 22-internal j java.lang.VirtualThread$$Lambda+0x00007f943b0496c0.run()V+4 java.base at 22-internal j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Void;+4 java.base at 22-internal j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Object;+1 java.base at 22-internal j java.util.concurrent.ForkJoinTask$InterruptibleTask.exec()Z+51 java.base at 22-internal j java.util.concurrent.ForkJoinTask.doExec()V+10 java.base at 22-internal j
java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;I)V+49 java.base at 22-internal j java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;JI)J+271 java.base at 22-internal j java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+68 java.base at 22-internal j java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base at 22-internal v ~StubRoutines::call_stub 0x00007f94c7505d21 V [libjvm.so+0xe7b719] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x4a9 (javaCalls.cpp:415) V [libjvm.so+0xe7bdd5] JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0x345 (javaCalls.cpp:329) V [libjvm.so+0xe7bff6] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x76 (javaCalls.cpp:191) V [libjvm.so+0xfd9723] thread_entry(JavaThread*, JavaThread*)+0x93 (jvm.cpp:2937) V [libjvm.so+0xeb06ac] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:720) V [libjvm.so+0x1789496] Thread::call_run()+0xb6 (thread.cpp:220) V [libjvm.so+0x1493b27] thread_native_entry(Thread*)+0x127 (os_linux.cpp:787) The observed `JavaThread` does not look like garbage because no crashes have ever been observed. Apparently, it has been recently created because it is not protected by the TLH which was set in `JvmtiEventControllerPrivate::recompute_enabled()`. I guess it should be possible to find out where exactly in the FJP code it is created, but it will take time. I'm not sure why you need it, though.
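To make the snapshot point in this exchange concrete, here is a small toy model of the behaviour Dan describes: a handle covers only the threads that were on the list when the handle was created, and a covered thread may mark itself terminated but is not freed while a handle still refers to it. This is only a sketch under loud assumptions: `ToyThread`, `ToyListHandle`, `add_thread`, `remove_thread` and `run_scenario` are hypothetical names, the model is single-threaded, and it uses `std::shared_ptr` reference counting, which is a different mechanism from the one HotSpot uses for `ThreadsListHandle`.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-ins only; none of these are HotSpot types or APIs.
struct ToyThread {
  int id;
  bool terminated = false;
  explicit ToyThread(int i) : id(i) {}
};

using ToyList = std::vector<std::shared_ptr<ToyThread>>;

// The "current threads list". Every add/remove publishes a fresh copy,
// so a snapshot taken earlier never changes afterwards.
static std::shared_ptr<const ToyList> g_threads = std::make_shared<ToyList>();

void add_thread(const std::shared_ptr<ToyThread>& t) {
  auto next = std::make_shared<ToyList>(*g_threads);
  next->push_back(t);
  g_threads = next;
}

void remove_thread(const ToyThread* t) {
  auto next = std::make_shared<ToyList>();
  for (const auto& p : *g_threads) {
    if (p.get() != t) next->push_back(p);
  }
  g_threads = next;
}

// Captures the list as it was at construction time and keeps it alive.
class ToyListHandle {
  std::shared_ptr<const ToyList> _snapshot;
 public:
  ToyListHandle() : _snapshot(g_threads) {}
  bool includes(const ToyThread* t) const {
    return std::any_of(_snapshot->begin(), _snapshot->end(),
                       [t](const std::shared_ptr<ToyThread>& p) { return p.get() == t; });
  }
};

struct Result {
  bool outer_covers_late;   // does the earlier handle cover the late thread?
  bool inner_covers_late;   // does a handle created afterwards cover it?
  bool exited_still_alive;  // is an exited-but-covered thread still allocated?
};

Result run_scenario() {
  g_threads = std::make_shared<ToyList>();
  auto early = std::make_shared<ToyThread>(1);
  add_thread(early);

  ToyListHandle outer;                        // snapshot: thread 1 only
  auto late = std::make_shared<ToyThread>(2);
  add_thread(late);                           // created after 'outer'
  ToyListHandle inner;                        // snapshot: threads 1 and 2

  // Thread 1 marks itself terminated and leaves the current list, but the
  // snapshots held by the handles keep the object allocated.
  std::weak_ptr<ToyThread> watch = early;
  early->terminated = true;
  remove_thread(early.get());
  early.reset();

  return Result{outer.includes(late.get()), inner.includes(late.get()), !watch.expired()};
}
```

In this model the handle created farther up the stack (`outer`) does not cover the thread created later, while a handle created closer to the point of use (`inner`) does, which matches the observation reported in the message above.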
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1398362375 From sspitsyn at openjdk.org Sun Nov 19 09:44:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sun, 19 Nov 2023 09:44:28 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: <1P_TddTTM1eH75Do2Xq-wBrxXSdh7GzJJlgEBH_dSNo=.94392ab2-12a1-4d1b-9131-6164bbb76e7d@github.com> References: <1P_TddTTM1eH75Do2Xq-wBrxXSdh7GzJJlgEBH_dSNo=.94392ab2-12a1-4d1b-9131-6164bbb76e7d@github.com> Message-ID: On Sun, 19 Nov 2023 09:22:43 GMT, Serguei Spitsyn wrote: >> @sspitsyn - Please point me at the code where these JavaThreads are newly created? > > @dcubed-ojdk > I don't know the FJP implementation well enough to point at the code where it happens. However, I observe that a new `JavaThread` is being created between two points of the execution path. > - The first point is in `JvmtiEventControllerPrivate::recompute_enabled()` at the line where a `ThreadsListHandle` is set. I've added a trap checking if any `JavaThread` pointed to by `state->get_thread()` is not protected by the `tlh`. I can see this trap is not fired (I can't say it has never been fired). > - The second point is in `JvmtiEventControllerPrivate::enter_interp_only_mode()`. If a `ThreadsListHandle` is NOT set then I can observe a `JavaThread` referenced by `state->get_thread()` which is not protected by any TLH. If a TLH is added into `JvmtiEventControllerPrivate::enter_interp_only_mode()` then this `JavaThread` is observed as protected by the TLH. > I can provide a full stack trace for this `JavaThread` consisting of two parts: carrier thread and virtual thread frames. The name of the carrier thread is `ForkJoinPool-1-worker-1`. The virtual thread is a tested virtual thread.
> The thread dump looks las below: > > DBG: enter_interp_only_mode: target: 0x7f93f8043d00 virt: 1 carrier: ForkJoinPool-1-worker-1 > DBG: ##### NATIVE stacktrace of JavaThread: 0x7f93f8043d00 ##### > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > j fieldacc02.check(Ljava/lang/Object;)I+0 > j fieldacc02.lambda$testVirtualThread$0()V+12 > j fieldacc02$$Lambda+0x00007f943b001428.run()V+0 > j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 22-internal > j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 22-internal > j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 22-internal > j jdk.internal.vm.Continuation.enter0()V+4 java.base at 22-internal > j jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 22-internal > J 124 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 22-internal (0 bytes) @ 0x00007f94c7cdf744 [0x00007f94c7cdf5e0+0x0000000000000164] > j jdk.internal.vm.Continuation.run()V+122 java.base at 22-internal > j java.lang.VirtualThread.runContinuation()V+70 java.base at 22-internal > j java.lang.VirtualThread$$Lambda+0x00007f943b0496c0.run()V+4 java.base at 22-internal > j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Void;+4 java.base at 22-internal > j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute(... 
The stack trace of current thread (where the assert was fired) can explain what is going on a little bit: Current thread (0x00007f93f8043d00): JavaThread "ForkJoinPool-1-worker-1" daemon [_thread_in_vm, id=16779, stack(0x00007f948a597000,0x00007f948a697000) (1024K)] Stack: [0x00007f948a597000,0x00007f948a697000], sp=0x00007f948a6949e0, free space=1014k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x117937d] JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState*)+0x45d (jvmtiEventController.cpp:402) V [libjvm.so+0x1179520] JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState*) [clone .part.0]+0x190 (jvmtiEventController.cpp:632) V [libjvm.so+0x117a1e1] JvmtiEventControllerPrivate::thread_started(JavaThread*)+0x351 (jvmtiEventController.cpp:1174) V [libjvm.so+0x117e608] JvmtiExport::get_jvmti_thread_state(JavaThread*)+0x98 (jvmtiExport.cpp:424) V [libjvm.so+0x118a86c] JvmtiExport::post_field_access(JavaThread*, Method*, unsigned char*, Klass*, Handle, _jfieldID*)+0x6c (jvmtiExport.cpp:2214) V [libjvm.so+0x118b3a1] JvmtiExport::post_field_access_by_jni(JavaThread*, oop, Klass*, _jfieldID*, bool)+0x321 (jvmtiExport.cpp:2202) V [libjvm.so+0x118b4e9] JvmtiExport::jni_GetField_probe(JavaThread*, _jobject*, oop, Klass*, _jfieldID*, bool)+0x79 (jvmtiExport.cpp:2168) V [libjvm.so+0xf83847] jni_GetStaticBooleanField+0x257 (jni.cpp:2047) C [libfieldacc02.so+0x379b] Java_fieldacc02_check+0x6b (jni.h:1546) j fieldacc02.check(Ljava/lang/Object;)I+0 j fieldacc02.lambda$testVirtualThread$0()V+12 j fieldacc02$$Lambda+0x00007f943b001428.run()V+0 j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 22-internal j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 22-internal j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 22-internal j jdk.internal.vm.Continuation.enter0()V+4 java.base at 22-internal j 
jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 22-internal J 124 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 22-internal (0 bytes) @ 0x00007f94c7cdf744 [0x00007f94c7cdf5e0+0x0000000000000164] j jdk.internal.vm.Continuation.run()V+122 java.base at 22-internal j java.lang.VirtualThread.runContinuation()V+70 java.base at 22-internal j java.lang.VirtualThread$$Lambda+0x00007f943b0496c0.run()V+4 java.base at 22-internal j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Void;+4 java.base at 22-internal j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Object;+1 java.base at 22-internal j java.util.concurrent.ForkJoinTask$InterruptibleTask.exec()Z+51 java.base at 22-internal j java.util.concurrent.ForkJoinTask.doExec()V+10 java.base at 22-internal j java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;I)V+49 java.base at 22-internal j java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;JI)J+271 java.base at 22-internal j java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+68 java.base at 22-internal j java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base at 22-internal v ~StubRoutines::call_stub 0x00007f94c7505d21 V [libjvm.so+0xe7b719] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x4a9 (javaCalls.cpp:415) V [libjvm.so+0xe7bdd5] JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0x345 (javaCalls.cpp:329) V [libjvm.so+0xe7bff6] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x76 (javaCalls.cpp:191) V [libjvm.so+0xfd9723] thread_entry(JavaThread*, JavaThread*)+0x93 (jvm.cpp:2937) V [libjvm.so+0xeb06ac] JavaThread::thread_main_inner()+0xcc 
(javaThread.cpp:720) V [libjvm.so+0x1789496] Thread::call_run()+0xb6 (thread.cpp:220) V [libjvm.so+0x1493b27] thread_native_entry(Thread*)+0x127 (os_linux.cpp:787) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1398365938 From iwalulya at openjdk.org Sun Nov 19 17:02:35 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Sun, 19 Nov 2023 17:02:35 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Tue, 14 Nov 2023 14:37:58 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. 
Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Touchups > - More comments work > - Tight up the comments > - ... and 3 more: https://git.openjdk.org/jdk/compare/2277bc30...191c0dbb LGTM! src/hotspot/share/utilities/waitBarrier_generic.cpp line 119: > 117: void GenericWaitBarrier::Cell::arm(int32_t requested_tag) { > 118: // Before we continue to arm, we need to make sure that all threads > 119: // have left the previous cell. 
// have left after the previous usage of the cell. ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16404#pullrequestreview-1738568875 PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1398349635 From kvn at openjdk.org Sun Nov 19 20:50:50 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 19 Nov 2023 20:50:50 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared [v2] In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: address comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16708/files - new: https://git.openjdk.org/jdk/pull/16708/files/359135e5..3af07981 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16708&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16708&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16708/head:pull/16708 PR: https://git.openjdk.org/jdk/pull/16708 From kvn at openjdk.org Sun Nov 19 20:50:51 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 19 Nov 2023 20:50:51 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared [v2] In-Reply-To: References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: On Sat, 18 Nov 2023 00:24:47 GMT, Tom Rodriguez wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> address comment > > src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 150: > >> 148: thread_disarmed_guard_value_offset = in_bytes(bs_nm->thread_disarmed_guard_value_offset()); >> 149: AMD64_ONLY(nmethod_entry_barrier = StubRoutines::method_entry_barrier()); >> 150: AARCH64_ONLY(nmethod_entry_barrier = StubRoutines::method_entry_barrier()); > > Now that there's a single name you can remove the 2 per-arch definitions in favor of a single assignment statement. Done.
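As a rough illustration of the cleanup Tom asks for here, the sketch below contrasts two identical per-architecture assignments with one unconditional assignment. Everything in it is a stand-in invented for the example: `TOY_AMD64_ONLY`, `TOY_AARCH64_ONLY`, `toy_stubs::method_entry_barrier`, `before` and `after` are hypothetical and only mimic the shape of the quoted lines, not the actual HotSpot macros or the actual change in the PR.

```cpp
#include <cassert>

// Toy per-architecture macros: expand to their argument only on the
// matching platform, and to nothing elsewhere.
#if defined(__x86_64__) || defined(_M_X64)
#define TOY_AMD64_ONLY(code) code
#else
#define TOY_AMD64_ONLY(code)
#endif

#if defined(__aarch64__) || defined(_M_ARM64)
#define TOY_AARCH64_ONLY(code) code
#else
#define TOY_AARCH64_ONLY(code)
#endif

using address = const void*;

namespace toy_stubs {
// One shared accessor, standing in for a stub address with a single name.
inline address method_entry_barrier() {
  static const char stub = 0;
  return &stub;
}
}  // namespace toy_stubs

// Before: the same right-hand side repeated once per architecture.
address before() {
  address nmethod_entry_barrier = nullptr;
  TOY_AMD64_ONLY(nmethod_entry_barrier = toy_stubs::method_entry_barrier());
  TOY_AARCH64_ONLY(nmethod_entry_barrier = toy_stubs::method_entry_barrier());
  return nmethod_entry_barrier;
}

// After: a single assignment, since the accessor has one name everywhere.
address after() {
  address nmethod_entry_barrier = toy_stubs::method_entry_barrier();
  return nmethod_entry_barrier;
}
```

One visible difference in the toy: on a platform that matches neither macro, `before()` leaves the value null while `after()` always sets it, so in real code the single assignment is only equivalent if the shared accessor is meaningful on every platform that reaches that line.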
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16708#discussion_r1398491708 From iklam at openjdk.org Sun Nov 19 22:52:02 2023 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 19 Nov 2023 22:52:02 GMT Subject: RFR: 8320147: Remove DumpSharedSpaces [v2] In-Reply-To: References: Message-ID: > One more PR for cleanup with cdsConfig.hpp: > > Replace the global variable `DumpSharedSpaces` with `CDSConfig::is_dumping_static_archive()`. > > Note: some mis-uses of `DumpSharedSpaces` need to be replaced with `CDSConfig::is_dumping_heap()` or `CDSConfig::is_dumping_full_module_graph()` Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into 8320147-remove-DumpSharedSpaces - @calvinccheung comments - fixed copyright - 8320147: Remove DumpSharedSpaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16700/files - new: https://git.openjdk.org/jdk/pull/16700/files/613b5de2..328972de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16700&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16700&range=00-01 Stats: 11144 lines in 236 files changed: 4586 ins; 4441 del; 2117 mod Patch: https://git.openjdk.org/jdk/pull/16700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16700/head:pull/16700 PR: https://git.openjdk.org/jdk/pull/16700 From dholmes at openjdk.org Mon Nov 20 01:02:30 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 20 Nov 2023 01:02:30 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: <1P_TddTTM1eH75Do2Xq-wBrxXSdh7GzJJlgEBH_dSNo=.94392ab2-12a1-4d1b-9131-6164bbb76e7d@github.com> Message-ID: <-CGha1yFmQNPbT7s6BtZ0iJFxmPgzoSnozx4pgIZlA4=.77aa51ba-ebcb-419a-9651-dbadb8ef9e91@github.com> On Sun, 19 Nov 2023 
09:41:34 GMT, Serguei Spitsyn wrote: >> @dcubed-ojdk >> I don't know the FJP implementation well enough to point at the code where it happens. However, I observe that a new `JavaThread` is being created between two points of the execution path. >> - The first point is in `JvmtiEventControllerPrivate::recompute_enabled()`, at the line where a `ThreadsListHandle` is set. I've added a trap checking whether any `JavaThread` pointed to by `state->get_thread()` is not protected by the `tlh`. I can see this trap is not fired (I can't say it has never been fired). >> - The second point is in `JvmtiEventControllerPrivate::enter_interp_only_mode()`. If a `ThreadsListHandle` is NOT set, then I can observe a `JavaThread` referenced by `state->get_thread()` which is not protected by any TLH. If a TLH is added into `JvmtiEventControllerPrivate::enter_interp_only_mode()`, then this `JavaThread` is observed as protected by the TLH. >> I can provide a full stack trace for this `JavaThread` consisting of two parts: carrier thread and virtual thread frames. The name of the carrier thread is `ForkJoinPool-1-worker-1`. The virtual thread is a tested virtual thread.
>> The thread dump looks las below: >> >> DBG: enter_interp_only_mode: target: 0x7f93f8043d00 virt: 1 carrier: ForkJoinPool-1-worker-1 >> DBG: ##### NATIVE stacktrace of JavaThread: 0x7f93f8043d00 ##### >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> j fieldacc02.check(Ljava/lang/Object;)I+0 >> j fieldacc02.lambda$testVirtualThread$0()V+12 >> j fieldacc02$$Lambda+0x00007f943b001428.run()V+0 >> j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 22-internal >> j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 22-internal >> j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 22-internal >> j jdk.internal.vm.Continuation.enter0()V+4 java.base at 22-internal >> j jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 22-internal >> J 124 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 22-internal (0 bytes) @ 0x00007f94c7cdf744 [0x00007f94c7cdf5e0+0x0000000000000164] >> j jdk.internal.vm.Continuation.run()V+122 java.base at 22-internal >> j java.lang.VirtualThread.runContinuation()V+70 java.base at 22-internal >> j java.lang.VirtualThread$$Lambda+0x00007f943b0496c0.run()V+4 java.base at 22-internal >> j java.util.concurrent.ForkJoinTask$RunnableExecuteAction.compute()Ljava/lang/Void;+4 java.base at 22-internal >> j java.util.concur... 
> > The stack trace of current thread (where the assert was fired) can explain what is going on a little bit: > > Current thread (0x00007f93f8043d00): JavaThread "ForkJoinPool-1-worker-1" daemon [_thread_in_vm, id=16779, stack(0x00007f948a597000,0x00007f948a697000) (1024K)] > > Stack: [0x00007f948a597000,0x00007f948a697000], sp=0x00007f948a6949e0, free space=1014k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x117937d] JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState*)+0x45d (jvmtiEventController.cpp:402) > V [libjvm.so+0x1179520] JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState*) [clone .part.0]+0x190 (jvmtiEventController.cpp:632) > V [libjvm.so+0x117a1e1] JvmtiEventControllerPrivate::thread_started(JavaThread*)+0x351 (jvmtiEventController.cpp:1174) > V [libjvm.so+0x117e608] JvmtiExport::get_jvmti_thread_state(JavaThread*)+0x98 (jvmtiExport.cpp:424) > V [libjvm.so+0x118a86c] JvmtiExport::post_field_access(JavaThread*, Method*, unsigned char*, Klass*, Handle, _jfieldID*)+0x6c (jvmtiExport.cpp:2214) > V [libjvm.so+0x118b3a1] JvmtiExport::post_field_access_by_jni(JavaThread*, oop, Klass*, _jfieldID*, bool)+0x321 (jvmtiExport.cpp:2202) > V [libjvm.so+0x118b4e9] JvmtiExport::jni_GetField_probe(JavaThread*, _jobject*, oop, Klass*, _jfieldID*, bool)+0x79 (jvmtiExport.cpp:2168) > V [libjvm.so+0xf83847] jni_GetStaticBooleanField+0x257 (jni.cpp:2047) > C [libfieldacc02.so+0x379b] Java_fieldacc02_check+0x6b (jni.h:1546) > j fieldacc02.check(Ljava/lang/Object;)I+0 > j fieldacc02.lambda$testVirtualThread$0()V+12 > j fieldacc02$$Lambda+0x00007f943b001428.run()V+0 > j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 22-internal > j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 22-internal > j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 22-internal > j jdk.internal.vm.Continuation.enter0()V+4 java.base at 
22-internal > j jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 22-internal > J 124 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 22-internal (0 bytes) @ 0x00007f94c7cdf744 [0x00007f94c7cdf5e0+0x0000000000000164] > j jdk.internal.vm.Continuation.run()V+122 java.base at 22-internal > j java.lang.VirtualThread.runContinuation()V+70 java.base at 22-internal > j java.lang.VirtualThread$$Lambda+0x00007f943b0496c0.run()V+4 java.base at 22-internal > j java.util.concur... Just to re-iterate what Dan was saying, the TLH is only of use if you are accessing threads known to be included in the TLH. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1398540533 From xgong at openjdk.org Mon Nov 20 01:55:39 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 20 Nov 2023 01:55:39 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 13:00:03 GMT, Magnus Ihse Bursie wrote: >> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Add a bundled native lib in jdk as a bridge to libsleef >> - Merge 'jdk:master' into JDK-8312425 >> - Disable sleef by default >> - Merge 'jdk:master' into JDK-8312425 >> - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF > > doc/building.md line 552: > >> 550: >> 551: libsleef, the [SIMD Library for Evaluating Elementary Functions]( >> 552: https://sleef.org/) is required when building libvmath.so on Linux+AArch64 > > The conventional way we have refered to os/cpu combinations in the build documentation is like this: `Linux/aarch64`. 
> > I also think you need to expand a bit on the fact that this is optional, and if you do not provide libsleef, the build will succeed but without the vector performance enhancements provided by libvmath. Thanks for the review! This sounds good to me. I will add it. > make/autoconf/lib-vmath.m4 line 49: > >> 47: test -e ${with_libsleef}/include/sleef.h; then >> 48: LIBSLEEF_FOUND=yes >> 49: LIBVMATH_LIBS="-L${with_libsleef}/lib" > > This should be LIBSLEEF_LIBS and ...CFLAGS. Seems as above. The target library is `libvmath.so`, and the cflags/libs are used for building it instead of `libsleef.so`. > make/autoconf/lib-vmath.m4 line 92: > >> 90: [] >> 91: ) >> 92: AC_MSG_RESULT([${SVE_FEATURE_SUPPORT}]) > > What is this test even for? I can't see any usage of SVE_FEATURE_SUPPORT outside this function. This is just used to print the result of `AC_MSG_CHECKING([if ARM SVE feature is supported])` in configure. > make/autoconf/lib-vmath.m4 line 102: > >> 100: fi >> 101: >> 102: AC_SUBST(LIBSLEEF_FOUND) > > Do not export LIBSLEEF_FOUND. It is okay to use internally here, but you should instead export ENABLE_LIBSLEEF, using true/false (instead of yes/no). This is the way we handle all other optional components. Makes sense to me. Thanks for the comment! > make/autoconf/libraries.m4 line 129: > >> 127: LIB_SETUP_LIBFFI >> 128: LIB_SETUP_MISC_LIBS >> 129: LIB_SETUP_VMATH > > The function (and file) should be named after "sleef", not "vmath". Yes, it seems weird. But the library we want to build is `libvmath.so` instead of `libsleef.so`. And we not only check the sleef library, but also the ARM SVE feature inside it. So using the `VMATH` suffix seems more reasonable to me. WDYT?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1398574834 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1398573871 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1398571212 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1398575161 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1398572906 From qamai at openjdk.org Mon Nov 20 02:38:41 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 20 Nov 2023 02:38:41 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v8] In-Reply-To: References: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> <648SrHxCX6_kRX7cmxyGurxOWecLTWUw0_C79J_okbo=.473eaa8c-7dfc-4aa0-839d-bb580cc9d312@github.com> Message-ID: On Fri, 17 Nov 2023 13:19:40 GMT, Afshin Zafari wrote: >> src/hotspot/share/utilities/growableArray.hpp line 213: >> >>> 211: >>> 212: template >>> 213: int find(T* token, F f) const { >> >> Should be >> >> template >> int find(F f) const { >> for (int i = 0; i < _len; i++) { >> if (f(_data[i]) { >> return i; >> } >> } >> return -1; >> } > > We need `token` to find it in the array, don't we? All the invocations pass such a function with two parameters. The change here needs all invocations to be changed. No, it can be embedded into the function object. 
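A minimal sketch of embedding the token in the function object, assuming a simplified stand-in container (`SimpleArray` is hypothetical and is not HotSpot's `GrowableArray`; the member names `find_if` and `find` are chosen only for illustration). The lambda captures the value being searched for, so the search routine itself needs only a unary predicate:

```cpp
#include <vector>

// Hypothetical stand-in for a growable array; only the lookup shape matters here.
template <typename E>
class SimpleArray {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }

  // Unary-predicate lookup: whatever "token" the caller compares against
  // lives inside the function object f, not in the signature.
  template <typename F>
  int find_if(F f) const {
    for (int i = 0; i < static_cast<int>(_data.size()); i++) {
      if (f(_data[i])) {
        return i;
      }
    }
    return -1;  // no element satisfied the predicate
  }

  // Value lookup expressed through the predicate version.
  int find(const E& value) const {
    return find_if([&value](const E& e) { return e == value; });
  }
};
```

A caller that previously passed a token plus a two-argument comparator would instead capture the token, e.g. `a.find_if([token](int e) { return e == token; })`.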
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1397336104 From qamai at openjdk.org Mon Nov 20 02:38:42 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 20 Nov 2023 02:38:42 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v8] In-Reply-To: References: <97IBSrr12htoiw751JlhL4f7jiEZeoYVF9hQjas8vrI=.a7143156-e1d5-4774-ba4b-08e29eb05389@github.com> <648SrHxCX6_kRX7cmxyGurxOWecLTWUw0_C79J_okbo=.473eaa8c-7dfc-4aa0-839d-bb580cc9d312@github.com> Message-ID: On Fri, 17 Nov 2023 13:45:14 GMT, Quan Anh Mai wrote: >> We need `token` to find it in the array, don't we? All the invocations pass such a function with two parameters. The change here needs all invocations to be changed. > > No, it can be embedded into the function object. I think you can have 2 versions, `GrowableArray::find(const E& value)` and `GrowableArray::find_if(UnaryPredicate p)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15418#discussion_r1398608278 From fyang at openjdk.org Mon Nov 20 04:17:33 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 20 Nov 2023 04:17:33 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> <7FwjBlfdMsgPdHcdOer9mZsQDQ0PamT8qLzCzdq14z4=.ef428fec-764a-4a28-94f2-4ccbd9c6b3d4@github.com> Message-ID: On Fri, 17 Nov 2023 22:36:33 GMT, Vladimir Kozlov wrote: >> This seems fine, but you could explain a little more why this is useful for Leyden? I would think having StubRoutines::method_entry_barrier() would be enough, and that it could reference the existing platform-specific name, minimizing changes. I don't understand why the storage needs to be shared in StubRoutines::_method_entry_barrier, for example. > >> This seems fine, but you could explain a little more why this is useful for Leyden? 
I would think having StubRoutines::method_entry_barrier() would be enough, and that it could reference the existing platform-specific name, minimizing changes. I don't understand why the storage needs to be shared in StubRoutines::_method_entry_barrier, for example. > > Thank you for looking, Dean. Yes, your suggestion would work too. Leyden code calls StubRoutines::method_entry_barrier() to get the address: [SCCache.cpp#L3337](https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCache.cpp#L3337) > But we would need a StubRoutines::method_entry_barrier() implementation for each platform in that case. And having duplication and different names does not feel right to me ;^) @vnkozlov : Hi, I have tested this on the linux-riscv platform. The result looks fine. Would you mind applying the following small add-on change, which adds relocation info for this platform too? Thanks. [16708-riscv.diff.txt](https://github.com/openjdk/jdk/files/13406179/16708-riscv.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16708#issuecomment-1818200214 From dholmes at openjdk.org Mon Nov 20 04:31:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 20 Nov 2023 04:31:31 GMT Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4] In-Reply-To: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> References: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> Message-ID: On Fri, 17 Nov 2023 13:09:00 GMT, Daniel Lundén wrote: >> This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. >> >> Changes: >> - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24.
The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). >> - Completely remove `CounterDecayMinIntervalLength`. >> >> ### Testing >> Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. >> - `tier1` >> - HotSpot parts of `tier2` and `tier3` > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Obsolete UseCounterDecay LGTM! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16673#pullrequestreview-1738943639 From dholmes at openjdk.org Mon Nov 20 05:57:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 20 Nov 2023 05:57:53 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v4] In-Reply-To: <8xx2PGeKCQDE_G0h8NVj0ZsBbYbFy8lYkrB_jKq6X5I=.f1d1735f-9760-461f-a86e-c682306298e4@github.com> References: <-DODxJdHO2qs-XXVSQSSIZZZKIfHjHKtY8kt9PpNWVs=.82a739dc-0da3-4bbc-b0de-c00ebae56c22@github.com> <8xx2PGeKCQDE_G0h8NVj0ZsBbYbFy8lYkrB_jKq6X5I=.f1d1735f-9760-461f-a86e-c682306298e4@github.com> Message-ID: <36txE6VE4sim7Y_HJkw_EXmzXfqhdTcWGOPZq8XwuME=.74b0669e-865f-4e4e-b823-fbf3a3b2ab19@github.com> On Fri, 17 Nov 2023 08:00:07 GMT, Axel Boldt-Christmas wrote: > There probably is some history here I am unaware of. But to me only one should exist, or the neutral property's meaning needs to be explained in the markWord.hpp file so it is clear what makes it distinct from unlocked. There is no distinction and there never has been. The `is_neutral` came in with some backend performance enhancements related to adaptive spinning, around the same time that biased-locking was introduced. The work was done by two different engineers. The code in synchronizer.cpp only used `is_neutral` not `is_unlocked`. But the two functions have always had identical definitions.
We should get rid of `is_neutral`, but probably not in this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1398691377 From jbhateja at openjdk.org Mon Nov 20 06:12:29 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 20 Nov 2023 06:12:29 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 19:58:13 GMT, Volodymyr Paprotski wrote: > Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain > > > =============== BEFORE =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 77.766 ± 0.049 ns/op > VectorSignum.floatSignum 512 avgt 3 154.889 ± 0.242 ns/op > VectorSignum.floatSignum 1024 avgt 3 306.130 ± 0.605 ns/op > VectorSignum.floatSignum 2048 avgt 3 609.965 ± 0.927 ns/op > VectorSignum.doubleSignum 256 avgt 3 151.874 ± 1.748 ns/op > VectorSignum.doubleSignum 512 avgt 3 303.080 ± 0.310 ns/op > VectorSignum.doubleSignum 1024 avgt 3 607.517 ± 0.597 ns/op > VectorSignum.doubleSignum 2048 avgt 3 1214.282 ± 1.834 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 137.334 ± 0.128 us/op > MaxMinOptimizeTest.dMin avgt 3 137.160 ± 0.465 us/op > MaxMinOptimizeTest.dMul avgt 3 77.231 ± 0.051 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.165 ± 0.003 us/op > MaxMinOptimizeTest.fMax avgt 3 107.428 ± 1.501 us/op > MaxMinOptimizeTest.fMin avgt 3 107.186 ± 0.022 us/op > MaxMinOptimizeTest.fMul avgt 3 77.164 ± 0.012 us/op > > =============== AFTER =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 61.816 ± 1.980 ns/op > VectorSignum.floatSignum 512 avgt 3 117.251 ± 0.052 ns/op > VectorSignum.floatSignum 1024 avgt 3 231.356 ± 0.397 ns/op > VectorSignum.floatSignum 2048 avgt 3 458.904 ± 0.774 ns/op > VectorSignum.doubleSignum 256 avgt 3 121.449 ±
0.184 ns/op > VectorSignum.doubleSignum 512 avgt 3 241.662 ± 0.189 ns/op > VectorSignum.doubleSignum 1024 avgt 3 482.365 ± 0.165 ns/op > VectorSignum.doubleSignum 2048 avgt 3 962.412 ± 1.401 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 125.701 ± 0.082 us/op > MaxMinOptimizeTest.dMin avgt 3 124.704 ± 0.119 us/op > MaxMinOptimizeTest.dMul avgt 3 77.232 ± 0.028 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.169 ± 0.103 us/op > MaxMinOptimizeTest.fMax avgt 3 97.939 ± 0.477 us/op > MaxMinOptimizeTest.fMin avgt 3 98.012 ± 0.154 us/op > MaxMinOptimizeTest.fMul avgt 3 77.174 ± 0.012 us/op Hi @vpaprotsk, please add checks to skip the special emulation for 128-bit vectors at all applicable places; as per section "4.1.8.4 256-bit Variable Blend Instructions" of the x86 optimization manual, variable blends are micro-coded only for 256-bit vectors. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1818285796 From thartmann at openjdk.org Mon Nov 20 06:26:31 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 20 Nov 2023 06:26:31 GMT Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4] In-Reply-To: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> References: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> Message-ID: On Fri, 17 Nov 2023 13:09:00 GMT, Daniel Lundén wrote: >> This changeset deprecates the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. >> >> Changes: >> - Deprecate `UseCounterDecay` in JDK 22, obsolete it in JDK 23, and expire it in JDK 24.
The flag is, in fact, already obsolete, so I've also removed it from the source code (except for the definition in `globals.hpp` which must remain until obsoletion). >> - Completely remove `CounterDecayMinIntervalLength`. >> >> ### Testing >> Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. >> - `tier1` >> - HotSpot parts of `tier2` and `tier3` > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Obsolete UseCounterDecay Still good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16673#pullrequestreview-1739029746 From jbhateja at openjdk.org Mon Nov 20 06:33:31 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 20 Nov 2023 06:33:31 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 19:58:13 GMT, Volodymyr Paprotski wrote: > Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain > > > =============== BEFORE =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 77.766 ± 0.049 ns/op > VectorSignum.floatSignum 512 avgt 3 154.889 ± 0.242 ns/op > VectorSignum.floatSignum 1024 avgt 3 306.130 ± 0.605 ns/op > VectorSignum.floatSignum 2048 avgt 3 609.965 ± 0.927 ns/op > VectorSignum.doubleSignum 256 avgt 3 151.874 ± 1.748 ns/op > VectorSignum.doubleSignum 512 avgt 3 303.080 ± 0.310 ns/op > VectorSignum.doubleSignum 1024 avgt 3 607.517 ± 0.597 ns/op > VectorSignum.doubleSignum 2048 avgt 3 1214.282 ± 1.834 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 137.334 ± 0.128 us/op > MaxMinOptimizeTest.dMin avgt 3 137.160 ± 0.465 us/op > MaxMinOptimizeTest.dMul avgt 3 77.231 ± 0.051 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.165 ± 0.003 us/op > MaxMinOptimizeTest.fMax avgt 3 107.428 ±
1.501 us/op > MaxMinOptimizeTest.fMin avgt 3 107.186 ± 0.022 us/op > MaxMinOptimizeTest.fMul avgt 3 77.164 ± 0.012 us/op > > =============== AFTER =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 61.816 ± 1.980 ns/op > VectorSignum.floatSignum 512 avgt 3 117.251 ± 0.052 ns/op > VectorSignum.floatSignum 1024 avgt 3 231.356 ± 0.397 ns/op > VectorSignum.floatSignum 2048 avgt 3 458.904 ± 0.774 ns/op > VectorSignum.doubleSignum 256 avgt 3 121.449 ± 0.184 ns/op > VectorSignum.doubleSignum 512 avgt 3 241.662 ± 0.189 ns/op > VectorSignum.doubleSignum 1024 avgt 3 482.365 ± 0.165 ns/op > VectorSignum.doubleSignum 2048 avgt 3 962.412 ± 1.401 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 125.701 ± 0.082 us/op > MaxMinOptimizeTest.dMin avgt 3 124.704 ± 0.119 us/op > MaxMinOptimizeTest.dMul avgt 3 77.232 ± 0.028 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.169 ± 0.103 us/op > MaxMinOptimizeTest.fMax avgt 3 97.939 ± 0.477 us/op > MaxMinOptimizeTest.fMin avgt 3 98.012 ± 0.154 us/op > MaxMinOptimizeTest.fMul avgt 3 77.174 ± 0.012 us/op Hi @vpaprotsk, please add checks to skip the special emulation for 128-bit vectors at applicable places; as per section "4.1.8.4 256-bit Variable Blend Instructions" of the x86 optimization manual, variable blends are micro-coded only for 256-bit vectors. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1818304058 From aboldtch at openjdk.org Mon Nov 20 07:36:26 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 20 Nov 2023 07:36:26 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v6] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated.
So instead of inflating when installing the hashCode on a fast-locked object, it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before; inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS used for deflation is still valid, and will not fail for any reason other than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Add retry CAS comment - Use is_neutral over is_unlocked - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Use more familiar CAS variable names and pattern - Move is_lock_owned closer to its only use - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Simplify test.
- 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/6fbdc689..fdbfbf8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=04-05 Stats: 15770 lines in 365 files changed: 7738 ins; 5445 del; 2587 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aboldtch at openjdk.org Mon Nov 20 07:45:53 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 20 Nov 2023 07:45:53 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v5] In-Reply-To: References: Message-ID: <-3n8d5mN8sg-cJeWFReJM8aFh2fBV-KymBOt-elt7S8=.d0f9ec78-e8b4-4fd6-9dea-cbfb33506ecf@github.com> > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enters. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code, which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
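The restriction to consecutive enters can be sketched with a small model, assuming a plain vector stands in for the per-thread lock stack (`LockStackModel`, `ExitAction` and the method names are hypothetical and do not mirror HotSpot's `LockStack`; mark word transitions and inflation are left out). The point it illustrates is that a monitor exit never has to scan the stack; it only inspects the two topmost entries:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical outcome of a monitor exit in this simplified model.
enum class ExitAction {
  Unlock,        // last entry for the object: the mark word would go Locked -> Unlocked
  PopRecursive,  // a consecutive recursive entry: just pop, the object stays locked
  SlowPath       // object not on top: unstructured locking, handled elsewhere
};

// Hypothetical per-thread lock stack holding object addresses.
struct LockStackModel {
  std::vector<const void*> entries;

  // A recursive enter is only recognized when obj is the top entry,
  // i.e. when the enters are consecutive.
  bool is_recursive_enter(const void* obj) const {
    return !entries.empty() && entries.back() == obj;
  }

  void push(const void* obj) { entries.push_back(obj); }

  // Decide what an exit on obj does by looking at the two topmost entries only.
  ExitAction exit(const void* obj) {
    const std::size_t n = entries.size();
    if (n == 0 || entries[n - 1] != obj) {
      return ExitAction::SlowPath;  // leave the stack untouched
    }
    const bool recursive = (n >= 2 && entries[n - 2] == obj);
    entries.pop_back();
    return recursive ? ExitAction::PopRecursive : ExitAction::Unlock;
  }
};
```

Because recursion is only tracked for adjacent entries, the exit decision stays constant-time, which is what makes it cheap to emit inline in compiled code.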
> > A high-level overview: > * Locking is still performed on the mark word > * Unlocked (0b01) <=> Locked (0b00) > * Monitor enter on Obj with mark word Unlocked (0b01) is the same > * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) > * Push Obj onto the lock stack > * Success > * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack > * If top entry is Obj > * Push Obj on the lock stack > * Success > * If top entry is not Obj > * Inflate and call ObjectMonitor::enter > * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack > * If just the top entry is Obj > * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) > * Pop the entry > * Success > * If both entries are Obj > * Pop the top entry > * Success > * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit > * If the monitor has been inflated for object Obj which is owned by the current thread > * All corresponding entries for Obj are removed from the lock stack > * The monitor's recursion count is set to the number of removed entries - 1 > * The owner is changed from anonymous to the thread > * The regular ObjectMonitor::action is called. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 - Fix nit - Fix comment typos - 8319797: Recursive lightweight locking: Runtime implementation ------------- Changes: https://git.openjdk.org/jdk/pull/16606/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=04 Stats: 665 lines in 10 files changed: 633 ins; 10 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Mon Nov 20 08:57:53 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 20 Nov 2023 08:57:53 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v5] In-Reply-To: References: Message-ID: <2eFdYFCzibViG2KJHrpTmRRqJPn41eR2kfv0RY6Bf10=.ff11d592-6af6-4f54-ae87-0a81a45a4761@github.com> > Implements the x86 port of JDK-8319796. > > There are two major parts to the port implementation: the C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and, if it is a recursive lightweight [un]lock, simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. > > C2 operates under a few more assumptions: that the locking is structured and balanced. This means that some checks can be elided. > > First, this means that in C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. And conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack.
This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an uncontended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The x86 C2 port also has some extra oddities. > > The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain Intel hardware, and no noticeable downside on other tested x86 hardware. > > The fast path is written to avoid going through conditional branches. Because of this, in combination with keeping the ZF output correct, the code does some actions eagerly: decrementing the held monitor count and popping from the lock stack. It jumps to a code stub if a slow path is required, which restores the thread-local state to a correct state before jumping to the runtime. > > The contended unlock was also moved to the code stub. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319799 - top load adjustments - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319799 - Fix type - Move inflated check in fast_locked - Move top load - 8319799: Recursive lightweight locking: x86 implementation - Cleanup: C2 fast_lock/fast_unlock x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16607/files - new: https://git.openjdk.org/jdk/pull/16607/files/37d1a0d6..44211e7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=03-04 Stats: 15791 lines in 365 files changed: 7746 ins; 5453 del; 2592 mod Patch: https://git.openjdk.org/jdk/pull/16607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16607/head:pull/16607 PR: https://git.openjdk.org/jdk/pull/16607 From aboldtch at openjdk.org Mon Nov 20 09:01:47 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 20 Nov 2023 09:01:47 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v3] In-Reply-To: References: Message-ID: > Implements the aarch64 port of JDK-8319796. > > There are two major parts to the port implementation: the C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and, if it is a recursive lightweight [un]lock, simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. > > C2 operates under a few more assumptions: that the locking is structured and balanced. This means that some checks can be elided.
> > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 - 8319801: Recursive lightweight locking: aarch64 implementation - Cleanup: C2 fast_lock/fast_unlock aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16608/files - new: https://git.openjdk.org/jdk/pull/16608/files/1e7a586c..5bc0d0ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=01-02 Stats: 15791 lines in 365 files changed: 7746 ins; 5453 del; 2592 mod Patch: https://git.openjdk.org/jdk/pull/16608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16608/head:pull/16608 PR: https://git.openjdk.org/jdk/pull/16608 From rehn at openjdk.org Mon Nov 20 09:03:32 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 20 Nov 2023 09:03:32 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 
15:01:51 GMT, Yuri Gaevsky wrote:

> Please review the change for RISC-V similar to #13792 (AARCH64) and #13460 (X86).
>
> From #13792:
> The change replaces two separate iterations over the itable with a new algorithm
> consisting of two loops. First, we look for a match with resolved_klass,
> checking for a match with holder_klass along the way. Then we continue iterating
> (not starting over) the itable using the second loop, checking only for a match
> with holder_klass.
>
> ### Correctness checks
>
> Testing: tier1 tests successfully passed on HiFive Unmatched board.
>
> #### Performance results on RISC-V StarFive JH7110 board:
>
> InterfaceCalls:                  before fix          after fix
> -------------------------------------------------------------------
> Benchmark         Mode  Cnt   Score   Error      Score   Error  Units
> -------------------------------------------------------------------
> test1stInt2Types  avgt  100  14.380 ± 0.017  |  14.370 ± 0.014  ns/op
> test1stInt3Types  avgt  100  72.724 ± 0.552  |  66.290 ± 0.080  ns/op
> test1stInt5Types  avgt  100  73.948 ± 0.524  |  68.781 ± 0.377  ns/op
> test2ndInt2Types  avgt  100  15.705 ± 0.016  |  15.707 ± 0.018  ns/op
> test2ndInt3Types  avgt  100  82.370 ± 0.453  |  75.363 ± 0.156  ns/op
> test2ndInt5Types  avgt  100  85.266 ± 0.466  |  80.969 ± 0.752  ns/op
> testIfaceCall     avgt  100  75.684 ± 0.648  |  72.603 ± 0.460  ns/op
> testIfaceExtCall  avgt  100  86.293 ± 0.567  |  77.939 ± 0.340  ns/op
> testMonomorphic   avgt  100  11.357 ± 0.007  |  11.359 ± 0.009  ns/op
> -------------------------------------------------------------------
>
> #### Performance results on RISC-V HiFive Unmatched board:
>
> InterfaceCalls:                   before fix           after fix
> ---------------------------------------------------------------------
> Benchmark         Mode  Cnt    Score   Error       Score   Error  Units
> ---------------------------------------------------------------------
> test1stInt2Types  avgt  100   24.432 ± 1.811  |   23.205 ± 1.512  ns/op
> test1stInt3Types  avgt  100  135.800 ± 3.991  |  127.112 ± 2.299  ns/op
> test1stInt5Types  avgt  100  141.746 ± 4.272  |  136.069 ± 4.919  ns/op
> test2ndInt2Types  avgt  100   31.474 ± 2.468  |   26.978 ± 1.951  ns/op
> test2ndInt3Types  avgt  100  146.410 ± 3.575  |  139.443 ± 3.677  ns/op
> test2ndInt5Types  avgt  100  156.083 ± 3.617  |  150.583 ± 2.909  ns/op
> testIfaceCall     avgt  100  136.392 ± 2.546  |  129.632 ± 1.662  ns/op
> testIfaceExtCall  avgt  100  155.602 ± 3.836  |  138.058 ± 2.147  ns/op
> testMonomorphic   avgt  100   24.018 ± 1.888  |   21.522 ± 1.662  ns/op
> ---------...

Marked as reviewed by rehn (Reviewer).

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2548:

> 2546:   int itmentry_off_bytes = in_bytes(itableMethodEntry::method_offset());
> 2547:   int vte_size_bytes = vtableEntry::size_in_bytes();
> 2548:   const int vte_scale = 3;

exact_log2(vtableEntry::size_in_bytes())

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2557:

> 2555:   //                     + sizeof(vtableEntry) * (recv_klass->_vtable_len);
> 2556:   // temp_itbl_klass = itable[0]._interface;
> 2557:   assert(vte_size_bytes == wordSize, "else adjust vte_scale");

exact_log2 has its own assert, so you can then remove this one.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16657#pullrequestreview-1739226219
PR Review Comment: https://git.openjdk.org/jdk/pull/16657#discussion_r1398827699
PR Review Comment: https://git.openjdk.org/jdk/pull/16657#discussion_r1398831298

From mbaesken at openjdk.org Mon Nov 20 09:19:38 2023
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 20 Nov 2023 09:19:38 GMT
Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report
Message-ID:

VMError::report outputs information about loaded shared libraries, but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing.
This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587).
The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic.
------------- Commit messages: - remove print output - JDK-8320383 Changes: https://git.openjdk.org/jdk/pull/16730/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16730&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320383 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16730.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16730/head:pull/16730 PR: https://git.openjdk.org/jdk/pull/16730 From shade at openjdk.org Mon Nov 20 09:47:34 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 20 Nov 2023 09:47:34 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Wed, 15 Nov 2023 22:54:14 GMT, Patricio Chilano Mateo wrote: >> @pchilano can you have look ? > >> @pchilano can you have look ? >> > I will. I might not finish the review until next week though. All right, thanks all. I think we are waiting for @pchilano's review, and then we can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1818695162 From shade at openjdk.org Mon Nov 20 09:47:40 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 20 Nov 2023 09:47:40 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: <78FmfXM1nIC0UkDbOz9uneQk9ADKObIZCIm-y5tT5_4=.ccc39e10-aa1a-4d7f-97f2-804c1bf602be@github.com> On Sun, 19 Nov 2023 08:09:39 GMT, Ivan Walulya wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 13 additional commits since the last revision:
>>
>> - Drop the Linux check in preparation for integration
>> - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>> - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>> - Rework paddings
>> - Encode barrier tag into state, resolving another race condition
>> - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods
>> - Merge branch 'master' into JDK-8318986-generic-wait-barrier
>> - Touchups
>> - More comments work
>> - Tight up the comments
>> - ... and 3 more: https://git.openjdk.org/jdk/compare/de6f765d...191c0dbb
>
> src/hotspot/share/utilities/waitBarrier_generic.cpp line 119:
>
>> 117: void GenericWaitBarrier::Cell::arm(int32_t requested_tag) {
>> 118:   // Before we continue to arm, we need to make sure that all threads
>> 119:   // have left the previous cell.
>
> // have left after the previous usage of the cell.

Thanks, but I think the current wording is better.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1398915987

From stuefe at openjdk.org Mon Nov 20 10:00:01 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 20 Nov 2023 10:00:01 GMT
Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13]
In-Reply-To:
References:
Message-ID:

On Fri, 17 Nov 2023 21:02:50 GMT, Matias Saavedra Silva wrote:

>> Noted, we'll follow up with the arm32 fix a little later. Thanks Matias!
>
>> Noted, we'll follow up with the arm32 fix a little later. Thanks Matias!
>
> Thanks for the confirmation @voitylov, I look forward to the port!

@matias9927 Having the arm *build* broken is really bad. It's one thing if the VM is dead on arrival, but this shows up in everyone's GHA as a red flag. It teaches people to ignore GHAs. We require clean GHAs from outside developers; I think this sets a bad precedent.
------------- PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1818718600 From stuefe at openjdk.org Mon Nov 20 10:09:42 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 20 Nov 2023 10:09:42 GMT Subject: RFR: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() Message-ID: `CompressedKlassPointers::is_valid_base(addr)` abstracts away platform-specific requirements that may limit the use of an address as narrow Klass encoding base. It only ever mattered on aarch64, where we cannot use any arbitrary address as 64-bit immediate for the base. Experience shows that this is a case where the abstraction does not help much. Hiding a very CPU-specific limitation under a generic function made arguing about it difficult. We therefore decided to scrap that function. It is only used for two things: - asserts at runtime; those are unnecessary since we have an assert in macroAssembler_aarch64.cpp that will fire if the base is not correct - the one legitimate use case is checking the user input for -XX:SharedBaseAddress at dump time. We can just express the aarch64 requirement directly, which is clearer to understand. Note that the function has also been incorrect, since it ignored aarch64 EOR mode, and required 32GB alignment for addresses beyond 32GB. However, we can make any 4GB aligned address to work with movk, so the requirement can be simplified to "is 4GB-aligned". 
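
As a minimal sketch of the simplified requirement, a 4GB-alignment test amounts to checking that the low 32 bits of the address are zero. The names below (`kFourGB`, `is_4g_aligned`, `combine_base_and_offset`) are hypothetical and this is not the code from the patch; the second helper only illustrates one consequence of the alignment and does not model the real narrow Klass decoding.

```cpp
#include <cassert>
#include <cstdint>

// 4 GB expressed in bytes.
constexpr std::uint64_t kFourGB = std::uint64_t{1} << 32;

// True if addr is a multiple of 4 GB, i.e. its low 32 bits are all zero.
constexpr bool is_4g_aligned(std::uint64_t addr) {
  return (addr & (kFourGB - 1)) == 0;
}

// Illustration only: with a 4 GB-aligned base, a 32-bit value can be merged
// into the low half with a bitwise OR, which then equals plain addition.
inline std::uint64_t combine_base_and_offset(std::uint64_t base,
                                             std::uint32_t offset) {
  assert(is_4g_aligned(base));
  return base | offset;
}

static_assert(is_4g_aligned(0), "zero is trivially aligned");
static_assert(is_4g_aligned(kFourGB), "4 GB itself is aligned");
static_assert(!is_4g_aligned(kFourGB + 1), "low bits set means unaligned");
```

A dump-time check of a user-supplied base address could, under these assumptions, reduce to a single call to `is_4g_aligned`.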
(this is a preparatory patch for [JDK-8320368](https://bugs.openjdk.org/browse/JDK-8320368)) ------------- Commit messages: - JDK-8320382-Remove-CompressedKlassPointers-is_valid_base Changes: https://git.openjdk.org/jdk/pull/16727/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16727&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320382 Stats: 40 lines in 3 files changed: 5 ins; 30 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16727.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16727/head:pull/16727 PR: https://git.openjdk.org/jdk/pull/16727 From stuefe at openjdk.org Mon Nov 20 10:09:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 20 Nov 2023 10:09:43 GMT Subject: RFR: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 07:47:16 GMT, Thomas Stuefe wrote: > `CompressedKlassPointers::is_valid_base(addr)` abstracts away platform-specific requirements that may limit the use of an address as narrow Klass encoding base. It only ever mattered on aarch64, where we cannot use any arbitrary address as 64-bit immediate for the base. > > Experience shows that this is a case where the abstraction does not help much. Hiding a very CPU-specific limitation under a generic function made arguing about it difficult. We therefore decided to scrap that function. > > It is only used for two things: > - asserts at runtime; those are unnecessary since we have an assert in macroAssembler_aarch64.cpp that will fire if the base is not correct > - the one legitimate use case is checking the user input for -XX:SharedBaseAddress at dump time. We can just express the aarch64 requirement directly, which is clearer to understand. > > Note that the function has also been incorrect, since it ignored aarch64 EOR mode, and required 32GB alignment for addresses beyond 32GB. 
However, we can make any 4GB aligned address to work with movk, so the requirement can be simplified to "is 4GB-aligned". > > (this is a preparatory patch for [JDK-8320368](https://bugs.openjdk.org/browse/JDK-8320368)) arm breakage unrelated, see because of https://github.com/openjdk/jdk/pull/15455 Ping @theRealAph ------------- PR Comment: https://git.openjdk.org/jdk/pull/16727#issuecomment-1818737840 PR Comment: https://git.openjdk.org/jdk/pull/16727#issuecomment-1818739851 From sspitsyn at openjdk.org Mon Nov 20 10:40:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 20 Nov 2023 10:40:33 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: <-CGha1yFmQNPbT7s6BtZ0iJFxmPgzoSnozx4pgIZlA4=.77aa51ba-ebcb-419a-9651-dbadb8ef9e91@github.com> References: <1P_TddTTM1eH75Do2Xq-wBrxXSdh7GzJJlgEBH_dSNo=.94392ab2-12a1-4d1b-9131-6164bbb76e7d@github.com> <-CGha1yFmQNPbT7s6BtZ0iJFxmPgzoSnozx4pgIZlA4=.77aa51ba-ebcb-419a-9651-dbadb8ef9e91@github.com> Message-ID: On Mon, 20 Nov 2023 00:59:35 GMT, David Holmes wrote: >> The stack trace of current thread (where the assert was fired) can explain what is going on a little bit: >> >> Current thread (0x00007f93f8043d00): JavaThread "ForkJoinPool-1-worker-1" daemon [_thread_in_vm, id=16779, stack(0x00007f948a597000,0x00007f948a697000) (1024K)] >> >> Stack: [0x00007f948a597000,0x00007f948a697000], sp=0x00007f948a6949e0, free space=1014k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x117937d] JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState*)+0x45d (jvmtiEventController.cpp:402) >> V [libjvm.so+0x1179520] JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState*) [clone .part.0]+0x190 (jvmtiEventController.cpp:632) >> V [libjvm.so+0x117a1e1] JvmtiEventControllerPrivate::thread_started(JavaThread*)+0x351 (jvmtiEventController.cpp:1174) >> V [libjvm.so+0x117e608] 
JvmtiExport::get_jvmti_thread_state(JavaThread*)+0x98 (jvmtiExport.cpp:424) >> V [libjvm.so+0x118a86c] JvmtiExport::post_field_access(JavaThread*, Method*, unsigned char*, Klass*, Handle, _jfieldID*)+0x6c (jvmtiExport.cpp:2214) >> V [libjvm.so+0x118b3a1] JvmtiExport::post_field_access_by_jni(JavaThread*, oop, Klass*, _jfieldID*, bool)+0x321 (jvmtiExport.cpp:2202) >> V [libjvm.so+0x118b4e9] JvmtiExport::jni_GetField_probe(JavaThread*, _jobject*, oop, Klass*, _jfieldID*, bool)+0x79 (jvmtiExport.cpp:2168) >> V [libjvm.so+0xf83847] jni_GetStaticBooleanField+0x257 (jni.cpp:2047) >> C [libfieldacc02.so+0x379b] Java_fieldacc02_check+0x6b (jni.h:1546) >> j fieldacc02.check(Ljava/lang/Object;)I+0 >> j fieldacc02.lambda$testVirtualThread$0()V+12 >> j fieldacc02$$Lambda+0x00007f943b001428.run()V+0 >> j java.lang.Thread.runWith(Ljava/lang/Object;Ljava/lang/Runnable;)V+5 java.base at 22-internal >> j java.lang.VirtualThread.run(Ljava/lang/Runnable;)V+66 java.base at 22-internal >> j java.lang.VirtualThread$VThreadContinuation$1.run()V+8 java.base at 22-internal >> j jdk.internal.vm.Continuation.enter0()V+4 java.base at 22-internal >> j jdk.internal.vm.Continuation.enter(Ljdk/internal/vm/Continuation;Z)V+1 java.base at 22-internal >> J 124 jdk.internal.vm.Continuation.enterSpecial(Ljdk/internal/vm/Continuation;ZZ)V java.base at 22-internal (0 bytes) @ 0x00007f94c7cdf744 [0x00007f94c7cdf5e0+0x0000000000000164] >> j jdk.internal.vm.Continuation.run()V+122 java.base at 22-internal >> j java.lang.VirtualThread.runContinuation()V+70 java.base at 22-internal >> j java.lang.VirtualThread$$Lambda+0x00007f943b049... > > Just to re-iterate what Dan was saying, the TLH is only of use if you are accessing threads known to be included in the TLH. The issue occurred to be with the current thread which has to be safe to access without a TLH. 
However, there is this guarantee in `Handshake::execute()`:

    void Handshake::execute(HandshakeClosure* hs_cl, ThreadsListHandle* tlh, JavaThread* target) {
      . . .
      guarantee(target != nullptr, "must be");
      if (tlh == nullptr) {
        guarantee(Thread::is_JavaThread_protected_by_TLH(target),
                  "missing ThreadsListHandle in calling context.");

This guarantee firing for the current thread is a source of confusion and mistakes. Would it be the right thing to correct it?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1398993572

From duke at openjdk.org Mon Nov 20 12:05:46 2023
From: duke at openjdk.org (Yuri Gaevsky)
Date: Mon, 20 Nov 2023 12:05:46 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v6]
In-Reply-To:
References:
Message-ID:

> Hello All,
>
> Please review these changes to support _vectorizedHashCode intrinsic on
> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
> usage of any RVV instruction but provides manual unrolling of the appropriate
> loop. The code with usage of RVV instruction could be added as follow-up of
> the patch or independently.
>
> Thanks,
> -Yuri Gaevsky
>
> P.S. My OCA has been accepted recently (ygaevsky).
>
> ### Correctness checks
>
> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>
> ### Performance results (the numbers for non-ints are similar)
>
> #### StarFive JH7110 board:
>
> ArraysHashCode:                     without intrinsic        with intrinsic
> -------------------------------------------------------------------------------
> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
> -------------------------------------------------------------------------------
> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
> -------------------------------------------------------------------------------
>
> #### T-Head RVB-ICE board:
>
> ArraysHashCode: ...

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  Changed explicit registers name (iRegP_RXX/iRegI_RXX) for ary, cnt and result to their iRegPNoSp/iRegINoSp counterparts. Changed iRegINoSp->iRegLNoSp for tmp1/tmp4 as they can contain 64-bit values. Changed effects USE_KILL->USE for ary, removed effect for cnt.
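
For readers unfamiliar with the scalar technique mentioned in the PR description (manual unrolling of the hash loop without RVV instructions), the following C++ sketch shows the general idea. It assumes the Java-style polynomial hash `h = 31 * h + a[i]` with 32-bit wraparound; the unroll factor of 4 and the function names are illustrative assumptions and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference loop: h = 31 * h + a[i], wrapping modulo 2^32.
// Unsigned arithmetic is used so that overflow is well defined in C++.
std::uint32_t hash_simple(const std::int32_t* a, std::size_t n, std::uint32_t h) {
  assert(a != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}

// Same result, four elements per iteration. Expanding four steps gives
//   h' = 31^4*h + 31^3*a0 + 31^2*a1 + 31*a2 + a3
// so the multiplications no longer form one long dependency chain.
std::uint32_t hash_unrolled4(const std::int32_t* a, std::size_t n, std::uint32_t h) {
  assert(a != nullptr || n == 0);
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = 923521u * h                                   // 31^4
      + 29791u * static_cast<std::uint32_t>(a[i])     // 31^3
      + 961u * static_cast<std::uint32_t>(a[i + 1])   // 31^2
      + 31u * static_cast<std::uint32_t>(a[i + 2])
      + static_cast<std::uint32_t>(a[i + 3]);
  }
  // Scalar tail for the remaining 0..3 elements.
  for (; i < n; ++i) {
    h = 31u * h + static_cast<std::uint32_t>(a[i]);
  }
  return h;
}
```

Both functions take the initial hash value as a parameter, so they can be compared directly for any starting value and array length.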
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16629/files
  - new: https://git.openjdk.org/jdk/pull/16629/files/70768898..af940acd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=04-05

  Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/16629.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629

PR: https://git.openjdk.org/jdk/pull/16629

From duke at openjdk.org Mon Nov 20 12:10:30 2023
From: duke at openjdk.org (Yuri Gaevsky)
Date: Mon, 20 Nov 2023 12:10:30 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2]
In-Reply-To:
References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com>
Message-ID:

On Fri, 17 Nov 2023 21:39:33 GMT, Yuri Gaevsky wrote:

>> Seems to me it's not necessary to specify the registers. Can you try it?
>
> Sure, let me check.

Done in [this commit](https://github.com/openjdk/jdk/pull/16629/commits/af940acd365677ec3c29a8f066b68b753ad362e4). I tried using iRegP/iRegI, but that caused a related failure (the JVM didn't even start).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1399112029

From aleonard at openjdk.org Mon Nov 20 12:17:59 2023
From: aleonard at openjdk.org (Andrew Leonard)
Date: Mon, 20 Nov 2023 12:17:59 GMT
Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13]
In-Reply-To:
References:
Message-ID: <5LSCyDB9YY7eJYiH2Ii2MUnQO9SRA6cfNlYx1jAyn9s=.633bb8c7-1215-481d-a300-c3c66290f11b@github.com>

On Wed, 15 Nov 2023 16:44:15 GMT, Matias Saavedra Silva wrote:

>> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2. This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes.
>>
>> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995.
>>
>> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code.
>>
>> To streamline the review, please consider these major areas that have been changed:
>> 1. ResolvedMethodEntry class
>> 2. Rewriter for initialization of the structure
>> 3. cpCache for resolution
>> 4. InterpreterRuntime, linkResolver, and templateTable
>> 5. JVMCI
>> 6. SA
>>
>> Verified with tier 1-9 tests.
>>
>> This change supports the following platforms: x86, aarch64, RISCV, PPC, S390
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   S390 port

Adoptium is also unable to build jdk-22 on ARM32; I have raised a bug: https://bugs.openjdk.org/browse/JDK-8320402

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15455#issuecomment-1818955423

From stuefe at openjdk.org Mon Nov 20 12:49:30 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 20 Nov 2023 12:49:30 GMT
Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report
In-Reply-To:
References:
Message-ID:

On Mon, 20 Nov 2023 09:14:29 GMT, Matthias Baesken wrote:

> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing.
> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) .
> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic.

Why not just do this in the AIX specific `os::print_dll_info`?

Otherwise, I'd be in favor of finding a reasonable OS abstraction for this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1819001496

From mbaesken at openjdk.org Mon Nov 20 13:34:32 2023
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Mon, 20 Nov 2023 13:34:32 GMT
Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report
In-Reply-To:
References:
Message-ID:

On Mon, 20 Nov 2023 12:46:48 GMT, Thomas Stuefe wrote:

> Why not just do this in the AIX specific `os::print_dll_info`?
>
> Otherwise, I'd be in favor of finding a reasonable OS abstraction for this.

Hi Thomas, I think we need (on AIX) the info from the loaded lib cache also for printing some info of the native stack, not only for the dll/shared libs section.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1819071599

From stuefe at openjdk.org Mon Nov 20 13:40:30 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 20 Nov 2023 13:40:30 GMT
Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report
In-Reply-To:
References:
Message-ID: <2nsZJNVtl2FQhb1soV9PD3kaWCyMjb3YQND2N7iCTSU=.d1d49865-6034-4f10-b024-c9c775ba356e@github.com>

On Mon, 20 Nov 2023 13:32:07 GMT, Matthias Baesken wrote:

> > Why not just do this in the AIX specific `os::print_dll_info`?
> > Otherwise, I'd be in favor of finding a reasonable OS abstraction for this.
>
> Hi Thomas, I think we need (on AIX) the info from the loaded lib cache also for printing some info of the native stack, not only for the dll/shared libs section.

Then let's abstract this into something like `os::prepare_native_symbols()` or something similar. We could use this on Windows too to update the loaded modules list.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1819081047

From mdoerr at openjdk.org Mon Nov 20 13:43:40 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 20 Nov 2023 13:43:40 GMT
Subject: RFR: JDK-8320300: Adjust hs_err output in malloc/mmap error cases
In-Reply-To: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com>
References: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com>
Message-ID:

On Fri, 17 Nov 2023 13:53:53 GMT, Matthias Baesken wrote:

> Some of the error output could be slightly improved. Currently it says for example:
>
> There is insufficient memory for the Java Runtime Environment to continue.
> Native memory allocation (mmap) failed to map 65536 bytes for Failed to commit metaspace.
> Possible reasons:
> . . .
> The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap > > The output 'bytes for Failed to commit metaspace.' should be rephrased. > The reason should be more clear that it really IS the case for the current JVM that CompressedOops is set (and that it is not just some general advice) . LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16707#pullrequestreview-1739843457 From mbaesken at openjdk.org Mon Nov 20 13:43:40 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 20 Nov 2023 13:43:40 GMT Subject: RFR: JDK-8320300: Adjust hs_err output in malloc/mmap error cases In-Reply-To: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> References: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> Message-ID: On Fri, 17 Nov 2023 13:53:53 GMT, Matthias Baesken wrote: > Some of the error output could be slightly improved. Currently it says for example: > > There is insufficient memory for the Java Runtime Environment to continue. > Native memory allocation (mmap) failed to map 65536 bytes for Failed to commit metaspace. > Possible reasons: > . . . > The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap > > The output 'bytes for Failed to commit metaspace.' should be rephrased. > The reason should be more clear that it really IS the case for the current JVM that CompressedOops is set (and that it is not just some general advice) . Hi Christoph and Martin, thanks for the review ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16707#issuecomment-1819082798 From mbaesken at openjdk.org Mon Nov 20 13:43:40 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 20 Nov 2023 13:43:40 GMT Subject: Integrated: JDK-8320300: Adjust hs_err output in malloc/mmap error cases In-Reply-To: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> References: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> Message-ID: On Fri, 17 Nov 2023 13:53:53 GMT, Matthias Baesken wrote: > Some of the error output could be slightly improved. Currently it says for example: > > There is insufficient memory for the Java Runtime Environment to continue. > Native memory allocation (mmap) failed to map 65536 bytes for Failed to commit metaspace. > Possible reasons: > . . . > The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap > > The output 'bytes for Failed to commit metaspace.' should be rephrased. > The reason should be more clear that it really IS the case for the current JVM that CompressedOops is set (and that it is not just some general advice) . This pull request has now been integrated. 
Changeset: 60c8d9c0 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/60c8d9c045be16fee99a83117844c2a8100f7c1a Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8320300: Adjust hs_err output in malloc/mmap error cases Reviewed-by: clanger, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/16707 From stuefe at openjdk.org Mon Nov 20 13:53:46 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 20 Nov 2023 13:53:46 GMT Subject: RFR: JDK-8320300: Adjust hs_err output in malloc/mmap error cases In-Reply-To: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> References: <6TiJAzltmbdd02CR3DYfKTC1aw4VI2SIgOh4vtR2TxI=.9e372e5a-9017-4532-a16a-48f6ba61385a@github.com> Message-ID: On Fri, 17 Nov 2023 13:53:53 GMT, Matthias Baesken wrote: > Some of the error output could be slightly improved. Currently it says for example: > > There is insufficient memory for the Java Runtime Environment to continue. > Native memory allocation (mmap) failed to map 65536 bytes for Failed to commit metaspace. > Possible reasons: > . . . > The process is running with CompressedOops enabled, and the Java Heap may be blocking the growth of the native heap > > The output 'bytes for Failed to commit metaspace.' should be rephrased. > The reason should be more clear that it really IS the case for the current JVM that CompressedOops is set (and that it is not just some general advice) . src/hotspot/share/utilities/vmError.cpp line 836: > 834: st->print(" bytes."); > 835: if (strlen(_detail_msg) > 0) { > 836: st->print(" Error detail: "); I would prefer the more concise form: `(malloc) failed to allocate 4711 bytes (detail detail)` Note that this change just turns around the weirdness. We have many callers of vm_exit_out_of_memory that just pass in an API name. But I still like your variant better. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16707#discussion_r1399233879 From thartmann at openjdk.org Mon Nov 20 15:11:24 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 20 Nov 2023 15:11:24 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v6] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 07:36:26 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. >> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Add retry CAS comment > - Use is_neutral over is_unlocked > - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 > - Use more familiar CAS variable names and pattern > - Move is_lock_owned closer to its only use > - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 > - Simplify test. 
> - 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT Compiler changes look good and in-line with the runtime changes. src/hotspot/share/opto/library_call.cpp line 4561: > 4559: Node *unlocked_val = _gvn.MakeConX(markWord::unlocked_value); > 4560: Node *chk_unlocked = _gvn.transform(new CmpXNode( lmasked_header, unlocked_val)); > 4561: Node *test_not_unlocked = _gvn.transform(new BoolNode( chk_unlocked, BoolTest::ne)); Suggestion: Node *chk_unlocked = _gvn.transform(new CmpXNode(lmasked_header, unlocked_val)); Node *test_not_unlocked = _gvn.transform(new BoolNode(chk_unlocked, BoolTest::ne)); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16603#pullrequestreview-1740040909 PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1399338438 From duke at openjdk.org Mon Nov 20 15:23:30 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 20 Nov 2023 15:23:30 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 06:30:25 GMT, Jatin Bhateja wrote: > Hi @vpaprotsk , please add checks to skip special emulation for 128 bit vectors at applicable places, as per section "4.1.8.4 256-bit Variable Blend Instructions" of x86 optimization manual variable blends are micro-coded only for 256 bit vectors. Thanks for the chapter reference! I was just beginning to go through that manual ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1819270025 From rehn at openjdk.org Mon Nov 20 15:33:13 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 20 Nov 2023 15:33:13 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> On Wed, 15 Nov 2023 07:41:34 GMT, Fei Yang wrote: >> Hi, please consider. >> >> Main author is @luhenry, I only fixed some minor things and tested it. 
>> >> Such as: >> test/hotspot/jtreg/compiler/intrinsics/sha/ >> test/jdk/java/security/MessageDigest/ >> test/jdk/jdk/security/ >> tier1 >> >> And still running some test. > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3926: > >> 3924: //-------------------------------------------------------------------------------- >> 3925: // Quad-round 1 (+1, v11->v12->v13->v10) >> 3926: __ vl1re32_v(v15, consts); > > I am still worried about the vector load latency if we do one `vl1re3_v` to get the consts for each round even for single pass. Preloading the constants into vectors is less likely to have this issue, right? We should have enough vector registers for that purpose. Depending on hardware pipeline depth this load can actually be executed after "__ vadd_vv(v14, v15, v10);" thus that instruction maybe already be retired when reaching round 1. Preloading these, depending on the number of V-load ports, the preloading it self can be very costly as they can't be executed out-of-order in parallel. So hiding the load in previous round can be faster, therefore my fast conclusion without numbers was at least for single pass no preloading *should* be better on bigger hardware. I guess I need to get those numbers :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1399372865 From stuefe at openjdk.org Mon Nov 20 15:40:48 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 20 Nov 2023 15:40:48 GMT Subject: RFR: 8319973: AArch64: Save and restore FPCR in the call stub In-Reply-To: References: Message-ID: <4gTm33xFCtzYXXQqZJEh7nzDXhjnNPeRQVB1ncINiLM=.fd0b09aa-b670-4585-8fe6-96ee570743eb@github.com> On Mon, 13 Nov 2023 18:18:35 GMT, Andrew Haley wrote: > On AArch64 we don't save and restore the default floating-point control state when we enter and leave Java code. We really should, because if we're called via the JNI invocation interface with a weird FP control state we'll not be Java compatible. Looks good, one question inline. 
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 145: > 143: // ... > 144: // -29 [ argument word 1 ] > 145: // -28 [ saved Floating-point Control Register ] Why the 1-word gap? Alignment? ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16637#pullrequestreview-1740119235 PR Review Comment: https://git.openjdk.org/jdk/pull/16637#discussion_r1399385258 From mdoerr at openjdk.org Mon Nov 20 15:47:39 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 20 Nov 2023 15:47:39 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry Message-ID: `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) In addition, I've cleaned up some inaccurate comments. ------------- Commit messages: - 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry Changes: https://git.openjdk.org/jdk/pull/16741/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16741&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320418 Stats: 38 lines in 3 files changed: 5 ins; 16 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/16741.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16741/head:pull/16741 PR: https://git.openjdk.org/jdk/pull/16741 From rehn at openjdk.org Mon Nov 20 16:03:26 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 20 Nov 2023 16:03:26 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 07:34:22 GMT, Fei Yang wrote: >> Hi, please consider. >> >> Main author is @luhenry, I only fixed some minor things and tested it. 
>> >> Such as: >> test/hotspot/jtreg/compiler/intrinsics/sha/ >> test/jdk/java/security/MessageDigest/ >> test/jdk/jdk/security/ >> tier1 >> >> And still running some test. > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3769: > >> 3767: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f} >> 3768: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16 >> 3769: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f} > > I see the openssl version makes use of index-load to get {f,e,b,a},{h,g,d,c} pre-loop and index-store to put {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h} post-loop, which is much simpler than this code. Please consider. > > [1] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl#L124-L142 The vsetivli is often expensive:ish, the code in openssl sets it five times before reaching first round. That don't seem like a good idea, now vsetivli make the code much easier to read yes... I guess I need to check numbers for that also.. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1399418241 From duke at openjdk.org Mon Nov 20 16:20:52 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 20 Nov 2023 16:20:52 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub [v2] In-Reply-To: References: Message-ID: > Please review the change for RISC-V similar to #13792(AARCH64) and #13460(X86). > > From #13792: > The change replaces two separate iterations over the itable with new algorithm > consisting of two loops. First, we look for a match with resolved_klass, > checking for a match with holder_klass along the way. Then we continue iterating > (not starting over) the itable using the second loop, checking only for a match > with holder_klass. > > ### Correctness checks > > Testing: tier1 tests successfully passed on HiFive Unmatched board. 
>
> #### Performance results on RISC-V StarFive JH7110 board:
>
> InterfaceCalls:                   before fix          after fix
> -------------------------------------------------------------------
> Benchmark         Mode  Cnt    Score   Error      Score   Error  Units
> -------------------------------------------------------------------
> test1stInt2Types  avgt  100   14.380 ± 0.017 |   14.370 ± 0.014  ns/op
> test1stInt3Types  avgt  100   72.724 ± 0.552 |   66.290 ± 0.080  ns/op
> test1stInt5Types  avgt  100   73.948 ± 0.524 |   68.781 ± 0.377  ns/op
> test2ndInt2Types  avgt  100   15.705 ± 0.016 |   15.707 ± 0.018  ns/op
> test2ndInt3Types  avgt  100   82.370 ± 0.453 |   75.363 ± 0.156  ns/op
> test2ndInt5Types  avgt  100   85.266 ± 0.466 |   80.969 ± 0.752  ns/op
> testIfaceCall     avgt  100   75.684 ± 0.648 |   72.603 ± 0.460  ns/op
> testIfaceExtCall  avgt  100   86.293 ± 0.567 |   77.939 ± 0.340  ns/op
> testMonomorphic   avgt  100   11.357 ± 0.007 |   11.359 ± 0.009  ns/op
> -------------------------------------------------------------------
>
> #### Performance results on RISC-V HiFive Unmatched board:
>
> InterfaceCalls:                   before fix          after fix
> ---------------------------------------------------------------------
> Benchmark         Mode  Cnt    Score   Error      Score   Error  Units
> ---------------------------------------------------------------------
> test1stInt2Types  avgt  100   24.432 ± 1.811 |   23.205 ± 1.512  ns/op
> test1stInt3Types  avgt  100  135.800 ± 3.991 |  127.112 ± 2.299  ns/op
> test1stInt5Types  avgt  100  141.746 ± 4.272 |  136.069 ± 4.919  ns/op
> test2ndInt2Types  avgt  100   31.474 ± 2.468 |   26.978 ± 1.951  ns/op
> test2ndInt3Types  avgt  100  146.410 ± 3.575 |  139.443 ± 3.677  ns/op
> test2ndInt5Types  avgt  100  156.083 ± 3.617 |  150.583 ± 2.909  ns/op
> testIfaceCall     avgt  100  136.392 ± 2.546 |  129.632 ± 1.662  ns/op
> testIfaceExtCall  avgt  100  155.602 ± 3.836 |  138.058 ± 2.147  ns/op
> testMonomorphic   avgt  100   24.018 ± 1.888 |   21.522 ± 1.662  ns/op
> ---------...
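The two-loop itable scan described in the quoted PR text can be sketched roughly as follows. This is a simplified C++ model only, not the RISC-V assembler code from the PR: `ItableEntry`, `lookup_holder_offset`, the integer interface ids, the zero-id end marker, and `kNotFound` are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for one itable offset entry.
struct ItableEntry {
  int interface_id;  // assumption: id 0 marks the end of the table
  int offset;        // assumption: offset of that interface's method block
};

constexpr int kNotFound = -1;

// Returns the offset recorded for holder_id, provided resolved_id is also
// present in the table; otherwise returns kNotFound.
int lookup_holder_offset(const std::vector<ItableEntry>& itable,
                         int resolved_id, int holder_id) {
  int holder_offset = kNotFound;
  bool resolved_found = false;
  std::size_t i = 0;

  // Loop 1: look for resolved_id, remembering holder_id if we pass it.
  for (; i < itable.size() && itable[i].interface_id != 0; ++i) {
    if (itable[i].interface_id == holder_id) {
      holder_offset = itable[i].offset;
    }
    if (itable[i].interface_id == resolved_id) {
      resolved_found = true;
      ++i;  // continue after this entry, do not start over
      break;
    }
  }
  if (!resolved_found) return kNotFound;
  if (holder_offset != kNotFound) return holder_offset;

  // Loop 2: continue from where loop 1 stopped, checking only holder_id.
  for (; i < itable.size() && itable[i].interface_id != 0; ++i) {
    if (itable[i].interface_id == holder_id) {
      return itable[i].offset;
    }
  }
  return kNotFound;
}
```

The point of the sketch is that each table entry is visited at most once, whereas two independent scans may visit some entries twice. How the real stub reports a failed lookup is not modelled here.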
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: Addressed review comments from @RealFYang and @robehn. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16657/files - new: https://git.openjdk.org/jdk/pull/16657/files/2155b9af..c97bf1e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16657&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16657&range=00-01 Stats: 6 lines in 1 file changed: 1 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16657/head:pull/16657 PR: https://git.openjdk.org/jdk/pull/16657 From duke at openjdk.org Mon Nov 20 16:20:55 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 20 Nov 2023 16:20:55 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 04:42:07 GMT, Fei Yang wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments from @RealFYang and @robehn. > > Changes requested by fyang (Reviewer). @RealFYang and @robehn: thanks for your reviews, I've updated the patch as you suggested. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2562: > >> 2560: mv(holder_offset, zr); >> 2561: // scan_temp = &(itable[0]._interface) >> 2562: la(scan_temp, Address(scan_temp)); > > The `la` call here won't emit any instructions [1]. So I think we can simply remove it and apply the code comment at L2561 to the preceding `shadd` call. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L745 Good catch, thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16657#issuecomment-1819376231 PR Review Comment: https://git.openjdk.org/jdk/pull/16657#discussion_r1399441672 From duke at openjdk.org Mon Nov 20 16:20:57 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 20 Nov 2023 16:20:57 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub [v2] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 08:50:08 GMT, Robbin Ehn wrote: >> Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: >> >> Addressed review comments from @RealFYang and @robehn. > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2548: > >> 2546: int itmentry_off_bytes = in_bytes(itableMethodEntry::method_offset()); >> 2547: int vte_size_bytes = vtableEntry::size_in_bytes(); >> 2548: const int vte_scale = 3; > > exact_log2(vtableEntry::size_in_bytes()) Done. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 2557: > >> 2555: // + sizeof(vtableEntry) * (recv_klass->_vtable_len); >> 2556: // temp_itbl_klass = itable[0]._interface; >> 2557: assert(vte_size_bytes == wordSize, "else adjust vte_scale"); > > exact_log2 have assert so you can then remove this one. Done. 
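For readers following the `vte_scale` exchange, the idea of deriving the shift from the entry size instead of hard-coding `3` can be sketched as below. `exact_log2_sketch` is a hypothetical stand-in written for this illustration; it is not HotSpot's `exact_log2`, and the only property assumed from the review comment is that the helper asserts on non-power-of-two input, which is what makes the separate `vte_size_bytes == wordSize` assert redundant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns log2(value), asserting that value is an
// exact power of two (so a separate size check becomes unnecessary).
inline int exact_log2_sketch(std::uint64_t value) {
  assert(value != 0 && (value & (value - 1)) == 0 &&
         "value must be a power of two");
  int shift = 0;
  while ((value >>= 1) != 0) {
    ++shift;
  }
  return shift;
}
```

With an 8-byte entry size, `exact_log2_sketch(8)` yields the same scale of 3 that was previously written as a literal.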
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16657#discussion_r1399442791 PR Review Comment: https://git.openjdk.org/jdk/pull/16657#discussion_r1399442342 From aph at openjdk.org Mon Nov 20 16:40:51 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 20 Nov 2023 16:40:51 GMT Subject: RFR: 8319973: AArch64: Save and restore FPCR in the call stub In-Reply-To: <4gTm33xFCtzYXXQqZJEh7nzDXhjnNPeRQVB1ncINiLM=.fd0b09aa-b670-4585-8fe6-96ee570743eb@github.com> References: <4gTm33xFCtzYXXQqZJEh7nzDXhjnNPeRQVB1ncINiLM=.fd0b09aa-b670-4585-8fe6-96ee570743eb@github.com> Message-ID: <3LGpsx5hyrCaNJSszyGw-N9UIbuf7cCWwC75EtarW_k=.059ce489-9082-4ee4-8e3f-f493e82efc1d@github.com> On Mon, 20 Nov 2023 15:37:59 GMT, Thomas Stuefe wrote: >> On AArch64 we don't save and restore the default floating-point control state when we enter and leave Java code. We really should, because if we're called via the JNI invocation interface with a weird FP control state we'll not be Java compatible. > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 145: > >> 143: // ... >> 144: // -29 [ argument word 1 ] >> 145: // -28 [ saved Floating-point Control Register ] > > Why the 1-word gap? Alignment? Yes, exactly. AArch64 stack is 16-aligned, and this is enforced by hardware. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16637#discussion_r1399468749 From aph at openjdk.org Mon Nov 20 16:40:52 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 20 Nov 2023 16:40:52 GMT Subject: Integrated: 8319973: AArch64: Save and restore FPCR in the call stub In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 18:18:35 GMT, Andrew Haley wrote: > On AArch64 we don't save and restore the default floating-point control state when we enter and leave Java code. We really should, because if we're called via the JNI invocation interface with a weird FP control state we'll not be Java compatible. This pull request has now been integrated. 
Changeset: 6e86904a Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/6e86904a94d2ed2815aa6e3364c048dac595320d Stats: 34 lines in 4 files changed: 28 ins; 0 del; 6 mod 8319973: AArch64: Save and restore FPCR in the call stub Reviewed-by: adinn, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/16637 From iklam at openjdk.org Mon Nov 20 17:12:35 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 20 Nov 2023 17:12:35 GMT Subject: RFR: 8320147: Remove DumpSharedSpaces [v2] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 23:39:53 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into 8320147-remove-DumpSharedSpaces >> - @calvinccheung comments - fixed copyright >> - 8320147: Remove DumpSharedSpaces > > Few files require copyright header update. > > instanceClassLoaderKlass.hpp > instanceMirrorKlass.hpp > instanceRefKlass.hpp > instanceStackChunkKlass.hpp Thanks @calvinccheung and @matias9927 for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16700#issuecomment-1819475334 From iklam at openjdk.org Mon Nov 20 17:12:37 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 20 Nov 2023 17:12:37 GMT Subject: Integrated: 8320147: Remove DumpSharedSpaces In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 22:36:17 GMT, Ioi Lam wrote: > One more PR for cleanup with cdsConfig.hpp: > > Replace the global variable `DumpSharedSpaces` with `CDSConfig::is_dumping_static_archive()`. > > Note: some mis-uses of `DumpSharedSpaces` need to be replaced with `CDSConfig::is_dumping_heap()` or `CDSConfig::is_dumping_full_module_graph()` This pull request has now been integrated. 
Changeset: 0712b22a Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/0712b22a3ae7075304e5925365429e1d85bd173c Stats: 182 lines in 50 files changed: 81 ins; 16 del; 85 mod 8320147: Remove DumpSharedSpaces Reviewed-by: ccheung, matsaave ------------- PR: https://git.openjdk.org/jdk/pull/16700 From rrich at openjdk.org Mon Nov 20 17:29:40 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 20 Nov 2023 17:29:40 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 15:40:10 GMT, Martin Doerr wrote: > `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) > In addition, I've cleaned up some inaccurate comments. src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3536: > 3534: Rrecv = Rscratch2; > 3535: __ ld(Rnum_params, in_bytes(Method::const_offset()), Rmethod); > 3536: __ lhz(Rnum_params /* number of params */, in_bytes(ConstMethod::size_of_parameters_offset()), Rnum_params); [`ConstMethod::_size_of_parameters`](https://github.com/openjdk/jdk/blob/0712b22a3ae7075304e5925365429e1d85bd173c/src/hotspot/share/oops/constMethod.hpp#L208) is the size of the parameter block in words. `prepare_invoke` uses `ResolvedMethodEntry::_number_of_parameters` which is `Number of arguments for method`. I'd expect the location of the receiver to depend on the size of the parameters and not their number. How does this work? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1399525931 From kvn at openjdk.org Mon Nov 20 18:25:52 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Nov 2023 18:25:52 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared [v3] In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Addition RISC-V patch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16708/files - new: https://git.openjdk.org/jdk/pull/16708/files/3af07981..bbf946c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16708&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16708&range=01-02 Stats: 16 lines in 2 files changed: 10 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16708/head:pull/16708 PR: https://git.openjdk.org/jdk/pull/16708 From kvn at openjdk.org Mon Nov 20 18:25:54 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 20 Nov 2023 18:25:54 GMT Subject: RFR: 8320272: Make method_entry_barrier address shared In-Reply-To: References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> <7FwjBlfdMsgPdHcdOer9mZsQDQ0PamT8qLzCzdq14z4=.ef428fec-764a-4a28-94f2-4ccbd9c6b3d4@github.com> Message-ID: On Mon, 20 Nov 
2023 04:14:57 GMT, Fei Yang wrote: >>> This seems fine, but you could explain a little more why this is useful for Leyden? I would think having StubRoutines::method_entry_barrier() would be enough, and that it could reference the existing platform-specific name, minimizing changes. I don't understand why the storage needs to be shared in StubRoutines::_method_entry_barrier, for example. >> >> Thank you for looking, Dean. Yes, your suggestion would work too. Leyden code calls StubRoutines::method_entry_barrier() to get address: [SCCache.cpp#L3337](https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/code/SCCache.cpp#L3337) >> But we would need StubRoutines::method_entry_barrier() implementation for each platform in such case. And having duplication and different names does not feel right for me ;^) > > @vnkozlov : Hi, I have tested this on linux-riscv platform. Result looks fine. > Would you mind apply following small add-on change which adds relocation info for this platform too? Thanks. > [16708-riscv.diff.txt](https://github.com/openjdk/jdk/files/13406179/16708-riscv.diff.txt) Thank you, @RealFYang, for testing. I applied your patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16708#issuecomment-1819584853 From shade at openjdk.org Mon Nov 20 19:06:07 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 20 Nov 2023 19:06:07 GMT Subject: RFR: 8319777: Zero: Support 8-byte cmpxchg [v2] In-Reply-To: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com> References: <5X9UjtgpVfSFxZQggFfQS1Z99xeFR-u1EjoWtIWdVOA=.1528ea9b-4725-4ae2-8606-65ce20ccb7b4@github.com> Message-ID: On Tue, 14 Nov 2023 13:28:09 GMT, Aleksey Shipilev wrote: >> See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. >> >> I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. 
Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. >> >> I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. >> >> Additional testing: >> - [x] Linux x86_32 Zero release; jcstress >> - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` >> - [x] Linux x86_32 Zero fastdebug, bootcycle-images > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Only do _supports_cx8 = true > - Merge branch 'master' into JDK-8319777-zero-64cas > - Fix Thanks all, I am integrating and paying extra attention to build/test pipelines for these. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16614#issuecomment-1819639843 From shade at openjdk.org Mon Nov 20 19:09:20 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 20 Nov 2023 19:09:20 GMT Subject: Integrated: 8319777: Zero: Support 8-byte cmpxchg In-Reply-To: References: Message-ID: On Fri, 10 Nov 2023 14:17:32 GMT, Aleksey Shipilev wrote: > See related discussion in [JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776) that targets to require `supports_cx8()` unconditionally. > > I think we can claim Zero is `supports_cx8() == true`, because we have enough fallbacks for 8-byte CASes to work. Note that some code already reaches for these without checking for `supports_cx8()`, so the proverbial horses have already left the barn. > > I ran tests with [JDK-8319883](https://bugs.openjdk.org/browse/JDK-8319883) applied to fix known problems with x86_32 Zero. 
> > Additional testing: > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images This pull request has now been integrated. Changeset: 6b96bb64 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6b96bb640aa91d96877b8ceea5fed359607c1e45 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8319777: Zero: Support 8-byte cmpxchg Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/16614 From rriggs at openjdk.org Mon Nov 20 19:23:35 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 20 Nov 2023 19:23:35 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v9] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. 
> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: - Normalize the spec for undefined behavior of String constructors, StringBuilder, and Appendable methods in the case where the input arguments are modified during construction or a StringBuilder or Appendable method call. 
- Speed up getting the coder from a byte array returned from StringUTF16.compress and normalize calling sequence ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/3e3607e9..7924118b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=07-08 Stats: 73 lines in 4 files changed: 15 ins; 0 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From jiangli at openjdk.org Mon Nov 20 19:28:13 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 20 Nov 2023 19:28:13 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3] In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Add a check for a thread is_attaching_via_jni, based on David Holmes' comment. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16642/files - new: https://git.openjdk.org/jdk/pull/16642/files/c2f83e8a..7c0214e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=01-02 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16642/head:pull/16642 PR: https://git.openjdk.org/jdk/pull/16642 From jiangli at openjdk.org Mon Nov 20 19:28:17 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Mon, 20 Nov 2023 19:28:17 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: <1sURlXfpTlEu9U30aZAojUIhDgtzeyA8MYrJ_q3xDUs=.bda6a027-ebc8-4107-a1f7-be3edf737e5f@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> <1sURlXfpTlEu9U30aZAojUIhDgtzeyA8MYrJ_q3xDUs=.bda6a027-ebc8-4107-a1f7-be3edf737e5f@github.com> Message-ID: On Thu, 16 Nov 2023 09:48:48 GMT, David Holmes wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. > > src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 87: > >> 85: // Don't add a JvmtiThreadState to a thread that is exiting. >> 86: return nullptr; >> 87: } > > I'm wondering if there should also be an `is_jni_attaching` check here? That seems to be a good idea. It would cover other cases that we haven't seen yet. Added a check as suggested, thanks. 
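The shape of the guard discussed above can be sketched roughly like this. It is a toy C++ model under stated assumptions, not the code in `jvmtiThreadState.inline.hpp`: `ThreadSketch`, its two flags, the `int` used as a stand-in for the per-thread state, and `state_for_sketch` are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified model of a thread with lazily created state.
struct ThreadSketch {
  bool is_exiting = false;
  bool is_attaching_via_jni = false;
  std::unique_ptr<int> state;  // stand-in for the per-thread state object
};

// Returns the existing state, or creates one, unless the thread is in a
// phase where creating state would be unsafe or would be left incomplete.
int* state_for_sketch(ThreadSketch& thread) {
  if (thread.state == nullptr) {
    if (thread.is_exiting || thread.is_attaching_via_jni) {
      return nullptr;  // do not create state for such a thread
    }
    thread.state = std::make_unique<int>(0);
  }
  return thread.state.get();
}
```

In this model a thread that is still attaching gets no state, and a later call after attachment completes creates exactly one, which is then reused.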
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1399647867 From rriggs at openjdk.org Mon Nov 20 19:35:27 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 20 Nov 2023 19:35:27 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v10] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: undo noise chars ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/7924118b..04d58779 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From mdoerr at openjdk.org Mon Nov 20 19:51:09 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 20 Nov 2023 19:51:09 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 17:26:15 GMT, Richard Reingruber wrote: >> `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) >> In addition, I've cleaned up some inaccurate comments. 
>
> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3536:
>
>> 3534:   Rrecv = Rscratch2;
>> 3535:   __ ld(Rnum_params, in_bytes(Method::const_offset()), Rmethod);
>> 3536:   __ lhz(Rnum_params /* number of params */, in_bytes(ConstMethod::size_of_parameters_offset()), Rnum_params);
>
> [`ConstMethod::_size_of_parameters`](https://github.com/openjdk/jdk/blob/0712b22a3ae7075304e5925365429e1d85bd173c/src/hotspot/share/oops/constMethod.hpp#L208) is the size of the parameter block in words. `prepare_invoke` uses `ResolvedMethodEntry::_number_of_parameters` which is `Number of arguments for method`. I'd expect the location of the receiver to depend on the size of the parameters and not their number. How does this work?

One is a copy of the other. See usages of `method_entry->fill_in` in src/hotspot/share/oops/cpCache.cpp.
E.g. `method_entry->fill_in((u1)as_TosState(method->result_type()), (u2)method()->size_of_parameters());`
Scaling happens in `load_receiver`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1399675411

From sviswanathan at openjdk.org  Mon Nov 20 21:26:08 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 20 Nov 2023 21:26:08 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore
In-Reply-To:
References:
Message-ID:

On Fri, 17 Nov 2023 19:58:13 GMT, Volodymyr Paprotski wrote:

> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3488:

> 3486:   // WARN: Allow dst == (src1|src2), mask == scratch
> 3487:   bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst;
> 3488:   bool dst_available = dst != src1 || dst != src2;

What if it is fully masked and dst==mask? In that case also dst_available should be false.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3503:

> 3501:   }
> 3502:   vpor(dst, dst, scratch, vector_len);
> 3503:   // CASE dst==src1==src2

Should this comment be in the else part below or removed altogether?

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3513:

> 3511:   // WARN: Allow dst == (src1|src2), mask == scratch
> 3512:   bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst && (fully_masked || scratch != mask);
> 3513:   bool dst_available = dst != src1 || dst != src2;

What if it is fully masked and dst==mask? In that case also dst_available should be false.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399757438
PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399750936
PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399760137

From duke at openjdk.org  Mon Nov 20 21:38:08 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 20 Nov 2023 21:38:08 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore
In-Reply-To:
References:
Message-ID:

On Mon, 20 Nov 2023 21:11:09 GMT, Sandhya Viswanathan wrote:

>> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain
>>
>> =============== BEFORE ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
>> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
>> Benchmark                Mode  Cnt    Score   Error  Units
>> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
>> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
>> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
>> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
>> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
>> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
>> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>>
>> =============== AFTER ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
>> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
>> Benchmark                Mode  Cnt    Score   Error  Units
>> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
>> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
>> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
>> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
>> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
>> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
>> MaxMinO...
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3503:
>
>> 3501:   }
>> 3502:   vpor(dst, dst, scratch, vector_len);
>> 3503:   // CASE dst==src1==src2
>
> Should this comment be in the else part below or removed altogether?

Will remove; it was a note to myself to remember that case, and I forgot to remove it (I did add the check for that case: `bool dst_available = dst != src1 || dst != src2;`)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399770488

From duke at openjdk.org  Mon Nov 20 21:38:06 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 20 Nov 2023 21:38:06 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore
In-Reply-To:
References:
Message-ID: <4s9WrUMXWP-tkU4dGV8iwhWJUJDOlZoORvw4PiO5UuY=.a8ebb4bf-2325-473a-94bf-9d38b62dc80b@github.com>

On Mon, 20 Nov 2023 15:20:35 GMT, Volodymyr Paprotski wrote:

> Hi @vpaprotsk , please add checks to skip special emulation for 128 bit vectors at applicable places, as per section "4.1.8.4 256-bit Variable Blend Instructions" of x86 optimization manual variable blends are micro-coded only for 256 bit vectors.

I went and remeasured the performance of 128-bit vectors with `-XX:MaxVectorSize=16`...

=============== BEFORE ===============
Benchmark                Mode  Cnt    Score   Error  Units
MaxMinOptimizeTest.dAdd  avgt    3   77.232 ± 0.034  us/op
MaxMinOptimizeTest.dMax  avgt    3  149.242 ± 2.373  us/op
MaxMinOptimizeTest.dMin  avgt    3  150.000 ± 1.763  us/op
MaxMinOptimizeTest.dMul  avgt    3   77.237 ± 0.020  us/op
MaxMinOptimizeTest.fAdd  avgt    3   77.156 ± 0.012  us/op
MaxMinOptimizeTest.fMax  avgt    3  110.729 ± 0.743  us/op
MaxMinOptimizeTest.fMin  avgt    3  110.716 ± 0.157  us/op
MaxMinOptimizeTest.fMul  avgt    3   77.157 ± 0.017  us/op
Benchmark                  (SIZE)  Mode  Cnt     Score    Error  Units
VectorSignum.floatSignum      256  avgt    3   134.137 ±  4.586  ns/op
VectorSignum.floatSignum      512  avgt    3   258.117 ±  0.518  ns/op
VectorSignum.floatSignum     1024  avgt    3   512.706 ±  5.924  ns/op
VectorSignum.floatSignum     2048  avgt    3   979.276 ± 46.734  ns/op
VectorSignum.doubleSignum     256  avgt    3   233.108 ±  5.314  ns/op
VectorSignum.doubleSignum     512  avgt    3   457.757 ±  3.537  ns/op
VectorSignum.doubleSignum    1024  avgt    3   907.037 ±  2.768  ns/op
VectorSignum.doubleSignum    2048  avgt    3  1816.200 ± 15.869  ns/op

=============== AFTER ===============
Benchmark                Mode  Cnt    Score   Error  Units
MaxMinOptimizeTest.dAdd  avgt    3   77.238 ± 0.092  us/op
MaxMinOptimizeTest.dMax  avgt    3  106.636 ± 0.072  us/op
MaxMinOptimizeTest.dMin  avgt    3  103.060 ± 0.129  us/op
MaxMinOptimizeTest.dMul  avgt    3   77.233 ± 0.044  us/op
MaxMinOptimizeTest.fAdd  avgt    3   77.158 ± 0.021  us/op
MaxMinOptimizeTest.fMax  avgt    3  105.256 ± 1.682  us/op
MaxMinOptimizeTest.fMin  avgt    3  103.126 ± 0.049  us/op
MaxMinOptimizeTest.fMul  avgt    3   77.155 ± 0.019  us/op
Benchmark                  (SIZE)  Mode  Cnt     Score    Error  Units
VectorSignum.floatSignum      256  avgt    3    60.523 ±  0.026  ns/op
VectorSignum.floatSignum      512  avgt    3   118.415 ±  0.076  ns/op
VectorSignum.floatSignum     1024  avgt    3   235.203 ±  0.323  ns/op
VectorSignum.floatSignum     2048  avgt    3   467.230 ±  0.144  ns/op
VectorSignum.doubleSignum     256  avgt    3   120.955 ±  0.217  ns/op
VectorSignum.doubleSignum     512  avgt    3   241.753 ±  0.371  ns/op
VectorSignum.doubleSignum    1024  avgt    3   498.055 ±  0.410  ns/op
VectorSignum.doubleSignum    2048  avgt    3   974.891 ±  1.472  ns/op

For Max/Min, keeping this patch gets us up to 40%, and for `VectorSignum.*Signum` the gain is actually >2x.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1819836163

From matsaave at openjdk.org  Mon Nov 20 21:49:15 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Mon, 20 Nov 2023 21:49:15 GMT
Subject: RFR: 8320278: ARM32 build is broken after JDK-8301997
Message-ID:

JDK-8301997 changed the handling of constant pool cache entries for methods and fully removed the ConstantPoolCacheEntry class. This commit included changes to the interpreters for all supported platforms except ARM32, and its omission resulted in a GHA failure. This patch intends to introduce an ARM32 port that reflects the code changes to the included platforms so the ARM32 code can build and thus pass testing. Verified with tier 1-4 tests.

-------------

Commit messages:
 - Fixed copyright header
 - 8320278: ARM32 build is broken after JDK-8301997

Changes: https://git.openjdk.org/jdk/pull/16749/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16749&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8320278
  Stats: 292 lines in 5 files changed: 103 ins; 128 del; 61 mod
  Patch: https://git.openjdk.org/jdk/pull/16749.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16749/head:pull/16749

PR: https://git.openjdk.org/jdk/pull/16749

From duke at openjdk.org  Mon Nov 20 22:01:06 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 20 Nov 2023 22:01:06 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore
In-Reply-To:
References:
Message-ID:

On Mon, 20 Nov 2023 21:19:02 GMT, Sandhya Viswanathan wrote:

>> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain
>>
>> =============== BEFORE ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
>> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
>> Benchmark                Mode  Cnt    Score   Error  Units
>> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
>> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
>> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
>> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
>> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
>> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
>> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>>
>> =============== AFTER ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
>> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
>> Benchmark                Mode  Cnt    Score   Error  Units
>> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
>> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
>> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
>> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
>> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
>> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
>> MaxMinO...
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3488:
>
>> 3486:   // WARN: Allow dst == (src1|src2), mask == scratch
>> 3487:   bool scratch_available = scratch != xnoreg && scratch != src1 && scratch != src2 && scratch != dst;
>> 3488:   bool dst_available = dst != src1 || dst != src2;
>
> What if it is fully masked and dst==mask? In that case also dst_available should be false.

Hmm, that's a good catch, but it's also bad if `!fully_masked && mask==scratch` (i.e. both cases will destroy the mask before it can be used).

For readability, I think I will just keep a second half of the check: `bool dst_available = dst!=mask && (dst != src1 || dst != src2)` (i.e. whoever calls this function really _should_ know better, just like all the other functions in this file that don't have input validation; these checks started as asserts while developing, to catch some confusing bugs.)
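
To make the aliasing concern in this exchange concrete, here is a minimal Java sketch that models each register as one 32-bit lane in an `int[]`. The class and method names are hypothetical, and the instruction order in `blendUsingDstFirst` (writing `dst` before `mask` is read for the second time) is an assumption chosen for illustration; it is not copied from macroAssembler_x86.cpp. Only the `dstAvailable` predicate is taken from the reply above.

```java
// Hypothetical model: regs[i] holds one 32-bit lane of "register" i.
final class BlendAliasingSketch {
    // dst = a & b (models vpand)
    static void and(int[] regs, int dst, int a, int b) {
        regs[dst] = regs[a] & regs[b];
    }

    // dst = ~a & b (models vpandn, which negates its first source)
    static void andn(int[] regs, int dst, int a, int b) {
        regs[dst] = ~regs[a] & regs[b];
    }

    // dst = a | b (models vpor)
    static void or(int[] regs, int dst, int a, int b) {
        regs[dst] = regs[a] | regs[b];
    }

    // Assumed sequence that uses dst as its first temporary.
    static void blendUsingDstFirst(int[] regs, int dst, int src1, int src2,
                                   int mask, int scratch) {
        and(regs, dst, mask, src2);       // overwrites the mask when dst == mask
        andn(regs, scratch, mask, src1);  // reads the mask a second time
        or(regs, dst, dst, scratch);
    }

    // The availability predicate proposed in the reply above.
    static boolean dstAvailable(int dst, int src1, int src2, int mask) {
        return dst != mask && (dst != src1 || dst != src2);
    }
}
```

With distinct registers and an all-ones mask the sequence yields `src2`; when `dst` and `mask` share a slot, the second read sees the clobbered mask and the result mixes bits from both sources, which is the situation the extra `dst != mask` term rules out.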
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399802873 From pchilanomate at openjdk.org Mon Nov 20 22:05:11 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 20 Nov 2023 22:05:11 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Tue, 14 Nov 2023 14:37:58 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
>> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: > > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Touchups > - More comments work > - Tight up the comments > - ... and 3 more: https://git.openjdk.org/jdk/compare/c87f6048...191c0dbb Looks good to me. src/hotspot/share/utilities/waitBarrier_generic.cpp line 191: > 189: break; > 190: } > 191: sp.wait(); Do we really need this SpinYield wait() here? I would expect failing that CAS is rare and a retry should work. ------------- Marked as reviewed by pchilanomate (Reviewer). 
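
As an illustration of the question about pausing after a failed CAS, here is a minimal Java sketch of a retry loop with an optional pause. It is not the code from waitBarrier_generic.cpp: the class name, the `pending` counter, and the `backOff` flag are hypothetical, and `Thread.onSpinWait()` is only a loose stand-in for HotSpot's `SpinYield`.

```java
// Hypothetical sketch of a CAS retry loop; not taken from the PR.
final class CasRetrySketch {
    // Atomically takes one pending wakeup if any remain; returns true on success.
    static boolean tryClaimWakeup(java.util.concurrent.atomic.AtomicInteger pending,
                                  boolean backOff) {
        while (true) {
            int cur = pending.get();
            if (cur == 0) {
                return false;                 // nothing left to claim
            }
            if (pending.compareAndSet(cur, cur - 1)) {
                return true;                  // this caller won the race
            }
            if (backOff) {
                Thread.onSpinWait();          // optional pause before the next attempt
            }
        }
    }
}
```

If a failed CAS is rare, as the reviewer expects, the two variants behave almost identically and the immediate retry is the simpler choice; a pause mainly pays off under sustained contention.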

PR Review: https://git.openjdk.org/jdk/pull/16404#pullrequestreview-1740773160
PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1399800543

From sviswanathan at openjdk.org  Mon Nov 20 22:09:07 2023
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 20 Nov 2023 22:09:07 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore
In-Reply-To:
References:
Message-ID:

On Fri, 17 Nov 2023 19:58:13 GMT, Volodymyr Paprotski wrote:

> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

src/hotspot/cpu/x86/x86.ad line 7844:

> 7842:     int vlen_enc = vector_length_encoding(this);
> 7843:     __ vpandn($vtmp$$XMMRegister, $mask$$XMMRegister, $src1$$XMMRegister, vlen_enc);
> 7844:     __ vpand($dst$$XMMRegister, $src2$$XMMRegister, $mask$$XMMRegister, vlen_enc);

Maybe we could code it as below to be consistent with other places:
`__ vpand($dst$$XMMRegister, $mask$$XMMRegister, $src2$$XMMRegister, vlen_enc);`

src/hotspot/cpu/x86/x86_64.ad line 4554:

> 4552:     __ vmaxsd($tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister);
> 4553:     __ vcmppd($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len);
> 4554:     __ vblendvpd($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister);

As dst and mask (in this case btmp) need to be independent for EcoreOpt, the vblend dst here should be tmp or atmp, followed by a move into dst. Either this or have TEMP dst in effect for the Ecore case.
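
The `vpandn`/`vpand` pair quoted above combines into a blend through a final OR. A minimal per-lane Java sketch of that identity follows; the class and method names are hypothetical. Widening the sign bit with an arithmetic shift is just one possible way to turn a partial mask into a full one and is an assumption here, not a description of what the patch emits.

```java
// Hypothetical per-lane model of a variable blend on 32-bit lanes.
final class BlendSketch {
    // Reference semantics: the mask's sign bit picks src2, otherwise src1.
    static int blendBySignBit(int src1, int src2, int mask) {
        return (mask < 0) ? src2 : src1;
    }

    // Emulation with boolean operations. It matches the reference only when
    // the lane mask is all ones or all zeros ("fully masked").
    static int blendByBooleanOps(int src1, int src2, int mask) {
        int fromSrc1 = ~mask & src1;   // models vpandn(vtmp, mask, src1)
        int fromSrc2 = mask & src2;    // models vpand(dst, mask, src2)
        return fromSrc1 | fromSrc2;    // models vpor(dst, dst, vtmp)
    }

    // Spreads the sign bit across the whole lane (assumed approach).
    static int broadcastSignBit(int mask) {
        return mask >> 31;
    }
}
```

Because AND is commutative, swapping the two source operands of `vpand` as suggested in the review changes only how the code reads, not what it computes.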

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399788631
PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399808720

From coleenp at openjdk.org  Mon Nov 20 22:12:09 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 20 Nov 2023 22:12:09 GMT
Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes
In-Reply-To:
References:
Message-ID:

On Tue, 14 Nov 2023 17:56:09 GMT, Jaroslav Bachorik wrote:

> Please, review this fix for a corner case handling of `jmethodID` values.
>
> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version, unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version.
>
> If we happen to capture a stack trace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack, we will have the corresponding frame identified by a `jmethodID` pointing to that version.
> However, sooner or later the 'previous' class version becomes eligible for cleanup, and with it all the `Method` instances it contains. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory.
> This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded.
>
> This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions.
>
> Therefore we need to perform a `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.
>
> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_

Good analysis for a very subtle bug. I have a couple of comments, and maybe the test can be simplified, but I am approving the change.

src/hotspot/share/classfile/classFileParser.cpp line 5579:

> 5577:
> 5578:   if (_methods != nullptr) {
> 5579:     // Free methods - those methods are not fully wired and miss the method holder

How about saying: for methods whose InstanceKlass as method holder is not yet created?

test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/GetStackTraceAndRetransformTest.java line 53:

> 51: import java.util.List;
> 52: import java.util.concurrent.CyclicBarrier;
> 53: import java.util.concurrent.locks.LockSupport;

Do you need all these imports? There's a simple RedefineClassHelper class that does most of the work, but maybe you need the explicit agent code to reproduce the crash? See test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRunningMethodsWithBacktrace.java as an example.

-------------

Marked as reviewed by coleenp (Reviewer).
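
The "look up, then clear only on a match" rule from the PR description can be pictured with a small Java model. Everything here is hypothetical: `slots[id]` stands in for what a `jmethodID` resolves to, `MethodVersion` stands in for HotSpot's `Method`, and none of the method names come from the patch.

```java
// Hypothetical model of jmethodID slots; not HotSpot code.
final class JMethodIdSketch {
    static final class MethodVersion {
        final String name;
        MethodVersion(String name) { this.name = name; }
    }

    // slots[id] is the method version an id resolves to; null means "cleared".
    private final MethodVersion[] slots;
    private int next = 0;

    JMethodIdSketch(int capacity) {
        slots = new MethodVersion[capacity];
    }

    // Hands out a fresh id that resolves to m.
    int allocate(MethodVersion m) {
        slots[next] = m;
        return next++;
    }

    // Redefinition: an existing id now resolves to the newest version.
    void redirect(int id, MethodVersion newest) {
        slots[id] = newest;
    }

    MethodVersion resolve(int id) {
        return slots[id];
    }

    // Called just before 'dying' is freed: clear the slot only if it still
    // resolves to that exact instance, so a redirected id is left alone.
    void clearIfPointsTo(int id, MethodVersion dying) {
        if (slots[id] == dying) {
            slots[id] = null;
        }
    }
}
```

In the redefinition scenario described above, the original id is redirected to the new version while the on-stack old version receives its own id; freeing the old version then clears only that second id.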

PR Review: https://git.openjdk.org/jdk/pull/16662#pullrequestreview-1740746213
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1399811821
PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1399810008

From coleenp at openjdk.org  Mon Nov 20 22:12:12 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 20 Nov 2023 22:12:12 GMT
Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes
In-Reply-To:
References:
Message-ID:

On Sat, 18 Nov 2023 00:23:44 GMT, Jaroslav Bachorik wrote:

>> src/hotspot/share/oops/instanceKlass.cpp line 541:
>>
>>> 539:     // The previous version will point to them so they're not totally dangling
>>> 540:     assert (!method->on_stack(), "shouldn't be called with methods on stack");
>>> 541:     // Do the pointer maintenance before releasing the metadata, but not for incomplete methods
>>
>> I'm confused by what you mean by method holder, which I think of as methodHandle. Or InstanceKlass is the holder of the methods. Maybe this should be more explicit that it's talking about clearing any associated jmethodIDs.
>
> The method holder is an `InstanceKlass` object which can be retrieved as `method->method_holder()` (I apologize if I am using not completely correct terms - this is what I grokked from the sources). And incomplete methods created by the `ClassParser` from the class data stream will not have the link to that `InstanceKlass` set up if the `ClassParser` already has its `_klass` field set to a non-null value.
>
> If we are talking about clearing any jmethodIDs associated with an `InstanceKlass` instance, it is not really possible for old method versions, because only the current `InstanceKlass` version has the jmethodID cache associated with it and it contains jmethodIDs pointing to both the old and current methods.

I see, holder is the right word and concept.
So the parameter means has_method_holder, in that the InstanceKlass has been fully parsed at the point of clearing the jmethodIDs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1399785853 From coleenp at openjdk.org Mon Nov 20 22:12:14 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 20 Nov 2023 22:12:14 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 03:11:04 GMT, Jaroslav Bachorik wrote: >> src/hotspot/share/oops/method.cpp line 2277: >> >>> 2275: } >>> 2276: } >>> 2277: >> >> Can this race with redefinition? > > The cleanup of previous versions is executed in VM_Operation at a safepoint - therefore we should be safe against races with class redefinitions. > I am adding an assert to `clear_jmethod_id()` to check for being at a safepoint. Yes, these are cleaned at a safepoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1399798259 From evergizova at openjdk.org Mon Nov 20 22:19:50 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Mon, 20 Nov 2023 22:19:50 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: > InlineCacheBuffer size is currently hardcoded to 10K. > This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. > > Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. 
Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: Changed type, removed unnecessary guarantees ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15271/files - new: https://git.openjdk.org/jdk/pull/15271/files/69d603e4..52b0260a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15271&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15271&range=01-02 Stats: 37 lines in 6 files changed: 5 ins; 30 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15271.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15271/head:pull/15271 PR: https://git.openjdk.org/jdk/pull/15271 From coleenp at openjdk.org Mon Nov 20 22:21:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 20 Nov 2023 22:21:05 GMT Subject: RFR: 8320278: ARM32 build is broken after JDK-8301997 In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 20:37:23 GMT, Matias Saavedra Silva wrote: > JDK-8301997 changed the handling of constant pool cache entries for methods and fully removed the ConstantPoolCacheEntry class. This commit included changes to the interpreters for all supported platforms except ARM32, and its omission resulted in a GHA failure. This patch intends to introduce an ARM32 port that reflects the code changes to the included platforms so the ARM32 code can build and thus pass testing. Verified with tier 1-4 tests. This looks good. This should help the arm32 porters get started and fix the compilation errors. I think in the description, you meant to say you ran tier1-4 on Oracle platforms not arm32. src/hotspot/cpu/arm/templateTable_arm.cpp line 2584: > 2582: __ add(Rtemp, Rcache, in_bytes(ResolvedMethodEntry::bytecode1_offset())); > 2583: case f2_byte: > 2584: __ add(Rtemp, Rcache, in_bytes(ResolvedMethodEntry::bytecode2_offset())); this is missing breaks. ------------- Marked as reviewed by coleenp (Reviewer). 
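
The effect of the missing `break` statements flagged in this review can be shown with a short Java sketch, since Java's classic `switch` falls through the same way as C++. The class name and offset values are hypothetical; the assignments stand in for the two `add(Rtemp, ...)` calls, each of which overwrites the same register.

```java
// Hypothetical illustration of switch fall-through; the offsets are made up.
final class FallthroughSketch {
    static final int BYTECODE1_OFFSET = 6;
    static final int BYTECODE2_OFFSET = 7;

    // Mirrors the shape of the reviewed code: no break after the first case.
    static int offsetWithoutBreaks(int byteNo) {
        int offset = -1;
        switch (byteNo) {
            case 1:
                offset = BYTECODE1_OFFSET;
                // falls through and is overwritten below
            case 2:
                offset = BYTECODE2_OFFSET;
        }
        return offset;
    }

    // Same selection with the breaks in place.
    static int offsetWithBreaks(int byteNo) {
        int offset = -1;
        switch (byteNo) {
            case 1:
                offset = BYTECODE1_OFFSET;
                break;
            case 2:
                offset = BYTECODE2_OFFSET;
                break;
        }
        return offset;
    }
}
```

Without the breaks, the first case silently ends up with the second case's offset, which is easy to miss because the second case still behaves correctly.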
PR Review: https://git.openjdk.org/jdk/pull/16749#pullrequestreview-1740795581 PR Review Comment: https://git.openjdk.org/jdk/pull/16749#discussion_r1399815942 From rgiulietti at openjdk.org Mon Nov 20 22:32:15 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Mon, 20 Nov 2023 22:32:15 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v10] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 20 Nov 2023 19:35:27 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. 
>> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > undo noise chars Looks good. Maybe use `StringUTF16.coderFromArrayLen()` where suggested in the comments before integrating. src/java.base/share/classes/java/lang/String.java line 359: > 357: if (COMPACT_STRINGS) { > 358: byte[] val = StringUTF16.compress(codePoints, offset, count); > 359: this.coder = (val.length == count) ? LATIN1 : UTF16; Maybe worth using `StringUTF16.coderFromArrayLen()` even here. src/java.base/share/classes/java/lang/String.java line 4845: > 4843: if (COMPACT_STRINGS && asb.maybeLatin1) { > 4844: this.value = StringUTF16.compress(val, 0, length); > 4845: this.coder = (this.value.length == length) ? LATIN1 : UTF16; `StringUTF16.coderFromArrayLen()`? ------------- Marked as reviewed by rgiulietti (Reviewer). 
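The coder selection discussed in this review can be illustrated with a small sketch. This is not the JDK implementation: `compress` and `coderFromArrayLen` below are hypothetical stand-ins with assumed signatures, and the snapshot-then-scan approach is just one simple way to keep a concurrently mutated input from producing an inconsistent result.

```java
import java.util.Arrays;

final class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Hypothetical compress: returns len bytes if every char fits in latin1,
    // otherwise 2 * len bytes (big-endian here, chosen only for illustration).
    static byte[] compress(char[] src, int off, int len) {
        // Snapshot first, so later mutation of src cannot affect the decision.
        char[] copy = Arrays.copyOfRange(src, off, off + len);
        boolean latin1 = true;
        for (char c : copy) {
            if (c > 0xFF) {
                latin1 = false;
                break;
            }
        }
        if (latin1) {
            byte[] out = new byte[len];
            for (int i = 0; i < len; i++) {
                out[i] = (byte) copy[i];
            }
            return out;
        }
        byte[] out = new byte[len * 2];
        for (int i = 0; i < len; i++) {
            out[2 * i] = (byte) (copy[i] >> 8);
            out[2 * i + 1] = (byte) copy[i];
        }
        return out;
    }

    // Hypothetical helper: derives the coder from the compressed array length,
    // mirroring the (val.length == count) ? LATIN1 : UTF16 test quoted above.
    static byte coderFromArrayLen(byte[] value, int len) {
        return (value.length == len) ? LATIN1 : UTF16;
    }
}
```

Putting the length comparison in one helper is presumably what the reviewer is asking for: the rule "array length equals character count means latin1" is then written in a single place.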
PR Review: https://git.openjdk.org/jdk/pull/16425#pullrequestreview-1740808291 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1399824712 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1399825374 From pchilanomate at openjdk.org Mon Nov 20 23:08:06 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 20 Nov 2023 23:08:06 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v10] In-Reply-To: References: Message-ID: On Sun, 19 Nov 2023 05:12:50 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. 
>> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: remove java_lang_VirtualThread::NEW check from is_vthread_alive Thanks Serguei, changes look good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16460#pullrequestreview-1740848784 From sviswanathan at openjdk.org Mon Nov 20 23:13:08 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 20 Nov 2023 23:13:08 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 19:58:13 GMT, Volodymyr Paprotski wrote: > Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain > > > =============== BEFORE =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op > VectorSignum.floatSignum 512 avgt 3 154.889 ? 0.242 ns/op > VectorSignum.floatSignum 1024 avgt 3 306.130 ? 0.605 ns/op > VectorSignum.floatSignum 2048 avgt 3 609.965 ? 0.927 ns/op > VectorSignum.doubleSignum 256 avgt 3 151.874 ? 1.748 ns/op > VectorSignum.doubleSignum 512 avgt 3 303.080 ? 0.310 ns/op > VectorSignum.doubleSignum 1024 avgt 3 607.517 ? 0.597 ns/op > VectorSignum.doubleSignum 2048 avgt 3 1214.282 ? 1.834 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 137.334 ? 
0.128 us/op > MaxMinOptimizeTest.dMin avgt 3 137.160 ? 0.465 us/op > MaxMinOptimizeTest.dMul avgt 3 77.231 ? 0.051 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.165 ? 0.003 us/op > MaxMinOptimizeTest.fMax avgt 3 107.428 ? 1.501 us/op > MaxMinOptimizeTest.fMin avgt 3 107.186 ? 0.022 us/op > MaxMinOptimizeTest.fMul avgt 3 77.164 ? 0.012 us/op > > =============== AFTER =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 61.816 ? 1.980 ns/op > VectorSignum.floatSignum 512 avgt 3 117.251 ? 0.052 ns/op > VectorSignum.floatSignum 1024 avgt 3 231.356 ? 0.397 ns/op > VectorSignum.floatSignum 2048 avgt 3 458.904 ? 0.774 ns/op > VectorSignum.doubleSignum 256 avgt 3 121.449 ? 0.184 ns/op > VectorSignum.doubleSignum 512 avgt 3 241.662 ? 0.189 ns/op > VectorSignum.doubleSignum 1024 avgt 3 482.365 ? 0.165 ns/op > VectorSignum.doubleSignum 2048 avgt 3 962.412 ? 1.401 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 125.701 ? 0.082 us/op > MaxMinOptimizeTest.dMin avgt 3 124.704 ? 0.119 us/op > MaxMinOptimizeTest.dMul avgt 3 77.232 ? 0.028 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.169 ? 0.103 us/op > MaxMinOptimizeTest.fMax avgt 3 97.939 ? 0.477 us/op > MaxMinOptimizeTest.fMin avgt 3 98.012 ? 0.154 us/op > MaxMinOptimizeTest.fMul avgt 3 77.174 ? 0.012 us/op test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java line 99: > 97: Random rnd = new Random(20); > 98: for(int i = 0 ; i < ARRLEN; i++) { > 99: finp[i] = (i-ARRLEN/2)*(float)rnd.nextDouble(); We could use rnd.nextFloat() here directly. 
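A minimal sketch of the suggested change, assuming the array length is passed in as a parameter (the real test uses its own `ARRLEN` constant, whose value is not shown here; the class and method names are hypothetical):

```java
import java.util.Random;

final class SignumInputSketch {
    // Fills the input array using nextFloat() directly, instead of
    // narrowing the result of nextDouble() to float.
    static float[] fill(int arrLen, long seed) {
        Random rnd = new Random(seed);
        float[] finp = new float[arrLen];
        for (int i = 0; i < arrLen; i++) {
            finp[i] = (i - arrLen / 2) * rnd.nextFloat();
        }
        return finp;
    }
}
```

Note that the two forms do not generate the same sequence for a given seed, so any expected values that depend on the old inputs would change.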
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1399841056 From evergizova at openjdk.org Mon Nov 20 23:22:05 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Mon, 20 Nov 2023 23:22:05 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 22:46:08 GMT, Dean Long wrote: >> OK, can I remove these guarantees for _buffer_size and _buffer_limit in this PR? Or do I need to create a separate one? > > I would like to see it cleaned up in this PR. I changed the InlineCacheBufferSize type to size_t and removed unnecessary guarantees for _buffer_size and _buffer_limit. @dean-long could you please take a look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399857380 From evergizova at openjdk.org Mon Nov 20 23:22:06 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Mon, 20 Nov 2023 23:22:06 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: <5X84H-etGQ-RDan1RTnnZVmXujoo6aleWopu3Hl_J0k=.d82350d3-be39-4acb-accb-ddcb7a8a6fa4@github.com> Message-ID: On Tue, 14 Nov 2023 03:26:44 GMT, Dean Long wrote: >> Thanks @dean-long. >> I would like to keep this enhancement simple and minimal so that it can be backported to 17 and 11. >> So I'd like to avoid changes to StubQueue. 
I can change the type of InlineCacheBufferSize to size_t and add checked_cast to StubQueue constructor in InlineCacheBuffer::initialize(): >> _buffer = new StubQueue(new ICStubInterface, checked_cast(InlineCacheBufferSize), InlineCacheBuffer_lock, "InlineCacheBuffer"); >> >> Because in any case InlineCacheBufferSize can't be greater than INT_MAX: >> `InlineCacheBufferSize < NonNMethodCodeHeapSize < ReservedCodeCacheSize < CODE_CACHE_DEFAULT_LIMIT = 2G`: >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeCache.cpp#L191 >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.cpp#L492 >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L589 >> >> Will that be OK? > > OK. done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399858296 From dlong at openjdk.org Mon Nov 20 23:51:08 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 20 Nov 2023 23:51:08 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: <0gAR9ceE976BaJHI01b5NyeCvYEebD3Qys4lp-B6CD8=.860b622a-26a6-4d40-8cbc-e0656b9db715@github.com> On Mon, 20 Nov 2023 22:19:50 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. >> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed type, removed unnecessary guarantees Marked as reviewed by dlong (Reviewer). This looks good. 
There is still potentially over-alignment for the data header, but that's an existing problem: 65 // ICStub_from_destination_address looks up Stub* address from code entry address, 66 // which unfortunately means the stub head should be at the same alignment as the code. 67 static int alignment() { return CodeEntryAlignment; } I suggest you get another review before integrating. ------------- PR Review: https://git.openjdk.org/jdk/pull/15271#pullrequestreview-1740880822 PR Comment: https://git.openjdk.org/jdk/pull/15271#issuecomment-1819987260 From fyang at openjdk.org Tue Nov 21 00:30:06 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Nov 2023 00:30:06 GMT Subject: RFR: 8318159: RISC-V: Improve itable_stub [v2] In-Reply-To: References: Message-ID: <0gh1kM5afge9ITZZJwDQzW2C2B_y35t-Qlm8O90Ocnw=.c6b631d0-cf42-42f9-9bb6-f9ee0cb06a10@github.com> On Mon, 20 Nov 2023 16:20:52 GMT, Yuri Gaevsky wrote: >> Please review the change for RISC-V similar to #13792(AARCH64) and #13460(X86). >> >> From #13792: >> The change replaces two separate iterations over the itable with new algorithm >> consisting of two loops. First, we look for a match with resolved_klass, >> checking for a match with holder_klass along the way. Then we continue iterating >> (not starting over) the itable using the second loop, checking only for a match >> with holder_klass. >> >> ### Correctness checks >> >> Testing: tier1 tests successfully passed on HiFive Unmatched board. >> >> #### Performance results on RISC-V StarFive JH7110 board: >> >> >> InterfaceCalls: before fix after fix >> ------------------------------------------------------------------- >> Benchmark Mode Cnt Score Error Score Error Units >> ------------------------------------------------------------------- >> test1stInt2Types avgt 100 14.380 ? 0.017 | 14.370 ? 0.014 ns/op >> test1stInt3Types avgt 100 72.724 ? 0.552 | 66.290 ? 0.080 ns/op >> test1stInt5Types avgt 100 73.948 ? 0.524 | 68.781 ? 
0.377 ns/op >> test2ndInt2Types avgt 100 15.705 ? 0.016 | 15.707 ? 0.018 ns/op >> test2ndInt3Types avgt 100 82.370 ? 0.453 | 75.363 ? 0.156 ns/op >> test2ndInt5Types avgt 100 85.266 ? 0.466 | 80.969 ? 0.752 ns/op >> testIfaceCall avgt 100 75.684 ? 0.648 | 72.603 ? 0.460 ns/op >> testIfaceExtCall avgt 100 86.293 ? 0.567 | 77.939 ? 0.340 ns/op >> testMonomorphic avgt 100 11.357 ? 0.007 | 11.359 ? 0.009 ns/op >> ------------------------------------------------------------------- >> >> >> #### Performance results on RISC-V HiFive Unmatched board: >> >> >> InterfaceCalls: before fix after fix >> --------------------------------------------------------------------- >> Benchmark Mode Cnt Score Error Score Error Units >> --------------------------------------------------------------------- >> test1stInt2Types avgt 100 24.432 ? 1.811 | 23.205 ? 1.512 ns/op >> test1stInt3Types avgt 100 135.800 ? 3.991 | 127.112 ? 2.299 ns/op >> test1stInt5Types avgt 100 141.746 ? 4.272 | 136.069 ? 4.919 ns/op >> test2ndInt2Types avgt 100 31.474 ? 2.468 | 26.978 ? 1.951 ns/op >> test2ndInt3Types avgt 100 146.410 ? 3.575 | 139.443 ? 3.677 ns/op >> test2ndInt5Types avgt 100 156.083 ? 3.617 | 150.583 ? 2.909 ns/op >> testIfaceCall avgt 100 136.392 ? 2.546 | 129.632 ? 1.662 ns/op >> testIfaceExtCall avgt 100 155.602 ? 3.836 | 138.058 ... > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > Addressed review comments from @RealFYang and @robehn. Updated change looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
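The two-loop itable scan described in the quoted PR text can be sketched in Java as follows. This is only an illustration of the control flow: the real stub is generated assembly walking a HotSpot itable, whereas the parallel arrays, method name and exception type here are assumptions made for the example.

```java
final class ItableLookupSketch {
    // ifaces[i] is an implemented interface, offsets[i] the offset of its method table.
    // Returns the offset for holder, after confirming that resolved is implemented.
    static int lookup(Class<?>[] ifaces, int[] offsets, Class<?> resolved, Class<?> holder) {
        int holderOffset = -1;
        int i = 0;
        // Loop 1: search for resolved, remembering holder if it is seen on the way.
        for (; i < ifaces.length; i++) {
            if (ifaces[i] == holder) {
                holderOffset = offsets[i];
            }
            if (ifaces[i] == resolved) {
                break;
            }
        }
        if (i == ifaces.length) {
            throw new IncompatibleClassChangeError("resolved interface not implemented");
        }
        if (holderOffset >= 0) {
            return holderOffset;
        }
        // Loop 2: continue from where loop 1 stopped, checking only for holder.
        for (i++; i < ifaces.length; i++) {
            if (ifaces[i] == holder) {
                return offsets[i];
            }
        }
        throw new IncompatibleClassChangeError("holder interface not implemented");
    }
}
```

Because the second loop resumes rather than restarts, each itable entry is visited at most once, which is consistent with the improvement the benchmark numbers suggest for the polymorphic cases.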
PR Review: https://git.openjdk.org/jdk/pull/16657#pullrequestreview-1740910077 From kvn at openjdk.org Tue Nov 21 00:35:12 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 Nov 2023 00:35:12 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 22:19:50 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. >> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed type, removed unnecessary guarantees src/hotspot/share/code/stubs.cpp line 221: > 219: // verify alignment > 220: guarantee(_buffer_size % stub_alignment() == 0, "_buffer_size not aligned"); > 221: guarantee(_buffer_limit % stub_alignment() == 0, "_buffer_limit not aligned"); Why were these removed? src/hotspot/share/compiler/compilerDefinitions.cpp line 503: > 501: jio_fprintf(defaultStream::error_stream(), > 502: "Invalid InlineCacheBufferSize=" SIZE_FORMAT "K. Must be less than NonNMethodCodeHeapSize=" SIZE_FORMAT "K.\n", > 503: InlineCacheBufferSize/K, NonNMethodCodeHeapSize/K); You need to check the alignment of the value. In [StubQueue()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L70) it is aligned up by `2*BytesPerWord` so the final value could be > `InlineCacheBufferSize`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399891240 PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399896133 From duke at openjdk.org Tue Nov 21 00:37:23 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 21 Nov 2023 00:37:23 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: > Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain > > > =============== BEFORE =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op > VectorSignum.floatSignum 512 avgt 3 154.889 ? 0.242 ns/op > VectorSignum.floatSignum 1024 avgt 3 306.130 ? 0.605 ns/op > VectorSignum.floatSignum 2048 avgt 3 609.965 ? 0.927 ns/op > VectorSignum.doubleSignum 256 avgt 3 151.874 ? 1.748 ns/op > VectorSignum.doubleSignum 512 avgt 3 303.080 ? 0.310 ns/op > VectorSignum.doubleSignum 1024 avgt 3 607.517 ? 0.597 ns/op > VectorSignum.doubleSignum 2048 avgt 3 1214.282 ? 1.834 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 137.334 ? 0.128 us/op > MaxMinOptimizeTest.dMin avgt 3 137.160 ? 0.465 us/op > MaxMinOptimizeTest.dMul avgt 3 77.231 ? 0.051 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.165 ? 0.003 us/op > MaxMinOptimizeTest.fMax avgt 3 107.428 ? 1.501 us/op > MaxMinOptimizeTest.fMin avgt 3 107.186 ? 0.022 us/op > MaxMinOptimizeTest.fMul avgt 3 77.164 ? 0.012 us/op > > =============== AFTER =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 61.816 ? 1.980 ns/op > VectorSignum.floatSignum 512 avgt 3 117.251 ? 0.052 ns/op > VectorSignum.floatSignum 1024 avgt 3 231.356 ? 0.397 ns/op > VectorSignum.floatSignum 2048 avgt 3 458.904 ? 0.774 ns/op > VectorSignum.doubleSignum 256 avgt 3 121.449 ? 0.184 ns/op > VectorSignum.doubleSignum 512 avgt 3 241.662 ? 
0.189 ns/op > VectorSignum.doubleSignum 1024 avgt 3 482.365 ? 0.165 ns/op > VectorSignum.doubleSignum 2048 avgt 3 962.412 ? 1.401 ns/op > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op > MaxMinOptimizeTest.dMax avgt 3 125.701 ? 0.082 us/op > MaxMinOptimizeTest.dMin avgt 3 124.704 ? 0.119 us/op > MaxMinOptimizeTest.dMul avgt 3 77.232 ? 0.028 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.169 ? 0.103 us/op > MaxMinOptimizeTest.fMax avgt 3 97.939 ? 0.477 us/op > MaxMinOptimizeTest.fMin avgt 3 98.012 ? 0.154 us/op > MaxMinOptimizeTest.fMul avgt 3 77.174 ? 0.012 us/op Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'jdk/master' into vp-ecore2 - review comments - emulate vblend on ecores ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16716/files - new: https://git.openjdk.org/jdk/pull/16716/files/8b7da454..74c68fe6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=00-01 Stats: 7164 lines in 178 files changed: 3541 ins; 1055 del; 2568 mod Patch: https://git.openjdk.org/jdk/pull/16716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716 PR: https://git.openjdk.org/jdk/pull/16716 From amenkov at openjdk.org Tue Nov 21 01:28:10 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 21 Nov 2023 01:28:10 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v10] In-Reply-To: References: Message-ID: On Sun, 19 Nov 2023 05:12:50 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. 
There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. >> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: remove java_lang_VirtualThread::NEW check from is_vthread_alive Marked as reviewed by amenkov (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16460#pullrequestreview-1740951932 From dlong at openjdk.org Tue Nov 21 01:31:11 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 21 Nov 2023 01:31:11 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 00:22:02 GMT, Vladimir Kozlov wrote: >> Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: >> >> Changed type, removed unnecessary guarantees > > src/hotspot/share/code/stubs.cpp line 221: > >> 219: // verify alignment >> 220: guarantee(_buffer_size % stub_alignment() == 0, "_buffer_size not aligned"); >> 221: guarantee(_buffer_limit % stub_alignment() == 0, "_buffer_limit not aligned"); > > Why were these removed? I suggested removing them, because it shouldn't be necessary to align the end of the code. Only the data header and code start should need alignment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399925678 From dlong at openjdk.org Tue Nov 21 01:43:11 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 21 Nov 2023 01:43:11 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 00:31:49 GMT, Vladimir Kozlov wrote: >> Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: >> >> Changed type, removed unnecessary guarantees > > src/hotspot/share/compiler/compilerDefinitions.cpp line 503: > >> 501: jio_fprintf(defaultStream::error_stream(), >> 502: "Invalid InlineCacheBufferSize=" SIZE_FORMAT "K. Must be less than NonNMethodCodeHeapSize=" SIZE_FORMAT "K.\n", >> 503: InlineCacheBufferSize/K, NonNMethodCodeHeapSize/K); > > You need to check the alignment of the value.
In [StubQueue()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L70) it is aligned up by `2*BytesPerWord` so the final value could be > `InlineCacheBufferSize`. I think the align up to 2*BytesPerWord is not really needed, because BufferBlob::create already does its own alignment which will make the final value > InlineCacheBufferSize. BufferBlob::create uses the size as a minimum, not a maximum. I don't think the above check should need to know the details of BufferBlob::create and StubQueue() alignment adjustments. Having InlineCacheBufferSize close to NonNMethodCodeHeapSize is going to make the JVM fail at startup for other reasons, isn't it? Maybe the max for InlineCacheBufferSize should be NonNMethodCodeHeapSize/2? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399931830 From jjoo at openjdk.org Tue Nov 21 02:19:47 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 21 Nov 2023 02:19:47 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v45] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Address comments and refactor TTTC class for simplification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/ce7dbfcf..17a8eaf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=44 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=43-44 Stats: 97 lines in 16 files changed: 11 ins; 28 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Tue Nov 21 02:19:48 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 21 Nov 2023 02:19:48 GMT Subject: RFR:
8315149: Add hsperf counters for CPU time of internal GC threads [v44] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 23:50:02 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Fix whitespace Addressed comments, but am running into an assertion failure here when building: https://github.com/openjdk/jdk/pull/15082/files#diff-d1c5f7a125171a3828bdb3f4488327e03ceb96ff615eacfc9df406aca408cbe6R72 Will investigate tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1820104586 From sviswanathan at openjdk.org Tue Nov 21 02:55:06 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 Nov 2023 02:55:06 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 00:37:23 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ? 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ? 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ? 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ? 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ? 0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ? 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ? 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ? 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ? 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ? 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ? 
0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ? 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ? 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ? 0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ? 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ? 0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ? 0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ? 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ? 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ? 0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ? 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ? 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ? 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ? 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ? 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ? 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ? 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ? 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'jdk/master' into vp-ecore2 > - review comments > - emulate vblend on ecores The PR looks good to me now, thanks. ------------- Marked as reviewed by sviswanathan (Reviewer). 
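The idea of replacing a variable blend with boolean operations can be sketched on a single float lane. This is a scalar illustration under assumptions, not the instruction sequence used in the PR (which is not shown in this thread): the blend picks the second source where the mask's sign bit is set, and the same selection can be written as and, and-not, or.

```java
final class BlendSketch {
    // Scalar model of one lane: returns b if the sign bit of mask is set, else a.
    static float blend(float a, float b, float mask) {
        // An arithmetic shift spreads the sign bit over all 32 bits (0 or -1).
        int m = Float.floatToRawIntBits(mask) >> 31;
        int bits = (Float.floatToRawIntBits(a) & ~m) | (Float.floatToRawIntBits(b) & m);
        return Float.intBitsToFloat(bits);
    }
}
```

Only the sign bit of the mask matters here, so a mask of `-0.0f` selects `b` just as any negative value does.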
PR Review: https://git.openjdk.org/jdk/pull/16716#pullrequestreview-1741014065 From kvn at openjdk.org Tue Nov 21 03:46:09 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 Nov 2023 03:46:09 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 01:40:26 GMT, Dean Long wrote: >> src/hotspot/share/compiler/compilerDefinitions.cpp line 503: >> >>> 501: jio_fprintf(defaultStream::error_stream(), >>> 502: "Invalid InlineCacheBufferSize=" SIZE_FORMAT "K. Must be less than NonNMethodCodeHeapSize=" SIZE_FORMAT "K.\n", >>> 503: InlineCacheBufferSize/K, NonNMethodCodeHeapSize/K); >> >> You need to check the alignment of the value. In [StubQueue()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L70) it is aligned up by `2*BytesPerWord` so the final value could be > `InlineCacheBufferSize`. > > I think the align up to 2*BytesPerWord is not really needed, because BufferBlob::create already does its own alignment which will make the final value > InlineCacheBufferSize. BufferBlob::create uses the size as a minimum, not a maximum. > I don't think the above check should need to know the details of BufferBlob::create and StubQueue() alignment adjustments. Having InlineCacheBufferSize close to NonNMethodCodeHeapSize is going to make the JVM fail at startup for other reasons, isn't it? Maybe the max for InlineCacheBufferSize should be NonNMethodCodeHeapSize/2? Thank you for answering my questions, Dean. Yes, the non-nmethod section contains template interpreter code, stubs and adapters, so it will fail immediately if no space is left for them. NonNMethodCodeHeapSize/2 is a reasonable limit.
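The arithmetic behind this exchange can be sketched briefly. The helper names below are hypothetical and the code is Java rather than HotSpot C++; it only shows why aligning a size up can push it past the requested value, and what a "half of the non-nmethod heap" upper bound would look like as a check.

```java
final class IcBufferSizeSketch {
    // Rounds size up to the next multiple of alignment (alignment must be a power of two).
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    // Hypothetical validity check using the NonNMethodCodeHeapSize/2 limit proposed above.
    static boolean isAcceptable(long inlineCacheBufferSize, long nonNMethodCodeHeapSize) {
        return inlineCacheBufferSize <= nonNMethodCodeHeapSize / 2;
    }
}
```

With a limit of half the heap, the few bytes added by alignment no longer matter to the check, which seems to be the point of not tying the validation to the details of the allocation code.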
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1399992871 From dholmes at openjdk.org Tue Nov 21 05:24:33 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 21 Nov 2023 05:24:33 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v4] In-Reply-To: References: Message-ID: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. > > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks. David Holmes has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary includes of vm_version.hpp. Fix copyright years. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16625/files - new: https://git.openjdk.org/jdk/pull/16625/files/65871144..597cef53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=02-03 Stats: 4 lines in 4 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From eosterlund at openjdk.org Tue Nov 21 06:06:11 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 21 Nov 2023 06:06:11 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v4] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 05:24:33 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. 
>> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary includes of vm_version.hpp. > Fix copyright years. This looks great! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16625#pullrequestreview-1741175703 From stuefe at openjdk.org Tue Nov 21 06:15:05 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Nov 2023 06:15:05 GMT Subject: RFR: 8320278: ARM32 build is broken after JDK-8301997 In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 20:37:23 GMT, Matias Saavedra Silva wrote: > JDK-8301997 changed the handling of constant pool cache entries for methods and fully removed the ConstantPoolCacheEntry class. This commit included changes to the interpreters for all supported platforms except ARM32, and its omission resulted in a GHA failure. This patch intends to introduce an ARM32 port that reflects the code changes to the included platforms so the ARM32 code can build and thus pass testing. Verified with tier 1-5 tests on Oracle platforms but not ARM32. Great, thanks for fixing. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16749#pullrequestreview-1741183793 From rehn at openjdk.org Tue Nov 21 06:37:16 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 06:37:16 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: <8q21gGFk_v-6t9t7idoVkMzWohFSG-GZV_3GerYJQjI=.0ea809d9-b50f-4282-9cc3-354db88736a4@github.com> On Mon, 20 Nov 2023 21:57:27 GMT, Patricio Chilano Mateo wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: >> >> - Drop the Linux check in preparation for integration >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Rework paddings >> - Encode barrier tag into state, resolving another race condition >> - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Touchups >> - More comments work >> - Tight up the comments >> - ... and 3 more: https://git.openjdk.org/jdk/compare/73f4ff78...191c0dbb > > src/hotspot/share/utilities/waitBarrier_generic.cpp line 191: > >> 189: break; >> 190: } >> 191: sp.wait(); > > Do we really need this SpinYield wait() here? I would expect failing that CAS is rare and a retry should work. I don't think it is hurting, and some platform do not have a CAS, so Atomic::cmpxchg may be a load-reserve conditional store (risc-v). Which have the possibility of all failed with the cmpxchg. These are the default, so we do spin 4096 times before we yield. 
(then yields 64 times before it sleeps) static const uint default_spin_limit = 4096; static const uint default_yield_limit = 64; static const uint default_sleep_ns = 1000; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1400085009 From rehn at openjdk.org Tue Nov 21 07:07:08 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 07:07:08 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: <8q21gGFk_v-6t9t7idoVkMzWohFSG-GZV_3GerYJQjI=.0ea809d9-b50f-4282-9cc3-354db88736a4@github.com> References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> <8q21gGFk_v-6t9t7idoVkMzWohFSG-GZV_3GerYJQjI=.0ea809d9-b50f-4282-9cc3-354db88736a4@github.com> Message-ID: On Tue, 21 Nov 2023 06:34:01 GMT, Robbin Ehn wrote: >> src/hotspot/share/utilities/waitBarrier_generic.cpp line 191: >> >>> 189: break; >>> 190: } >>> 191: sp.wait(); >> >> Do we really need this SpinYield wait() here? I would expect failing that CAS is rare and a retry should work. > > If we fail we need to reload _state, when the other CPU just invalidated that cache-line. > Then a spin-pause just before would actually be bad, but since it so rarely happens it doesn't matter. > > But some platform do not have a CAS, so Atomic::cmpxchg may be a load-reserve store-conditional (risc-v). Which have the (at least theoretical) possibility of all failed with the cmpxchg. > As LR/SC is a bit unpredictable, and there are a number of hw vendors, I think it's good to have this just in case. 
https://lore.kernel.org/all/20230910082911.3378782-10-guoren at kernel.org/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1400108007 From sspitsyn at openjdk.org Tue Nov 21 08:20:19 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 21 Nov 2023 08:20:19 GMT Subject: RFR: 8319244: implement JVMTI handshakes support for virtual threads [v10] In-Reply-To: References: Message-ID: On Sun, 19 Nov 2023 05:12:50 GMT, Serguei Spitsyn wrote: >> The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. >> At the low level, the JVMTI code supporting platform and virtual threads still can be different. >> This implementation is based on the `JvmtiVTMSTransitionDisabler` class. >> >> The internal API includes two new classes: >> - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` >> >> The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. >> >> The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: >> - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` >> >> To get the test results clean, the update also fixes the test issue: >> [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" >> >> Testing: >> - the mach5 tiers 1-6 are all passed > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: remove java_lang_VirtualThread::NEW check from is_vthread_alive Patricio and Alex, thank you a lot for review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16460#issuecomment-1820429682 From sspitsyn at openjdk.org Tue Nov 21 08:20:20 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 21 Nov 2023 08:20:20 GMT Subject: Integrated: 8319244: implement JVMTI handshakes support for virtual threads In-Reply-To: References: Message-ID: On Wed, 1 Nov 2023 18:44:04 GMT, Serguei Spitsyn wrote: > The handshakes support for virtual threads is needed to simplify the JVMTI implementation for virtual threads. There is a significant duplication in the JVMTI code to differentiate code intended to support platform, virtual threads or both. The handshakes are unified, so it is enough to define just one handshake for both platform and virtual thread. > At the low level, the JVMTI code supporting platform and virtual threads still can be different. > This implementation is based on the `JvmtiVTMSTransitionDisabler` class. > > The internal API includes two new classes: > - `JvmtiHandshake` and `JvmtiUnifiedHandshakeClosure` > > The `JvmtiUnifiedHandshakeClosure` defines two different callback functions: `do_thread()` and `do_vthread()`. > > The first JVMTI functions are picked first to be converted to use the `JvmtiHandshake`: > - `GetStackTrace`, `GetFrameCount`, `GetFrameLocation`, `NotifyFramePop` > > To get the test results clean, the update also fixes the test issue: > [8318631](https://bugs.openjdk.org/browse/JDK-8318631): GetStackTraceSuspendedStressTest.java failed with "check_jvmti_status: JVMTI function returned error: JVMTI_ERROR_THREAD_NOT_ALIVE (15)" > > Testing: > - the mach5 tiers 1-6 are all passed This pull request has now been integrated. 
Changeset: 839dd653 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/839dd653663867f770fbe4af0a57468675eb12db Stats: 498 lines in 4 files changed: 138 ins; 334 del; 26 mod 8319244: implement JVMTI handshakes support for virtual threads Reviewed-by: pchilanomate, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/16460 From fyang at openjdk.org Tue Nov 21 08:26:08 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Nov 2023 08:26:08 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 16:00:43 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3769: >> >>> 3767: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f} >>> 3768: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16 >>> 3769: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f} >> >> I see the openssl version makes use of index-load to get {f,e,b,a},{h,g,d,c} pre-loop and index-store to put {f,e,b,a},{h,g,d,c} back to {a,b,c,d},{e,f,g,h} post-loop, which is much simpler than this code. Please consider. >> >> [1] https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha256-riscv64-zvkb-zvknha_or_zvknhb.pl#L124-L142 > > The vsetivli is often expensive:ish, the code in openssl sets it five times before reaching first round. > That don't seem like a good idea, now vsetivli make the code much easier to read yes... > > I guess I need to check numbers for that also.. :) Yeah. Why not consider something more simpler if there is no known big difference on performance numbers? 
And this is the first version, at a time when RVV-1.0 compatible hardware is not popular yet :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400183278 From rrich at openjdk.org Tue Nov 21 08:27:07 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Nov 2023 08:27:07 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 19:48:26 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3536: >> >>> 3534: Rrecv = Rscratch2; >>> 3535: __ ld(Rnum_params, in_bytes(Method::const_offset()), Rmethod); >>> 3536: __ lhz(Rnum_params /* number of params */, in_bytes(ConstMethod::size_of_parameters_offset()), Rnum_params); >> >> [`ConstMethod::_size_of_parameters`](https://github.com/openjdk/jdk/blob/0712b22a3ae7075304e5925365429e1d85bd173c/src/hotspot/share/oops/constMethod.hpp#L208) is the size of the parameter block in words. `prepare_invoke` uses `ResolvedMethodEntry::_number_of_parameters` which is `Number of arguments for method`. I'd expect the location of the receiver to depend on the size of the parameters and not their number. How does this work? > > One is a copy of the other. See usages of `method_entry->fill_in` in src/hotspot/share/oops/cpCache.cpp. E.g. > `method_entry->fill_in((u1)as_TosState(method->result_type()), (u2)method()->size_of_parameters());` > Scaling happens in `load_receiver`. Thanks for looking it up for me :) So `ResolvedMethodEntry::_number_of_parameters` is a misnomer or am I missing something?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1400184083 From fyang at openjdk.org Tue Nov 21 08:35:08 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Nov 2023 08:35:08 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> References: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> Message-ID: On Mon, 20 Nov 2023 15:29:37 GMT, Robbin Ehn wrote: > Depending on hardware pipeline depth this load can actually be executed after "__ vadd_vv(v14, v15, v10);" thus that instruction maybe already be retired when reaching round 1. > > Preloading these, depending on the number of V-load ports, the preloading it self can be very costly as they can't be executed out-of-order in parallel. Make sense. I was expecting those to retire when reaching the first round (round0). > So hiding the load in previous round can be faster, therefore my fast conclusion without numbers was at least for single pass no preloading _should_ be better on bigger hardware. But I see that there is a true data dependence on the vector load for each round. Any thing I missed? 
Say, for round2: // Quad-round 2 (+2, v12->v13->v10->v11) __ vl1re32_v(v15, consts); ----> Define v15 __ addi(consts, consts, 16); __ vadd_vv(v14, v15, v12); ----> Use v15 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400194455 From rehn at openjdk.org Tue Nov 21 08:51:06 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 08:51:06 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> Message-ID: <_LkvimbOKKuIZon0Ajv9lKReO19xQjFI2VH2b4hsCE4=.89f5725a-150c-4a03-a6c2-a71a2f5fe3b6@github.com> On Tue, 21 Nov 2023 08:31:54 GMT, Fei Yang wrote: >> Depending on hardware pipeline depth this load can actually be executed after >> "__ vadd_vv(v14, v15, v10);" thus that instruction maybe already be retired when reaching round 1. >> >> Preloading these, depending on the number of V-load ports, the preloading it self can be very costly as they can't be executed out-of-order in parallel. >> >> So hiding the load in previous round can be faster, therefore my fast conclusion without numbers was at least for single pass no preloading *should* be better on bigger hardware. >> >> I guess I need to get those numbers :) > >> Depending on hardware pipeline depth this load can actually be executed after "__ vadd_vv(v14, v15, v10);" thus that instruction maybe already be retired when reaching round 1. >> >> Preloading these, depending on the number of V-load ports, the preloading it self can be very costly as they can't be executed out-of-order in parallel. > > Make sense. I was expecting those to retire when reaching the first round (round0). > >> So hiding the load in previous round can be faster, therefore my fast conclusion without numbers was at least for single pass no preloading _should_ be better on bigger hardware. > > But I see that there is a true data dependence on the vector load for each round. 
Any thing I missed? > Say, for round2: > > // Quad-round 2 (+2, v12->v13->v10->v11) > __ vl1re32_v(v15, consts); ----> Define v15 > __ addi(consts, consts, 16); > __ vadd_vv(v14, v15, v12); ----> Use v15 __ vadd_vv(v14, v15, v11); <<----------------------------------------------------------| __ vsha2cl_vv(v17, v16, v14); | __ vsha2ch_vv(v16, v17, v14); | __ vmerge_vvm(v14, v13, v12); | __ vsha2ms_vv(v11, v14, v10); // Generate W[23:20] | //-------------------------------------------------------------------------------- | // Quad-round 2 (+2, v12->v13->v10->v11) | __ vl1re32_v(v15, consts); --------------------------------------------------------------- No ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400217024 From rehn at openjdk.org Tue Nov 21 09:02:07 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 09:02:07 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: <_LkvimbOKKuIZon0Ajv9lKReO19xQjFI2VH2b4hsCE4=.89f5725a-150c-4a03-a6c2-a71a2f5fe3b6@github.com> References: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> <_LkvimbOKKuIZon0Ajv9lKReO19xQjFI2VH2b4hsCE4=.89f5725a-150c-4a03-a6c2-a71a2f5fe3b6@github.com> Message-ID: On Tue, 21 Nov 2023 08:47:37 GMT, Robbin Ehn wrote: >>> Depending on hardware pipeline depth this load can actually be executed after "__ vadd_vv(v14, v15, v10);" thus that instruction maybe already be retired when reaching round 1. >>> >>> Preloading these, depending on the number of V-load ports, the preloading it self can be very costly as they can't be executed out-of-order in parallel. >> >> Make sense. I was expecting those to retire when reaching the first round (round0). >> >>> So hiding the load in previous round can be faster, therefore my fast conclusion without numbers was at least for single pass no preloading _should_ be better on bigger hardware. 
>> >> But I see that there is a true data dependence on the vector load for each round. Any thing I missed? >> Say, for round2: >> >> // Quad-round 2 (+2, v12->v13->v10->v11) >> __ vl1re32_v(v15, consts); ----> Define v15 >> __ addi(consts, consts, 16); >> __ vadd_vv(v14, v15, v12); ----> Use v15 > > __ vadd_vv(v14, v15, v11); > <<----------------------------------------------------------| > __ vsha2cl_vv(v17, v16, v14); | > __ vsha2ch_vv(v16, v17, v14); | > __ vmerge_vvm(v14, v13, v12); | > __ vsha2ms_vv(v11, v14, v10); // Generate W[23:20] | > //-------------------------------------------------------------------------------- | > // Quad-round 2 (+2, v12->v13->v10->v11) | > __ vl1re32_v(v15, consts); --------------------------------------------------------------- > > > No ? If you consider register renaming load should be enable to start even earlier. E.g. load into vX, then rename vX to v15 after the add that uses v15. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400231771 From shade at openjdk.org Tue Nov 21 09:36:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 09:36:31 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v7] In-Reply-To: References: Message-ID: > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. 
The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. > > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: - Do not SpinYield at disarm loop - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Drop the Linux check in preparation for integration - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Rework paddings - Encode barrier tag into state, resolving another race condition - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Touchups - ... and 5 more: https://git.openjdk.org/jdk/compare/0e39d942...32b0a9c6 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/191c0dbb..32b0a9c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=05-06 Stats: 12996 lines in 291 files changed: 6015 ins; 4707 del; 2274 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From shade at openjdk.org Tue Nov 21 09:36:32 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 09:36:32 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> <8q21gGFk_v-6t9t7idoVkMzWohFSG-GZV_3GerYJQjI=.0ea809d9-b50f-4282-9cc3-354db88736a4@github.com> Message-ID: On Tue, 21 Nov 2023 07:04:44 GMT, Robbin Ehn wrote: >> If we fail we need to reload _state, when the other CPU just invalidated that cache-line. >> Then a spin-pause just before would actually be bad, but since it so rarely happens it doesn't matter. >> >> But some platform do not have a CAS, so Atomic::cmpxchg may be a load-reserve store-conditional (risc-v). 
Which have the (at least theoretical) possibility of all failed with the cmpxchg. >> As LR/SC is a bit unpredictable, and there are a number of hw vendors, I think it's good to have this just in case. > > https://lore.kernel.org/all/20230910082911.3378782-10-guoren at kernel.org/ Actually, I think we don't need `SpinYield` in this particular place for a few reasons: 1. We want to disarm as fast as possible, even if that means more contention, since we are on "leaving safepoint" path in VM thread here. 2. No other `_state` update loop yields, so this loop is effectively low priority under contention, which makes (1) even worse. 3. There is a sharing between `SpinYield` in `_state` CAS loop here and wakeup backoff later. Which is subtly leaking the `SpinYield` state between the phases: the aggressive backoff accrued due to `_state` contention would transfer to dealing with signaling contention. I removed this `wait()` in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1400278309 From mdoerr at openjdk.org Tue Nov 21 10:01:08 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 21 Nov 2023 10:01:08 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 08:24:08 GMT, Richard Reingruber wrote: >> One is a copy of the other. See usages of `method_entry->fill_in` in src/hotspot/share/oops/cpCache.cpp. E.g. >> `method_entry->fill_in((u1)as_TosState(method->result_type()), (u2)method()->size_of_parameters());` >> Scaling happens in `load_receiver`. > > Thanks for looking it up for me :) > So `ResolvedMethodEntry::_number_of_parameters` is a misnomer or am I missing something? Well, the size in stack words is the same as the number. One can like it or not. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1400314203 From duke at openjdk.org Tue Nov 21 10:03:30 2023 From: duke at openjdk.org (Daniel Lundén) Date: Tue, 21 Nov 2023 10:03:30 GMT Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4] In-Reply-To: References: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> Message-ID: On Mon, 20 Nov 2023 04:28:50 GMT, David Holmes wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Obsolete UseCounterDecay > > LGTM! Thanks Thanks @dholmes-ora and @TobiHartmann. Integrating now (but I require a sponsor). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16673#issuecomment-1820580308 From fyang at openjdk.org Tue Nov 21 10:07:19 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 21 Nov 2023 10:07:19 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> <_LkvimbOKKuIZon0Ajv9lKReO19xQjFI2VH2b4hsCE4=.89f5725a-150c-4a03-a6c2-a71a2f5fe3b6@github.com> Message-ID: <1rTN32en51Pjpr-mdaDjw3UzQnf7W4J8JQTf-CMG04s=.904657b9-7a3a-46e3-8936-cf0f16b5c7b9@github.com> On Tue, 21 Nov 2023 08:59:10 GMT, Robbin Ehn wrote: >> __ vadd_vv(v14, v15, v11); >> <<----------------------------------------------------------| >> __ vsha2cl_vv(v17, v16, v14); | >> __ vsha2ch_vv(v16, v17, v14); | >> __ vmerge_vvm(v14, v13, v12); | >> __ vsha2ms_vv(v11, v14, v10); // Generate W[23:20] | >> //-------------------------------------------------------------------------------- | >> // Quad-round 2 (+2, v12->v13->v10->v11) | >> __ vl1re32_v(v15, consts); --------------------------------------------------------------- >> >> >> No ? > > If you consider register renaming load should be enable to start even earlier. > E.g.
load into vX, then rename vX to v15 after the add that uses v15. Thanks. Now I see what you mean. That makes sense to me. It will be interesting to see how the performance numbers may vary. Unfortunately, I don't have access to the hardware yet. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400324480 From shade at openjdk.org Tue Nov 21 10:19:10 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 10:19:10 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v7] In-Reply-To: References: Message-ID: <2mgtqWds_JBcOqpWI1kTqs-pyugPdxxw19p8fnQ30J0=.c2053e87-a830-4501-9c0b-95767f31b7aa@github.com> On Tue, 21 Nov 2023 09:36:31 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. >> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. 
Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: > > - Do not SpinYield at disarm loop > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Touchups > - ... and 5 more: https://git.openjdk.org/jdk/compare/09ae5018...32b0a9c6 The performance improvements still hold. I would wait for some light testing to complete, and then I will integrate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1820619560 From shade at openjdk.org Tue Nov 21 10:35:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 10:35:14 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: References: Message-ID: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. > > (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. 
> > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Do not SpinYield at disarm loop - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Drop the Linux check in preparation for integration - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Merge branch 'master' into JDK-8318986-generic-wait-barrier - Rework paddings - Encode barrier tag into state, resolving another race condition - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods - Merge branch 'master' into JDK-8318986-generic-wait-barrier - ... 
and 6 more: https://git.openjdk.org/jdk/compare/b7a341a1...e56a2bfa ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16404/files - new: https://git.openjdk.org/jdk/pull/16404/files/32b0a9c6..e56a2bfa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16404&range=06-07 Stats: 6185 lines in 143 files changed: 3543 ins; 792 del; 1850 mod Patch: https://git.openjdk.org/jdk/pull/16404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16404/head:pull/16404 PR: https://git.openjdk.org/jdk/pull/16404 From rehn at openjdk.org Tue Nov 21 10:47:11 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 10:47:11 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: <02Pzve-jUmsh1db7GX6IbAWivPt6EgflZhdXp0x6B_w=.8604165f-7b41-4257-9c73-4515c840c2bd@github.com> On Tue, 21 Nov 2023 10:35:14 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. Just waiting at `arm()` is insufficient, but we can have several `Semaphores` to do what we want. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. 
>> >> (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac. Note not only the orders of magnitude better safepoint times, but also the several times more GC safepoints in the time-bound allocation test, which means the attainable GC throughput is similarly better, since we don't waste time at this wait barrier. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Do not SpinYield at disarm loop > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - ... and 6 more: https://git.openjdk.org/jdk/compare/287ebd0a...e56a2bfa Thanks! Did you want to tackle the futex version also? I'll create a JBS anyhow to track it. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16404#pullrequestreview-1741673658 From rehn at openjdk.org Tue Nov 21 10:47:12 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 10:47:12 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> <8q21gGFk_v-6t9t7idoVkMzWohFSG-GZV_3GerYJQjI=.0ea809d9-b50f-4282-9cc3-354db88736a4@github.com> Message-ID: <1pz0td0K415qo_811zHYhKtBFFLPiCTgS5BNQN24D2A=.f9ea76d5-72c2-4a09-bc46-a698bfc5eb82@github.com> On Tue, 21 Nov 2023 09:33:31 GMT, Aleksey Shipilev wrote: >> https://lore.kernel.org/all/20230910082911.3378782-10-guoren at kernel.org/ > > Actually, I think we don't need `SpinYield` in this particular place for a few reasons: > 1. We want to disarm as fast as possible, even if that means more contention, since we are on "leaving safepoint" path in VM thread here _and_ we want to have all new waiters to return immediately. > 2. 
No other `_state` update loop yields, so this loop is effectively low priority under contention, which makes (1) even worse. > 3. The `SpinYield` instance is shared between the `_state` CAS loop here and the wakeup backoff later, which subtly leaks the `SpinYield` state between the phases: the aggressive backoff accrued due to `_state` contention would transfer to dealing with signaling contention. > > I removed this `wait()` in a new commit. Yes, you are correct, since they are moving towards identical state, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16404#discussion_r1400381092 From shade at openjdk.org Tue Nov 21 10:56:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 10:56:09 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: <02Pzve-jUmsh1db7GX6IbAWivPt6EgflZhdXp0x6B_w=.8604165f-7b41-4257-9c73-4515c840c2bd@github.com> References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> <02Pzve-jUmsh1db7GX6IbAWivPt6EgflZhdXp0x6B_w=.8604165f-7b41-4257-9c73-4515c840c2bd@github.com> Message-ID: On Tue, 21 Nov 2023 10:44:49 GMT, Robbin Ehn wrote: > Did you want to tackle the futex version also? I'll create a JBS anyhow to track it. I have no plans to tackle the futex version. Trying to finish the tasks already started this year :) But I would be surprised if there are things that can be improved on top of what futex already does for us. Probably avalanche wakeups would be a thing to try?
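For readers following the `SpinYield` discussion, here is a minimal sketch of a CAS update loop that retries immediately instead of backing off. It is not the `GenericWaitBarrier` code: the `state` variable, the tag/waiter packing, and the helper names are assumptions made up for illustration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical packing (an assumption, not HotSpot's layout):
// high 32 bits hold the barrier tag, low 32 bits hold the waiter count.
static std::atomic<int64_t> state{0};

static int64_t encode(int32_t tag, int32_t waiters) {
  return (static_cast<int64_t>(tag) << 32) | static_cast<uint32_t>(waiters);
}
static int32_t decode_tag(int64_t s)     { return static_cast<int32_t>(s >> 32); }
static int32_t decode_waiters(int64_t s) { return static_cast<int32_t>(s & 0xFFFFFFFF); }

// Clears the tag while preserving the waiter count.
// Returns the waiter count observed at the moment the tag was cleared.
int32_t disarm() {
  int64_t cur = state.load(std::memory_order_relaxed);
  while (true) {
    int64_t next = encode(0, decode_waiters(cur));
    if (state.compare_exchange_weak(cur, next,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
      return decode_waiters(cur);
    }
    // A failed CAS refreshed 'cur'. Retry at once: no yield and no backoff,
    // so no backoff state can carry over into a later wakeup phase.
  }
}
```

The design point mirrors the argument in the thread: if every other update loop on the same word retries without yielding, a loop that does yield loses under contention, which is undesirable on the path that leaves the safepoint.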
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1820682743 From rehn at openjdk.org Tue Nov 21 11:00:11 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 11:00:11 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: <1rTN32en51Pjpr-mdaDjw3UzQnf7W4J8JQTf-CMG04s=.904657b9-7a3a-46e3-8936-cf0f16b5c7b9@github.com> References: <9x5sC6aXWG2OUYXdS97o-fJgjhNODf-mVC69bQNSSjI=.6425f2fc-d793-4b49-bf97-1ea55d0fd443@github.com> <_LkvimbOKKuIZon0Ajv9lKReO19xQjFI2VH2b4hsCE4=.89f5725a-150c-4a03-a6c2-a71a2f5fe3b6@github.com> <1rTN32en51Pjpr-mdaDjw3UzQnf7W4J8JQTf-CMG04s=.904657b9-7a3a-46e3-8936-cf0f16b5c7b9@github.com> Message-ID: <5ydUXSyM7-XcGRH86bvVH4LJM94sAY7rahyUeqcrkBk=.e237d328-06f4-4919-af88-ea6f56d0b202@github.com> On Tue, 21 Nov 2023 10:04:09 GMT, Fei Yang wrote: >> If you consider register renaming, the load should be able to start even earlier. >> E.g. load into vX, then rename vX to v15 after the add that uses v15. > > Thanks. Now I see what you mean. That makes sense to me. It will be interesting to see how the performance numbers may vary. Unfortunately, I don't have access to the hardware yet. We don't have such hardware either; we simulate via gem5. Ventana v2 should have a 15-wide pipeline with RVV 1.0; who knows how this will execute on such hardware :) As we don't know, I think you are correct that we should write the most readable version first. And later we can adapt these for the hwprobe triplet of vendor/arch/impl if we think that it's worth it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1400399603 From dholmes at openjdk.org Tue Nov 21 11:00:11 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 21 Nov 2023 11:00:11 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v4] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 06:03:38 GMT, Erik Österlund wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary includes of vm_version.hpp. >> Fix copyright years. > > This looks great! Thanks for the review @fisk ! I have to wait for a few Zero related PRs to get integrated then re-merge, before I can integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1820689855 From duke at openjdk.org Tue Nov 21 11:01:24 2023 From: duke at openjdk.org (Daniel Lundén) Date: Tue, 21 Nov 2023 11:01:24 GMT Subject: Integrated: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 09:40:46 GMT, Daniel Lundén wrote: > This changeset obsoletes the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. > > Changes: > - Obsolete `UseCounterDecay` in JDK 22 and expire it in JDK 23. > - Completely remove `CounterDecayMinIntervalLength`. > > ### Testing > Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. > - `tier1` > - HotSpot parts of `tier2` and `tier3` This pull request has now been integrated.
Changeset: 92320707 Author: Daniel Lundén Committer: David Holmes URL: https://git.openjdk.org/jdk/commit/923207073af985a1b72de3c777d55b0c2d392b25 Stats: 58 lines in 21 files changed: 1 ins; 19 del; 38 mod 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength Reviewed-by: thartmann, lmesnik, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/16673 From rehn at openjdk.org Tue Nov 21 11:08:11 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 21 Nov 2023 11:08:11 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> <02Pzve-jUmsh1db7GX6IbAWivPt6EgflZhdXp0x6B_w=.8604165f-7b41-4257-9c73-4515c840c2bd@github.com> Message-ID: On Tue, 21 Nov 2023 10:53:06 GMT, Aleksey Shipilev wrote: > > Did you want to tackle the futex version also? I'll create a JBS anyhow to track it. > > I have no plans to tackle the futex version. Trying to finish the tasks already started this year :) But I would be surprised if there are things that can be improved on top of what futex already does for us. Probably avalanche wakeups would be a thing to try? Ok! Yes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1820702688 From mdoerr at openjdk.org Tue Nov 21 11:30:09 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 21 Nov 2023 11:30:09 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 09:58:30 GMT, Martin Doerr wrote: >> Thanks for looking it up for me :) >> So `ResolvedMethodEntry::_number_of_parameters` is a misnomer or am I missing something? > > Well, the size in stack words is the same as the number. One can like it or not. So, yes, the name doesn't tell what it actually is. It's the number of slots and not the number of arguments. But that's out of scope for this PR.
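To make the slots-versus-arguments distinction concrete: in the JVM, `long` and `double` arguments occupy two stack slots each, so the slot count can exceed the argument count. The sketch below counts slots from a method descriptor. The function name `parameter_slots` and the `has_receiver` flag are hypothetical, it assumes a well-formed descriptor, and it makes no claim about how `ResolvedMethodEntry` computes or stores its value.

```cpp
#include <cstddef>
#include <string>

// Counts argument slots (stack words) for a JVM method descriptor such as "(IJ[D)V".
// long (J) and double (D) take two slots; every other type, including object
// and array references, takes one. Assumes the descriptor is well formed.
int parameter_slots(const std::string& descriptor, bool has_receiver) {
  int slots = has_receiver ? 1 : 0;   // optionally count 'this'
  std::size_t i = 1;                  // skip the opening '('
  while (i < descriptor.size() && descriptor[i] != ')') {
    char c = descriptor[i];
    if (c == '[') {
      // Array: skip all dimensions plus the element type; the reference is one slot.
      while (descriptor[i] == '[') ++i;
      if (descriptor[i] == 'L') i = descriptor.find(';', i);
      ++i;
      slots += 1;
    } else if (c == 'L') {
      // Object reference: skip to just past the terminating ';'.
      i = descriptor.find(';', i) + 1;
      slots += 1;
    } else {
      // Primitive: J and D are two slots wide, the rest are one.
      slots += (c == 'J' || c == 'D') ? 2 : 1;
      ++i;
    }
  }
  return slots;
}
```

For example, a static method with descriptor `(JD)V` has two arguments but needs four slots, which is the kind of mismatch that makes a name like "number of parameters" misleading.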
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1400433667 From rrich at openjdk.org Tue Nov 21 11:36:11 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Nov 2023 11:36:11 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 15:40:10 GMT, Martin Doerr wrote: > `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) > In addition, I've cleaned up some inaccurate comments. Looks good (apart from the preexisting misnomer `ResolvedMethodEntry::_number_of_parameters`). ------------- Marked as reviewed by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16741#pullrequestreview-1741762882 From rrich at openjdk.org Tue Nov 21 11:36:14 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 21 Nov 2023 11:36:14 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 11:27:35 GMT, Martin Doerr wrote: >> Well, the size in stack words is the same as the number. One can like it or not. > > So, yes, the name doesn't tell what it actually is. It's the number of slots and not the number of arguments. But that's out of scope for this PR. It is out of this scope. And it is a misnomer that will confuse people. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1400438061 From mdoerr at openjdk.org Tue Nov 21 11:48:15 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 21 Nov 2023 11:48:15 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 11:31:44 GMT, Richard Reingruber wrote: >> So, yes, the name doesn't tell what it actually is. It's the number of slots and not the number of arguments. But that's out of scope for this PR. > > It is out of this scope. And it is a misnomer that will confuse people. Agreed. Thanks for the review! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1400453068 From stuefe at openjdk.org Tue Nov 21 13:16:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Nov 2023 13:16:37 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 16:38:17 GMT, Thomas Stuefe wrote: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. 
> > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
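The riscv condition in the quoted description can be written as a simple bit test. This is only an illustration of the stated bit ranges; the names `bit_range`, `only_in`, and `is_good_base` are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Mask with bits [lo, hi) set; requires lo < hi <= 64 and lo < 64.
constexpr uint64_t bit_range(unsigned lo, unsigned hi) {
  return (hi == 64 ? ~uint64_t{0} : ((uint64_t{1} << hi) - 1)) &
         ~((uint64_t{1} << lo) - 1);
}

// True if 'base' is nonzero and all of its set bits lie within [lo, hi).
constexpr bool only_in(uint64_t base, unsigned lo, unsigned hi) {
  return base != 0 && (base & ~bit_range(lo, hi)) == 0;
}

// "Good" in the sense quoted above: bits set only in [12..32), only in
// [32..44), or only in [44..64).
constexpr bool is_good_base(uint64_t base) {
  return only_in(base, 12, 32) || only_in(base, 32, 44) || only_in(base, 44, 64);
}
```

A base such as `0x80000000` (bit 31) or `0x100000000` (bit 32) passes, while `0x180000000` (bits 31 and 32, spanning two ranges) does not.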
arm32 and macos intel GHA errors unrelated ------------- PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1820629519 From stuefe at openjdk.org Tue Nov 21 13:16:36 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Nov 2023 13:16:36 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation Message-ID: In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. There are common patterns: - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. But there are more differences than one would think: - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. ------------- This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. Changes per-CPU: #### aarch64: Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. 
We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` #### riscv: We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). #### s390: We attempt to allocate < 4GB unconditionally. ------------- Depends on: https://git.openjdk.org/jdk/pull/16727 Commit messages: - Update compressedKlass_aarch64.cpp - JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation Changes: https://git.openjdk.org/jdk/pull/16743/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320368 Stats: 466 lines in 14 files changed: 356 ins; 63 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From aboldtch at openjdk.org Tue Nov 21 13:59:57 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 Nov 2023 13:59:57 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v7] In-Reply-To: References: Message-ID: <5qeXZ05KFfZzyWj21UyoQFrCUC2wrxKP8T_zEhK8Lms=.c7196b2e-fa8d-4494-a24d-4a8c68dcd420@github.com> > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. 
This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/library_call.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/fdbfbf8a..560dd153 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aph at openjdk.org Tue Nov 21 14:15:09 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 21 Nov 2023 14:15:09 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v3] In-Reply-To: References: Message-ID: <9qW0rFrGiudabO2j8f7nAhF2Ka0MyMzEvdHu_QOGj3U=.d49f84f9-0b19-405a-9ea6-4a0b857926d8@github.com> On Mon, 20 Nov 2023 09:01:47 GMT, Axel Boldt-Christmas wrote: >> Implements the aarch64 port of JDK-8319796. >> >> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. 
>> >> For the shared part, if it is an unstructured exit, the monitor is inflated, or the mark word transition fails, it calls into the runtime. >> >> C2 operates under a few more assumptions: that the locking is structured and balanced. This means that some checks can be elided. >> >> First, this means that in C2 unlock, if the obj is not on the top of the lock stack, it must be inflated. Conversely, if we reach the inflated C2 unlock, the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The aarch64 C2 port tries to avoid stronger memory semantics wherever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock use a load/store exclusive register pair to transition the mark word. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 > - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 > - 8319801: Recursive lightweight locking: aarch64 implementation > - Cleanup: C2 fast_lock/fast_unlock aarch64 Editing.
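The "check the lock stack first" fast path from the quoted PR text can be modelled in a few lines. This is a single-threaded toy, not HotSpot code: `Obj`, `LockStack`, `monitor_enter`, and `monitor_exit` are hypothetical names, a plain `bool` stands in for the mark word lock bits, and a `false` return stands in for "call into the runtime". The exit rule (both top entries must be the object) follows the runtime PR description in this thread.

```cpp
#include <cstddef>
#include <vector>

struct Obj { bool locked = false; };  // stand-in for the mark word lock bits

class LockStack {
  std::vector<const Obj*> _entries;
public:
  // Recursive enter succeeds only if obj is already the top entry.
  bool try_recursive_enter(const Obj* obj) {
    if (!_entries.empty() && _entries.back() == obj) {
      _entries.push_back(obj);
      return true;
    }
    return false;
  }
  // Recursive exit succeeds only if the two top entries are both obj.
  bool try_recursive_exit(const Obj* obj) {
    std::size_t n = _entries.size();
    if (n >= 2 && _entries[n - 1] == obj && _entries[n - 2] == obj) {
      _entries.pop_back();
      return true;
    }
    return false;
  }
  bool top_is(const Obj* obj) const { return !_entries.empty() && _entries.back() == obj; }
  void push(const Obj* obj) { _entries.push_back(obj); }
  void pop() { _entries.pop_back(); }
  std::size_t size() const { return _entries.size(); }
};

// Returns true on the fast path; false means "would call into the runtime".
bool monitor_enter(LockStack& ls, Obj* obj) {
  if (ls.try_recursive_enter(obj)) return true;  // no mark word access needed
  if (!obj->locked) {                            // Unlocked => Locked
    obj->locked = true;
    ls.push(obj);
    return true;
  }
  return false;
}

bool monitor_exit(LockStack& ls, Obj* obj) {
  if (ls.try_recursive_exit(obj)) return true;   // no mark word access needed
  if (ls.top_is(obj) && obj->locked) {           // Locked => Unlocked
    obj->locked = false;
    ls.pop();
    return true;
  }
  return false;                                  // unstructured or inflated case
}
```

In the real implementation the mark word transition is an atomic operation that can fail and fall back to the runtime; the toy collapses that into a plain flag so the ordering of the checks is easy to see.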
------------- PR Comment: https://git.openjdk.org/jdk/pull/16608#issuecomment-1820998454 From ihse at openjdk.org Tue Nov 21 14:16:11 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 21 Nov 2023 14:16:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 01:47:34 GMT, Xiaohong Gong wrote: >> make/autoconf/lib-vmath.m4 line 92: >> >>> 90: [] >>> 91: ) >>> 92: AC_MSG_RESULT([${SVE_FEATURE_SUPPORT}]) >> >> What is this test even for? I can't see any usage of SVE_FEATURE_SUPPORT outside this function. > > This is just used to print the result of `AC_MSG_CHECKING([if ARM SVE feature is supported])` in configure. Ah, now I see what you are trying to do here. First of all, in the detection part, only set `SVE_FEATURE_SUPPORT`. Then you can handle the `SVE_CFLAGS` addition elsewhere/later. Secondly, you should not mix these `SVE_CFLAGS` with the sleef C flags. Keeping them separate will allow for LIBSLEEF_CFLAGS to be named just that. Thirdly, I do not like at all how you just come crashing in setting `-march` like that. The `-march` flag is handled by `FLAGS_SETUP_ABI_PROFILE`. Actually, now that I think of it, this is just completely wrong! You are checking for features on the build machine to determine what target machine code to generate, with no way to override. You need to break out the `-march` handling separately. It should be moved to FLAGS_SETUP_ABI_PROFILE. I'm guessing you will need to make something like an `aarch64-sve` profile, and possibly try to auto-select it based on the result of the sve test program above. But changing `OPENJDK_TARGET_ABI_PROFILE` can have further consequences; I do not know the full extent off the top of my head. >> make/autoconf/libraries.m4 line 129: >> >>> 127: LIB_SETUP_LIBFFI >>> 128: LIB_SETUP_MISC_LIBS >>> 129: LIB_SETUP_VMATH >> >> The function (and file) should be named after "sleef", not "vmath".
> > Yes, it seems weird. But the library we want to build is `libvmath.so` instead of `libsleef.so`. And we check not only the sleef library, but also the ARM SVE feature inside it. So using the `VMATH` suffix is more reasonable to me. WDYT? As I said above, you should not mix the two together. Keep the library handling for libsleef. Move the `-march` setting to where it belongs. And rename the files, functions and variables after this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1400663476 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1400665010 From ayang at openjdk.org Tue Nov 21 14:22:09 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 21 Nov 2023 14:22:09 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v7] In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 16:49:54 GMT, Lei Zaakjyu wrote: >> JDK-8234502 : Merge GenCollectedHeap and SerialHeap > > Lei Zaakjyu has updated the pull request incrementally with three additional commits since the last revision: > > - replace a necessary include statement > - clean up > - add line-breaks The general shape looks good. I want to take a closer look after #16492 is merged. (I believe that can reduce the diff size of this PR slightly.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16623#issuecomment-1821014195 From aboldtch at openjdk.org Tue Nov 21 14:28:01 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 Nov 2023 14:28:01 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v8] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord.
> > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Fix copy paste typo. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/560dd153..b9cc13a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aboldtch at openjdk.org Tue Nov 21 14:37:01 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 Nov 2023 14:37:01 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v9] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. 
> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Fix copy paste typo. - Update src/hotspot/share/opto/library_call.cpp Co-authored-by: Tobias Hartmann - Add retry CAS comment - Use is_neutral over is_unlocked - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Use more familiar CAS variable names and pattern - Move is_lock_owned closer to its only use - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Simplify test. - ... 
and 1 more: https://git.openjdk.org/jdk/compare/aa4e86ce...b4061417 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/b9cc13a6..b4061417 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=07-08 Stats: 6364 lines in 172 files changed: 3596 ins; 822 del; 1946 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aboldtch at openjdk.org Tue Nov 21 14:45:23 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 Nov 2023 14:45:23 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: References: Message-ID: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
> > A high level overview: > * Locking is still performed on the mark word > * Unlocked (0b01) <=> Locked (0b00) > * Monitor enter on Obj with mark word Unlocked (0b01) is the same > * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) > * Push Obj onto the lock stack > * Success > * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack > * If top entry is Obj > * Push Obj on the lock stack > * Success > * If top entry is not Obj > * Inflate and call ObjectMonitor::enter > * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack > * If just the top entry is Obj > * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) > * Pop the entry > * Success > * If both entries are Obj > * Pop the top entry > * Success > * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit > * If the monitor has been inflated for object Obj which is owned by the current thread > * All corresponding entries for Obj is removed from the lock stack > * The monitor recursions is set to the number of removed entries - 1 > * The owner is changed from anonymous to the thread > * The regular ObjectMonitor::action is called. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
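As a rough illustration of the enter/exit rules in the overview above, the following single-threaded C++ sketch models the lock stack. It is not HotSpot code: `Obj`, `Mark`, `LockStack`, `Result`, `monitor_enter` and `monitor_exit` are hypothetical names, the mark word is reduced to its two lock bits, and the CAS and the inflation path are replaced by a plain store and a `NeedsInflation` return value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the two lock bits of the mark word.
enum class Mark : std::uint8_t { Locked = 0b00, Unlocked = 0b01, Monitor = 0b10 };

struct Obj {
  Mark mark = Mark::Unlocked;
};

enum class Result { Success, NeedsInflation };

// Hypothetical per-thread lock stack (single-threaded model, no atomics).
struct LockStack {
  std::vector<Obj*> entries;

  Result monitor_enter(Obj* o) {
    if (o->mark == Mark::Unlocked) {
      o->mark = Mark::Locked;   // assumption: the real code uses a CAS here
      entries.push_back(o);
      return Result::Success;
    }
    if (o->mark == Mark::Locked && !entries.empty() && entries.back() == o) {
      entries.push_back(o);     // consecutive recursive enter: just push again
      return Result::Success;
    }
    return Result::NeedsInflation;  // stands in for inflate + ObjectMonitor::enter
  }

  Result monitor_exit(Obj* o) {
    const std::size_t n = entries.size();
    if (o->mark != Mark::Locked || n == 0 || entries[n - 1] != o) {
      return Result::NeedsInflation;  // unstructured locking or inflated monitor
    }
    if (n >= 2 && entries[n - 2] == o) {
      entries.pop_back();       // both top entries are Obj: recursive exit
      return Result::Success;
    }
    o->mark = Mark::Unlocked;   // only the top entry is Obj: release the lock
    entries.pop_back();
    return Result::Success;
  }
};
```

Note that a non-consecutive re-enter (lock A, lock B, lock A again) is reported as `NeedsInflation` in this model, which matches the restriction to consecutive recursive enters described above.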
The pull request now contains six commits: - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 - Fix nit - Fix comment typos - 8319797: Recursive lightweight locking: Runtime implementation ------------- Changes: https://git.openjdk.org/jdk/pull/16606/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=05 Stats: 665 lines in 10 files changed: 633 ins; 10 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Tue Nov 21 15:02:28 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 Nov 2023 15:02:28 GMT Subject: RFR: 8319799: Recursive lightweight locking: x86 implementation [v6] In-Reply-To: References: Message-ID: <6Us8sCkGXp66V3ymolw0yK7_Ocri3EaQNkKwOQmWkMY=.a0b0703b-de1c-4de1-bf6c-54256747a9cf@github.com> > Implements the x86 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. > > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. 
This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The x86 C2 port also has some extra oddities. > > The mark word read is done early as it showed better scaling in hyper-threaded scenarios on certain intel hardware, and no noticeable downside on other tested x86 hardware. > > The fast path is written to avoid going through conditional branches. This in combination with keeping the ZF output correct, the code does some actions eagerly, decrementing the held monitor count, popping from the lock stack. And jumps to a code stub if a slow path is required which restores the thread local state to a correct state before jumping to the runtime. > > The contended unlock was also moved to the code stub. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains ten additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319799 - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319799 - top load adjustments - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319799 - Fix type - Move inflated check in fast_locked - Move top load - 8319799: Recursive lightweight locking: x86 implementation - Cleanup: C2 fast_lock/fast_unlock x86 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16607/files - new: https://git.openjdk.org/jdk/pull/16607/files/44211e7b..40d30882 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16607&range=04-05 Stats: 6367 lines in 174 files changed: 3596 ins; 822 del; 1949 mod Patch: https://git.openjdk.org/jdk/pull/16607.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16607/head:pull/16607 PR: https://git.openjdk.org/jdk/pull/16607 From aboldtch at openjdk.org Tue Nov 21 15:25:44 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 21 Nov 2023 15:25:44 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v4] In-Reply-To: References: Message-ID: > Implements the aarch64 port of JDK-8319796. > > There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. > > The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. > > Only if the recursive lightweight [un]lock fails does it look at the mark word. > > For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. > > The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. 
> > First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. > > The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 - 8319801: Recursive lightweight locking: aarch64 implementation - Cleanup: C2 fast_lock/fast_unlock aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16608/files - new: https://git.openjdk.org/jdk/pull/16608/files/5bc0d0ad..263b3061 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16608&range=02-03 Stats: 6367 lines in 174 files changed: 3596 ins; 822 del; 1949 mod Patch: https://git.openjdk.org/jdk/pull/16608.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16608/head:pull/16608 PR: https://git.openjdk.org/jdk/pull/16608 From rriggs at openjdk.org Tue Nov 21 15:26:51 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Tue, 21 Nov 2023 15:26:51 GMT Subject: RFR: 8311906: Improve 
robustness of String constructors with mutable array inputs [v11] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. 
> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #4 from cl4es/8311906_x64_intr_opt Simplified and slightly optimized x86 char_array_compress - Simplified and slightly optimized x86 char_array_compress ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/04d58779..0256b9e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=09-10 Stats: 61 lines in 1 file changed: 7 ins; 20 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From tschatzl at openjdk.org Tue Nov 21 16:09:29 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 21 Nov 2023 16:09:29 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading Message-ID: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). 
The typical steps are to register a CLD/nmethod that is to be unlinked, and then to purge its memory later. STW collectors perform this work in one big chunk, taking the CodeCache_lock for the entire duration, while concurrent collectors lock/unlock for every insertion to allow concurrent users of the lock to make progress. Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however, the existing CLD handling API in particular (still) mixes unlinking and purging in its CLD::unload() call. This is left as is to simplify this change, which is mostly geared towards separating nmethod unlinking from purging in order to make code blob freeing O(n) instead of O(n^2). Upcoming changes will
* separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors.
* better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`)
* untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism
* G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging)
* Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging.
Please also first look into the (small) PR this depends on. The crash on linux-x86 is fixed by PR#16766 which I split out for quicker reviews.
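The O(n) versus O(n^2) point made at the start of this message can be illustrated with a small sketch. It assumes an address-ordered, singly linked free list that remembers its last insertion point (the "finger"); `FreeBlock`, `FreeList`, `finger` and the `steps` counter are hypothetical names for illustration only, and this is not the CodeHeap implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical free block, ordered by address in the list.
struct FreeBlock {
  std::uintptr_t addr;
  FreeBlock* next;
  explicit FreeBlock(std::uintptr_t a) : addr(a), next(nullptr) {}
};

// Hypothetical address-ordered free list with a "finger" that remembers
// the most recently inserted block.
struct FreeList {
  FreeBlock* head = nullptr;
  FreeBlock* finger = nullptr;
  std::size_t steps = 0;  // number of list nodes walked, for illustration

  void insert(FreeBlock* b) {
    FreeBlock* prev = nullptr;
    FreeBlock* cur = head;
    // If the new block lies at or after the finger, resume the search there
    // instead of restarting from the head.
    if (finger != nullptr && finger->addr <= b->addr) {
      prev = finger;
      cur = finger->next;
    }
    while (cur != nullptr && cur->addr < b->addr) {
      prev = cur;
      cur = cur->next;
      ++steps;
    }
    b->next = cur;
    if (prev == nullptr) {
      head = b;
    } else {
      prev->next = b;
    }
    finger = b;
  }
};
```

When blocks arrive in ascending address order, every insertion starts at the finger and walks no further, so n insertions cost O(n) in total. An insertion below the finger falls back to a walk from the head, which is what makes unsorted input quadratic in the worst case.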
Testing: tier1-7 Thanks, Thomas ------------- Depends on: https://git.openjdk.org/jdk/pull/16733 Commit messages: - 8317809 Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) Changes: https://git.openjdk.org/jdk/pull/16759/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16759&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317809 Stats: 495 lines in 28 files changed: 368 ins; 83 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/16759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16759/head:pull/16759 PR: https://git.openjdk.org/jdk/pull/16759 From matsaave at openjdk.org Tue Nov 21 16:18:25 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Nov 2023 16:18:25 GMT Subject: RFR: 8320278: ARM32 build is broken after JDK-8301997 [v2] In-Reply-To: References: Message-ID: > JDK-8301997 changed the handling of constant pool cache entries for methods and fully removed the ConstantPoolCacheEntry class. This commit included changes to the interpreters for all supported platforms except ARM32, and its omission resulted in a GHA failure. This patch intends to introduce an ARM32 port that reflects the code changes to the included platforms so the ARM32 code can build and thus pass testing. Verified with tier 1-5 tests on Oracle platforms but not ARM32. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' into method_entries_arm32 - Added breaks - Merge branch 'master' into method_entries_arm32 - Fixed copyright header - 8320278: ARM32 build is broken after JDK-8301997 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16749/files - new: https://git.openjdk.org/jdk/pull/16749/files/d9db3337..5b06dca2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16749&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16749&range=00-01 Stats: 9511 lines in 255 files changed: 4953 ins; 1698 del; 2860 mod Patch: https://git.openjdk.org/jdk/pull/16749.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16749/head:pull/16749 PR: https://git.openjdk.org/jdk/pull/16749 From stuefe at openjdk.org Tue Nov 21 16:40:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 21 Nov 2023 16:40:33 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v2] In-Reply-To: References: Message-ID: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. 
> > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
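The riscv rule above can be read as a simple bit-mask predicate. The sketch below is one interpretation of "only bits set either in [12..32), [32, 44) or in [44, 64)", not code from the patch; `bit_range_mask`, `only_bits_in` and `is_good_riscv_base` are hypothetical helper names, and a zero base trivially passes under this reading.

```cpp
#include <cstdint>

// Mask with bits [lo, hi) set; assumes lo < 64 and lo <= hi <= 64.
inline std::uint64_t bit_range_mask(unsigned lo, unsigned hi) {
  const std::uint64_t below_hi =
      (hi >= 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << hi) - 1);
  const std::uint64_t below_lo = (std::uint64_t{1} << lo) - 1;
  return below_hi & ~below_lo;
}

// True if every set bit of base lies inside [lo, hi).
inline bool only_bits_in(std::uint64_t base, unsigned lo, unsigned hi) {
  return (base & ~bit_range_mask(lo, hi)) == 0;
}

// Hypothetical check for a "good" riscv encoding base as described above:
// all set bits fall into exactly one of the three ranges.
inline bool is_good_riscv_base(std::uint64_t base) {
  return only_bits_in(base, 12, 32) ||
         only_bits_in(base, 32, 44) ||
         only_bits_in(base, 44, 64);
}
```

For example, a base of 2 GB (only bit 31 set) or 1 TB (only bit 40 set) passes, while a base with bits in two different ranges, such as 1 TB + 1 MB, does not.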
Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation - Regression Test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/03a1b149..ad702f95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=00-01 Stats: 124 lines in 3 files changed: 123 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From shade at openjdk.org Tue Nov 21 16:42:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 16:42:12 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Wed, 15 Nov 2023 22:54:14 GMT, Patricio Chilano Mateo wrote: >> @pchilano can you have look ? > >> @pchilano can you have look ? >> > I will. I might not finish the review until next week though. @pchilano @dholmes-ora @cl4es, may I ask you to check if anyone at Oracle did the testing runs for this PR, and schedule a run if not? I am sure we are good with current testing, but additional safety would be nice to avoid surprises this close to JDK 22 RDP1. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1821276284 From kvn at openjdk.org Tue Nov 21 16:51:18 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 Nov 2023 16:51:18 GMT Subject: Integrated: 8320272: Make method_entry_barrier address shared In-Reply-To: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> References: <32KZcukg8d6Fjn0bnPQIa-X4q_o8DU0XtcHYA5YT468=.c9f33a2d-215d-404c-822b-448c8fd443d4@github.com> Message-ID: <7nP-iFYNkh_-yiUqE69FmAdNC5mKOhk0XCl7X_4ZCHI=.4ea0ea93-153c-4783-a733-9b0e2aa411ce@github.com> On Fri, 17 Nov 2023 16:10:25 GMT, Vladimir Kozlov wrote: > Currently all platforms have declared their own address variable for method_entry_barrier stub. Some have even slightly different name: nmethod_entry_barrier. For Leyden project one address is preferable. > In aarch64 code changed `movptr` to `lea` instruction to get relocation info as on x86. > > Tested x86 and aarch64, tier1-4, xcomp, stress. I need help to test on other platforms. Thanks! This pull request has now been integrated. Changeset: c4aba875 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/c4aba875708f1701e8f1b6fa9676f42e235ec461 Stats: 77 lines in 32 files changed: 15 ins; 41 del; 21 mod 8320272: Make method_entry_barrier address shared Reviewed-by: dlong ------------- PR: https://git.openjdk.org/jdk/pull/16708 From shade at openjdk.org Tue Nov 21 17:01:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 21 Nov 2023 17:01:12 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v4] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 06:03:38 GMT, Erik Österlund wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary includes of vm_version.hpp. >> Fix copyright years. > > This looks great! > Thanks for the review @fisk !
I have to wait for a few Zero related PRs to get integrated then re-merge, before I can integrate. Zero patches were pushed, please re-merge. I checked current mainline works well with at least linux-arm-zero-fastdebug, and I would like to re-test it with this patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1821309132 From pchilanomate at openjdk.org Tue Nov 21 17:08:08 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 21 Nov 2023 17:08:08 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Wed, 15 Nov 2023 22:54:14 GMT, Patricio Chilano Mateo wrote: >> @pchilano can you have look ? > >> @pchilano can you have look ? >> > I will. I might not finish the review until next week though. > @pchilano @dholmes-ora @cl4es, may I ask you to check if anyone at Oracle did the testing runs for this PR, and schedule a run if not? I am sure we are good with current testing, but additional safety would be nice to avoid surprises this close to JDK 22 RDP1. > I'll schedule a round of testing for Tiers1-7 with the latest version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1821320886 From evergizova at openjdk.org Tue Nov 21 17:41:28 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Tue, 21 Nov 2023 17:41:28 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v3] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 01:40:26 GMT, Dean Long wrote: >> src/hotspot/share/compiler/compilerDefinitions.cpp line 503: >> >>> 501: jio_fprintf(defaultStream::error_stream(), >>> 502: "Invalid InlineCacheBufferSize=" SIZE_FORMAT "K. Must be less than NonNMethodCodeHeapSize=" SIZE_FORMAT "K.\n", >>> 503: InlineCacheBufferSize/K, NonNMethodCodeHeapSize/K); >> >> You need to check for alignment of the value. 
In [StubQueue()](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/stubs.cpp#L70) it is aligned up by `2*BytesPerWord` so the final value could be > `InlineCacheBufferSize`. > > I think the align up to 2*BytesPerWord is not really needed, because BufferBlob::create already does its own alignment which will make the final value > InlineCacheBufferSize. BufferBlob::create uses the size as a minimum, not a maximum. > I don't think the above check should need to know the details of BufferBlob::create and StubQueue() alignment adjustments. Having InlineCacheBufferSize near to NonNMethodCodeHeapSize is going to make the JVM fail in startup for other reasons, isn't it? Maybe the max for InlineCacheBufferSize should be NonNMethodCodeHeapSize/2? Thanks @dean-long and @vnkozlov, I updated InlineCacheBufferSize limit to NonNMethodCodeHeapSize/2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15271#discussion_r1400945799 From evergizova at openjdk.org Tue Nov 21 17:41:26 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Tue, 21 Nov 2023 17:41:26 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v4] In-Reply-To: References: Message-ID: <0WB7f9VNThtrvvvopegzy51wIssRwuXpuiwdq5_2r8w=.82a3753c-7ac1-4d46-a6bc-6817cdedfd53@github.com> > InlineCacheBuffer size is currently hardcoded to 10K. > This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. > > Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency.
Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: Changed InlineCacheBufferSize limit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15271/files - new: https://git.openjdk.org/jdk/pull/15271/files/52b0260a..3ee22afb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15271&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15271&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15271.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15271/head:pull/15271 PR: https://git.openjdk.org/jdk/pull/15271 From rkennke at openjdk.org Tue Nov 21 17:51:08 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 21 Nov 2023 17:51:08 GMT Subject: RFR: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 07:47:16 GMT, Thomas Stuefe wrote: > `CompressedKlassPointers::is_valid_base(addr)` abstracts away platform-specific requirements that may limit the use of an address as narrow Klass encoding base. It only ever mattered on aarch64, where we cannot use any arbitrary address as 64-bit immediate for the base. > > Experience shows that this is a case where the abstraction does not help much. Hiding a very CPU-specific limitation under a generic function made arguing about it difficult. We therefore decided to scrap that function. > > It is only used for two things: > - asserts at runtime; those are unnecessary since we have an assert in macroAssembler_aarch64.cpp that will fire if the base is not correct > - the one legitimate use case is checking the user input for -XX:SharedBaseAddress at dump time. We can just express the aarch64 requirement directly, which is clearer to understand. > > Note that the function has also been incorrect, since it ignored aarch64 EOR mode, and required 32GB alignment for addresses beyond 32GB. 
However, we can make any 4GB aligned address to work with movk, so the requirement can be simplified to "is 4GB-aligned". > > (this is a preparatory patch for [JDK-8320368](https://bugs.openjdk.org/browse/JDK-8320368)) I think it's ok. I think there is an advantage in checking the base early instead of somewhere deep in MA, at which point I'd probably wonder where the base came from. Maybe there is a way to achieve the simplification that you had in mind while also retaining those checks? ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16727#pullrequestreview-1742703747 From cslucas at openjdk.org Tue Nov 21 17:58:35 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 21 Nov 2023 17:58:35 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v2] In-Reply-To: References: Message-ID: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. > > Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. > > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. > > # Testing status > > ## tier1 > > | | Win | Mac | Linux | > |----------|---------|---------|---------| > | ARM64 | | | | > | ARM32 | | | | > | x86 | | | | > | x64 | | | | > | PPC64 | | | | > | S390x | | | | > | RiscV | | | | Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Catch up with changes on master - Reuse same C2_MacroAssembler object to emit instructions. 
------------- Changes: https://git.openjdk.org/jdk/pull/16484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16484&range=01 Stats: 3356 lines in 60 files changed: 1039 ins; 426 del; 1891 mod Patch: https://git.openjdk.org/jdk/pull/16484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16484/head:pull/16484 PR: https://git.openjdk.org/jdk/pull/16484 From psandoz at openjdk.org Tue Nov 21 18:17:11 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 21 Nov 2023 18:17:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Mon, 23 Oct 2023 09:02:35 GMT, Xiaohong Gong wrote: > This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec. Same here . Sinh/cosh/tanh/expm1 are specified to be within 2.5 ulps of the exact result, but i believe sleef does not offer that option, it's either within 1 or 3.5, so we have to pick the former. AFAICT sleef does not say anything about monotonicity, but we relax semi-monotonicity for all but sqrt (we defer to IEEE 754). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821422985 From kvn at openjdk.org Tue Nov 21 18:21:09 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 21 Nov 2023 18:21:09 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v4] In-Reply-To: <0WB7f9VNThtrvvvopegzy51wIssRwuXpuiwdq5_2r8w=.82a3753c-7ac1-4d46-a6bc-6817cdedfd53@github.com> References: <0WB7f9VNThtrvvvopegzy51wIssRwuXpuiwdq5_2r8w=.82a3753c-7ac1-4d46-a6bc-6817cdedfd53@github.com> Message-ID: On Tue, 21 Nov 2023 17:41:26 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. 
>> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed InlineCacheBufferSize limit Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15271#pullrequestreview-1742782106 From jbhateja at openjdk.org Tue Nov 21 18:36:18 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Nov 2023 18:36:18 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 00:37:23 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ? 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ? 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ? 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ? 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ? 0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ? 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ? 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ? 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ? 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ? 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ? 0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ? 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ? 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ? 
0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ? 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ? 0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ? 0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ? 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ? 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ? 0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ? 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ? 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ? 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ? 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ? 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ? 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ? 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ? 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge remote-tracking branch 'jdk/master' into vp-ecore2 > - review comments > - emulate vblend on ecores src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1112: > 1110: void (MacroAssembler::*vblend)(XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister); > 1111: void (MacroAssembler::*vmaxmin)(XMMRegister, XMMRegister, XMMRegister, int); > 1112: void (MacroAssembler::*vcmp)(XMMRegister, XMMRegister, XMMRegister, int, int); We do support C++11 dialect, you can use following declarations. 
using vblend = void (*) (XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister); src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3577: > 3575: if (EnableX86ECoreOpts && scratch_available && dst_available) { > 3576: XMMRegister full_mask = mask; > 3577: if (!fully_masked) { name change suggestion for better understanding. fully_masked -> compute_mask src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3601: > 3599: if (EnableX86ECoreOpts && scratch_available && dst_available) { > 3600: XMMRegister full_mask = mask; > 3601: if (!fully_masked) { Same a above fully_masked -> compute_mask, remove full_mask. src/hotspot/cpu/x86/x86.ad line 7840: > 7838: match(Set dst (VectorBlend (Binary src1 src2) mask)); > 7839: format %{ "vector_blend $dst,$src1,$src2,$mask\t! using $vtmp as TEMP" %} > 7840: effect(TEMP vtmp, TEMP dst); TEMP dst can be removed. src/hotspot/cpu/x86/x86_64.ad line 4519: > 4517: __ vcmpps($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); > 4518: __ vblendvps($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); > 4519: } Please move into a new macro assembly routine. src/hotspot/cpu/x86/x86_64.ad line 4568: > 4566: __ vcmppd($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); > 4567: __ vblendvpd($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); > 4568: } Please move to a new macro assembly routine. src/hotspot/cpu/x86/x86_64.ad line 4616: > 4614: __ vcmpps($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); > 4615: __ vblendvps($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); > 4616: } Please move to a new macro assembly routine. 
src/hotspot/cpu/x86/x86_64.ad line 4645: > 4643: "vcmppd.unordered $btmp,$atmp,$atmp \n\t" > 4644: "vblendvpd $dst,$tmp,$atmp,$btmp \n\t" > 4645: %} Format block may not be valid for e-cores, you can replace it with following to be consistent on both the cores. ` minD $dst, $a, $b \t! using %tmp, %atmp and %btmp as TEMP ` src/hotspot/cpu/x86/x86_64.ad line 4665: > 4663: __ vcmppd($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); > 4664: __ vblendvpd($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); > 4665: } Please move this logic into a new macro assembly routine. test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java line 112: > 110: if (fout[i] != 1.0) throw new RuntimeException("Expected positive numbers in second half of array: " + java.util.Arrays.toString(fout)); > 111: } > 112: } Its ok to add correctness check here, but test only intend to perform check IR validations, there are detailed function tests in following files test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java test/hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java line 119: > 117: } > 118: } > 119: Test performs IR validation, you can also update existing functional test with more test values. 
test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400923128 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400953279 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400987897 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400983194 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400985305 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400985798 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400986113 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400969890 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400976470 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1401000788 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1401006644 From jbhateja at openjdk.org Tue Nov 21 18:36:20 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Nov 2023 18:36:20 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 17:44:11 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'jdk/master' into vp-ecore2 >> - review comments >> - emulate vblend on ecores > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3577: > >> 3575: if (EnableX86ECoreOpts && scratch_available && dst_available) { >> 3576: XMMRegister full_mask = mask; >> 3577: if (!fully_masked) { > > name change suggestion for better understanding. 
fully_masked -> compute_mask We can also remove full_mask register and directly update mask if compute mask is true. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1400957739 From sviswanathan at openjdk.org Tue Nov 21 19:04:11 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 21 Nov 2023 19:04:11 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 18:10:30 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'jdk/master' into vp-ecore2 >> - review comments >> - emulate vblend on ecores > > src/hotspot/cpu/x86/x86.ad line 7840: > >> 7838: match(Set dst (VectorBlend (Binary src1 src2) mask)); >> 7839: format %{ "vector_blend $dst,$src1,$src2,$mask\t! using $vtmp as TEMP" %} >> 7840: effect(TEMP vtmp, TEMP dst); > > TEMP dst can be removed. TEMP dst is needed because otherwise register allocator can allocate dst and vtmp to be same. The vpand will then overwrite vtmp. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1401041831 From matsaave at openjdk.org Tue Nov 21 19:06:13 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Nov 2023 19:06:13 GMT Subject: RFR: 8320278: ARM32 build is broken after JDK-8301997 [v2] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 22:18:07 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into method_entries_arm32 >> - Added breaks >> - Merge branch 'master' into method_entries_arm32 >> - Fixed copyright header >> - 8320278: ARM32 build is broken after JDK-8301997 > > This looks good. This should help the arm32 porters get started and fix the compilation errors. I think in the description, you meant to say you ran tier1-4 on Oracle platforms not arm32. Thanks for the reviews @coleenp and @tstuefe! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16749#issuecomment-1821495679 From matsaave at openjdk.org Tue Nov 21 19:06:14 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 21 Nov 2023 19:06:14 GMT Subject: Integrated: 8320278: ARM32 build is broken after JDK-8301997 In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 20:37:23 GMT, Matias Saavedra Silva wrote: > JDK-8301997 changed the handling of constant pool cache entries for methods and fully removed the ConstantPoolCacheEntry class. This commit included changes to the interpreters for all supported platforms except ARM32, and its omission resulted in a GHA failure. This patch intends to introduce an ARM32 port that reflects the code changes to the included platforms so the ARM32 code can build and thus pass testing. Verified with tier 1-5 tests on Oracle platforms but not ARM32. This pull request has now been integrated. 
Changeset: 6d824364 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/6d824364c2fefa3185a8a15bdd41537fad31427c Stats: 294 lines in 5 files changed: 105 ins; 128 del; 61 mod 8320278: ARM32 build is broken after JDK-8301997 Reviewed-by: coleenp, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/16749 From jbhateja at openjdk.org Tue Nov 21 19:07:06 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Nov 2023 19:07:06 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: <4s9WrUMXWP-tkU4dGV8iwhWJUJDOlZoORvw4PiO5UuY=.a8ebb4bf-2325-473a-94bf-9d38b62dc80b@github.com> References: <4s9WrUMXWP-tkU4dGV8iwhWJUJDOlZoORvw4PiO5UuY=.a8ebb4bf-2325-473a-94bf-9d38b62dc80b@github.com> Message-ID: <3h1ByALFuwhkDABJAnNtBq6bZziHzSE4ZDx4knZMJDw=.fb43ee82-8f81-4a93-a3df-d726ebd824ce@github.com> On Mon, 20 Nov 2023 21:32:54 GMT, Volodymyr Paprotski wrote: > > Hi @vpaprotsk , please add checks to skip special emulation for 128 bit vectors at applicable places, as per section "4.1.8.4 256-bit Variable Blend Instructions" of x86 optimization manual variable blends are micro-coded only for 256 bit vectors. > > I went and remeasured performance of 128-bit vectors with `-XX:MaxVectorSize=16`... > > ``` > =============== BEFORE =============== > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.232 ? 0.034 us/op > MaxMinOptimizeTest.dMax avgt 3 149.242 ? 2.373 us/op > MaxMinOptimizeTest.dMin avgt 3 150.000 ? 1.763 us/op > MaxMinOptimizeTest.dMul avgt 3 77.237 ? 0.020 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.156 ? 0.012 us/op > MaxMinOptimizeTest.fMax avgt 3 110.729 ? 0.743 us/op > MaxMinOptimizeTest.fMin avgt 3 110.716 ? 0.157 us/op > MaxMinOptimizeTest.fMul avgt 3 77.157 ? 0.017 us/op > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 134.137 ? 4.586 ns/op > VectorSignum.floatSignum 512 avgt 3 258.117 ? 0.518 ns/op > VectorSignum.floatSignum 1024 avgt 3 512.706 ? 
5.924 ns/op > VectorSignum.floatSignum 2048 avgt 3 979.276 ? 46.734 ns/op > VectorSignum.doubleSignum 256 avgt 3 233.108 ? 5.314 ns/op > VectorSignum.doubleSignum 512 avgt 3 457.757 ? 3.537 ns/op > VectorSignum.doubleSignum 1024 avgt 3 907.037 ? 2.768 ns/op > VectorSignum.doubleSignum 2048 avgt 3 1816.200 ? 15.869 ns/op > > =============== AFTER =============== > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.238 ? 0.092 us/op > MaxMinOptimizeTest.dMax avgt 3 106.636 ? 0.072 us/op > MaxMinOptimizeTest.dMin avgt 3 103.060 ? 0.129 us/op > MaxMinOptimizeTest.dMul avgt 3 77.233 ? 0.044 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.158 ? 0.021 us/op > MaxMinOptimizeTest.fMax avgt 3 105.256 ? 1.682 us/op > MaxMinOptimizeTest.fMin avgt 3 103.126 ? 0.049 us/op > MaxMinOptimizeTest.fMul avgt 3 77.155 ? 0.019 us/op > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 60.523 ? 0.026 ns/op > VectorSignum.floatSignum 512 avgt 3 118.415 ? 0.076 ns/op > VectorSignum.floatSignum 1024 avgt 3 235.203 ? 0.323 ns/op > VectorSignum.floatSignum 2048 avgt 3 467.230 ? 0.144 ns/op > VectorSignum.doubleSignum 256 avgt 3 120.955 ? 0.217 ns/op > VectorSignum.doubleSignum 512 avgt 3 241.753 ? 0.371 ns/op > VectorSignum.doubleSignum 1024 avgt 3 498.055 ? 0.410 ns/op > VectorSignum.doubleSignum 2048 avgt 3 974.891 ? 1.472 ns/op > ``` > > For Max/Min, keeping this patch gets us up to 40%, and `VectorSignum.*Signum`, the fix is actually >2x. 
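For readers following the benchmark numbers: the patch being measured replaces the variable blend with boolean operations. Below is a scalar, single-lane sketch of that idea. The function names and the 32-bit lane model are assumptions for illustration only, and the actual HotSpot instruction sequence is not reproduced here. The `compute_mask` flag mirrors the parameter name suggested in review: when it is set, the sign bit of the mask lane is first widened to an all-ones or all-zeros mask.

```cpp
#include <cstdint>

// Hypothetical helper: widen the sign bit of a 32-bit lane into a
// full-lane mask (0xFFFFFFFF if the sign bit is set, otherwise 0).
// Unsigned arithmetic keeps this well defined.
inline uint32_t full_mask_from_sign(uint32_t mask) {
  return 0u - (mask >> 31);
}

// Hypothetical single-lane blend built from AND / AND-NOT / OR:
// returns b where the (full) mask bits are set, a elsewhere.
inline uint32_t blend_lane(uint32_t a, uint32_t b, uint32_t mask,
                           bool compute_mask) {
  uint32_t full = compute_mask ? full_mask_from_sign(mask) : mask;
  return (a & ~full) | (b & full);
}
```

If the mask is already all-ones or all-zeros per lane (for example, the result of a vector compare), the widening step can be skipped, which is presumably why the flag exists.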
I see following results on cascade lake -XX:+UnlockDiagnosticVMOptions -XX:-EnableX86ECoreOpts -XX:MaxVectorSize=16 Benchmark Mode Cnt Score Error Units MaxMinOptimizeTest.dMax avgt 2 119.131 us/op MaxMinOptimizeTest.dMax:asm avgt NaN --- MaxMinOptimizeTest.dMin avgt 2 117.812 us/op MaxMinOptimizeTest.dMin:asm avgt NaN --- -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:MaxVectorSize=16 Benchmark Mode Cnt Score Error Units MaxMinOptimizeTest.dMax avgt 2 128.076 us/op MaxMinOptimizeTest.dMax:asm avgt NaN --- MaxMinOptimizeTest.dMin avgt 2 126.978 us/op MaxMinOptimizeTest.dMin:asm avgt NaN --- ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1821505204 From jbhateja at openjdk.org Tue Nov 21 19:27:10 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 21 Nov 2023 19:27:10 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: <4s9WrUMXWP-tkU4dGV8iwhWJUJDOlZoORvw4PiO5UuY=.a8ebb4bf-2325-473a-94bf-9d38b62dc80b@github.com> References: <4s9WrUMXWP-tkU4dGV8iwhWJUJDOlZoORvw4PiO5UuY=.a8ebb4bf-2325-473a-94bf-9d38b62dc80b@github.com> Message-ID: On Mon, 20 Nov 2023 21:32:54 GMT, Volodymyr Paprotski wrote: > > Hi @vpaprotsk , please add checks to skip special emulation for 128 bit vectors at applicable places, as per section "4.1.8.4 256-bit Variable Blend Instructions" of x86 optimization manual variable blends are micro-coded only for 256 bit vectors. > > I went and remeasured performance of 128-bit vectors with `-XX:MaxVectorSize=16`... > > ``` > =============== BEFORE =============== > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.232 ? 0.034 us/op > MaxMinOptimizeTest.dMax avgt 3 149.242 ? 2.373 us/op > MaxMinOptimizeTest.dMin avgt 3 150.000 ? 1.763 us/op > MaxMinOptimizeTest.dMul avgt 3 77.237 ? 0.020 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.156 ? 0.012 us/op > MaxMinOptimizeTest.fMax avgt 3 110.729 ? 0.743 us/op > MaxMinOptimizeTest.fMin avgt 3 110.716 ? 
0.157 us/op > MaxMinOptimizeTest.fMul avgt 3 77.157 ? 0.017 us/op > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 134.137 ? 4.586 ns/op > VectorSignum.floatSignum 512 avgt 3 258.117 ? 0.518 ns/op > VectorSignum.floatSignum 1024 avgt 3 512.706 ? 5.924 ns/op > VectorSignum.floatSignum 2048 avgt 3 979.276 ? 46.734 ns/op > VectorSignum.doubleSignum 256 avgt 3 233.108 ? 5.314 ns/op > VectorSignum.doubleSignum 512 avgt 3 457.757 ? 3.537 ns/op > VectorSignum.doubleSignum 1024 avgt 3 907.037 ? 2.768 ns/op > VectorSignum.doubleSignum 2048 avgt 3 1816.200 ? 15.869 ns/op > > =============== AFTER =============== > Benchmark Mode Cnt Score Error Units > MaxMinOptimizeTest.dAdd avgt 3 77.238 ? 0.092 us/op > MaxMinOptimizeTest.dMax avgt 3 106.636 ? 0.072 us/op > MaxMinOptimizeTest.dMin avgt 3 103.060 ? 0.129 us/op > MaxMinOptimizeTest.dMul avgt 3 77.233 ? 0.044 us/op > MaxMinOptimizeTest.fAdd avgt 3 77.158 ? 0.021 us/op > MaxMinOptimizeTest.fMax avgt 3 105.256 ? 1.682 us/op > MaxMinOptimizeTest.fMin avgt 3 103.126 ? 0.049 us/op > MaxMinOptimizeTest.fMul avgt 3 77.155 ? 0.019 us/op > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 60.523 ? 0.026 ns/op > VectorSignum.floatSignum 512 avgt 3 118.415 ? 0.076 ns/op > VectorSignum.floatSignum 1024 avgt 3 235.203 ? 0.323 ns/op > VectorSignum.floatSignum 2048 avgt 3 467.230 ? 0.144 ns/op > VectorSignum.doubleSignum 256 avgt 3 120.955 ? 0.217 ns/op > VectorSignum.doubleSignum 512 avgt 3 241.753 ? 0.371 ns/op > VectorSignum.doubleSignum 1024 avgt 3 498.055 ? 0.410 ns/op > VectorSignum.doubleSignum 2048 avgt 3 974.891 ? 1.472 ns/op > ``` > > For Max/Min, keeping this patch gets us up to 40%, and `VectorSignum.*Signum`, the fix is actually >2x. Thanks for clarification, I check latency for variable blend is 5 cycles on E-cores and that explains the perf improvements. 
https://uops.info/html-lat/ADL-E/VBLENDVPS_XMM_XMM_XMM_XMM-Measurements.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1821547115 From duke at openjdk.org Tue Nov 21 19:35:09 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 21 Nov 2023 19:35:09 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 17:17:10 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'jdk/master' into vp-ecore2 >> - review comments >> - emulate vblend on ecores > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1112: > >> 1110: void (MacroAssembler::*vblend)(XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister); >> 1111: void (MacroAssembler::*vmaxmin)(XMMRegister, XMMRegister, XMMRegister, int); >> 1112: void (MacroAssembler::*vcmp)(XMMRegister, XMMRegister, XMMRegister, int, int); > > We do support C++11 dialect, you can use following declarations. > using vblend = void (*) (XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister); Its a member-function pointer, not a regular function pointer.. but `using` function pointer declaration is definitely an improvement I wasn't aware of. Trying to find chapter-and-verse in C++ spec on declaration of member-function pointers and their use with `using`. Will fix. 
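A minimal sketch of an alias-declaration for a pointer-to-member-function type, using a hypothetical `Asm` struct in place of `MacroAssembler`. The names `Asm`, `blend_ps`, `blend_pd`, `blend_fn` and `emit` are assumptions for illustration, not HotSpot code. Note that the alias introduces a type name, so a variable of that type still has to be declared separately before it can be assigned or called through.

```cpp
// Hypothetical stand-in for MacroAssembler; only the shape of the
// member functions matters for this sketch.
struct Asm {
  int blend_ps(int a, int b) { return a + b; }
  int blend_pd(int a, int b) { return a * b; }

  int emit(bool is_double, int a, int b);
};

// Alias-declaration naming a pointer-to-member-function *type*.
using blend_fn = int (Asm::*)(int, int);

int Asm::emit(bool is_double, int a, int b) {
  // blend_fn is a type, so a variable of that type is declared here.
  // An explicit & and a qualified name are needed to form the pointer.
  blend_fn blend = is_double ? &Asm::blend_pd : &Asm::blend_ps;
  // Calling through a pointer to member needs an object and ->* (or .*).
  return (this->*blend)(a, b);
}
```

With this shape, writing `blend_fn = &Asm::blend_pd;` would try to assign to a type name rather than a variable, which a compiler rejects.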
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1401074293 From szaldana at openjdk.org Tue Nov 21 20:29:12 2023 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 21 Nov 2023 20:29:12 GMT Subject: RFR: 8257076: os::scan_pages is empty on all platforms Message-ID: The function os::scan_pages was only ever implemented in Solaris and then removed in [JDK-8244224](https://bugs.openjdk.org/browse/JDK-8244224) All other platforms have empty implementations and the interface is not optimal as os::scan_pages expects the range to have just one page size, while in reality it can have multiple. This PR removes this interface, ensuing empty implementations and all dead code related to page scanning. Testing: Tier 1. ------------- Commit messages: - More dead code - Removing more dead code - 8257076: os::scan_pages is empty on all platforms Changes: https://git.openjdk.org/jdk/pull/16740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16740&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8257076 Stats: 83 lines in 7 files changed: 0 ins; 82 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16740/head:pull/16740 PR: https://git.openjdk.org/jdk/pull/16740 From cjplummer at openjdk.org Tue Nov 21 20:45:17 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Nov 2023 20:45:17 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap In-Reply-To: References: Message-ID: On Fri, 22 Sep 2023 02:48:57 GMT, David Holmes wrote: >> `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. >> >> The help message of Compiler.perfmap: >> >> Compiler.perfmap >> Write map file for Linux perf tool. 
>> >> Impact: Low >> >> Syntax : Compiler.perfmap [options] >> >> Options: (options must be specified using the <key> or <key>=<value> syntax) >> filename : [optional] Name of the map file (STRING, no default value) > > src/jdk.jcmd/share/man/jcmd.1 line 1: > >> 1: .\" Copyright (c) 2012, 2023, Oracle and/or its affiliates. All rights reserved. > > The actual markdown source for this file needs to be updated with these changes. Those sources are not open-source unfortunately. Please either coordinate to get the sources updated with an Oracle developer as part of this PR (they will integrate the internal part), or else please defer this to a subtask and let an Oracle developer update the source and output at the same time. Thanks. I filed JDK-8320556 to update the closed source. It's assigned to me. I'll do the update after these changes are pushed. In the meantime I'll make sure the current jcmd.1 changes are correct and match the closed changes I'll be making. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15871#discussion_r1401151891 From cjplummer at openjdk.org Tue Nov 21 21:34:14 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 21 Nov 2023 21:34:14 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap In-Reply-To: References: Message-ID: On Thu, 21 Sep 2023 20:43:56 GMT, Yi-Fan Tsai wrote: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. > > The help message of Compiler.perfmap: > > Compiler.perfmap > Write map file for Linux perf tool. > > Impact: Low > > Syntax : Compiler.perfmap [options] > > Options: (options must be specified using the <key> or <key>=<value> syntax) > filename : [optional] Name of the map file (STRING, no default value) @yftsai Since you requested a CSR review, I assume you want this PR re-opened. Can you please re-open it in that case?
------------- PR Comment: https://git.openjdk.org/jdk/pull/15871#issuecomment-1821721789 From duke at openjdk.org Tue Nov 21 21:36:08 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 21 Nov 2023 21:36:08 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: <0pYknfG2fhvKS_00IAtIbi39i3uPtsH9hDTiSW7DRRA=.19b12d7e-44a1-46f8-aab9-8198d36d1574@github.com> On Tue, 21 Nov 2023 19:32:22 GMT, Volodymyr Paprotski wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1112:
>>
>>> 1110: void (MacroAssembler::*vblend)(XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister);
>>> 1111: void (MacroAssembler::*vmaxmin)(XMMRegister, XMMRegister, XMMRegister, int);
>>> 1112: void (MacroAssembler::*vcmp)(XMMRegister, XMMRegister, XMMRegister, int, int);
>>
>> We do support C++11 dialect, you can use following declarations.
>> using vblend = void (*) (XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister);
>
> Its a member-function pointer, not a regular function pointer.. but `using` function pointer declaration is definitely an improvement I wasn't aware of. Trying to find chapter-and-verse in C++ spec on declaration of member-function pointers and their use with `using`. Will fix.

Keyword `using` is a rather poor search-engine word; I couldn't find examples of a member-function pointer declaration with the `using` syntax, so I had to look at the standard itself: [C++17 draft](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf)

---

I think properly, it's called an `alias-declaration` [dcl.dcl].

    alias-declaration:
        using identifier attribute-specifier-seq(opt) = defining-type-id ;

The end of section 11.3.3 [dcl.mptr] seems to imply that function pointer declarations and member-function pointer declarations are different:

    [ Note: See also 8.3 and 8.5. The type "pointer to member" is distinct from the type "pointer", that is, a pointer to member is declared only by the pointer to member declarator syntax, and never by the pointer declarator syntax. There is no "reference-to-member" type in C++. -- end note ]

That to me seems to mean that function pointers and member pointers follow different BNF grammar rules, and might not both end up supporting the `using` declaration. I have been trying to find the BNF for it regardless, but it's been many years, the standard is big, and hours of reading later I am no closer to a definitive answer that member-function declarations do not support the `using` syntax.

---

Trial-and-error, perhaps the compiler would give me some hints as to where to look.. Section 11.1 Type names suggests removing the identifier to get the type:

    It is possible to identify uniquely the location in the abstract-declarator where the identifier would appear if the construction were a declarator in a declaration. The named type is then the same as the type of the hypothetical identifier.

So, I tried syntax similar to what you suggested:

    using vblend = void (MacroAssembler::*)(XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister);

Does not work:

```
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:1134:12: error: expected unqualified-id before '=' token
 1134 |     vblend = &MacroAssembler::vblendvpd;
      |            ^
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:1156:17: error: expected primary-expression before ')' token
 1156 |   (this->*vblend)(atmp, a, b, mask, vlen_enc, true, btmp);
```

First message is bad, the `&` is required per `8.3.1 Unary operators` section 4:

    A pointer to member is only formed when an explicit & is used and its operand is a qualified-id not enclosed in parentheses.

Second message is also bad, per section `8.5 Pointer-to-member operators`: cast, `.*`, `->*` are about all the operations allowed on pointer-to-member.

---

Looking through the standard, I came upon an alternative potential solution.
decltype(&MacroAssembler::vblendvps) vblend; Confirmed to work. Not sure I like how imprecise it is, but it _is_ shorter. --- TLDR: Perhaps there is no such thing as member-function declaration using `using`.. or g++ hasn't implemented it `per-spec` and another compiler does.. but we have to use g++ so that point is moot. Unless someone else can get it to work, my preference would be to keep it as is, but the `decltype` option is there too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1401224703 From jjoo at openjdk.org Tue Nov 21 21:42:39 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 21 Nov 2023 21:42:39 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v46] In-Reply-To: References: Message-ID: <4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: - Update memory tracking type for CPUTimeCounters - Fix assertion logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/17a8eaf3..4ca30f32 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=44-45 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Tue Nov 21 21:46:18 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 21 Nov 2023 21:46:18 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v46] In-Reply-To: <4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> References: 
<4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> Message-ID: On Tue, 21 Nov 2023 21:42:39 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - Update memory tracking type for CPUTimeCounters > - Fix assertion logic All comments have been addressed and assertion failure fixed - this PR should once again be RFR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1821740369 From duke at openjdk.org Tue Nov 21 22:21:18 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Tue, 21 Nov 2023 22:21:18 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap [v2] In-Reply-To: References: Message-ID: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. > > The help message of `jcmd PID help Compiler.perfmap` will be updated in a separate PR. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Remove changes of jcmd man page ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15871/files - new: https://git.openjdk.org/jdk/pull/15871/files/9fee339b..861bda74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=00-01 Stats: 12 lines in 2 files changed: 1 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15871/head:pull/15871 PR: https://git.openjdk.org/jdk/pull/15871 From sspitsyn at openjdk.org Tue Nov 21 22:28:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 21 Nov 2023 22:28:05 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: <-CGha1yFmQNPbT7s6BtZ0iJFxmPgzoSnozx4pgIZlA4=.77aa51ba-ebcb-419a-9651-dbadb8ef9e91@github.com> References: <1P_TddTTM1eH75Do2Xq-wBrxXSdh7GzJJlgEBH_dSNo=.94392ab2-12a1-4d1b-9131-6164bbb76e7d@github.com> <-CGha1yFmQNPbT7s6BtZ0iJFxmPgzoSnozx4pgIZlA4=.77aa51ba-ebcb-419a-9651-dbadb8ef9e91@github.com> Message-ID: On Mon, 20 Nov 2023 00:59:35 GMT, David Holmes wrote: >> @dcubed-ojdk >> I don't know FJP implementation well enough to point at the code where it happens. However, I observe that new `JavaThread `is being created between two points of the execution path. >> - First point is in the `JvmtiEventControllerPrivate::recompute_enabled()` at the line where a `ThreadsListHandle` is set. I've added a trap checking if any `JavaThread` pointed by `state->get_thread()` is not protected by the `tlh`. I can see this trap is not fired (I can't say it has never been fired). >> - Second point is in the `JvmtiEventControllerPrivate::enter_interp_only_mode()`. 
If a `ThreadsListHandle` is NOT set then I can observe a `JavaThread` referenced by the `state->get_thread()` which is not protected by any TLH. If a TLH is added to `JvmtiEventControllerPrivate::enter_interp_only_mode()` then this `JavaThread` is observed as protected by TLH. >> >> I've removed a part of this comment with stack traces as my traps were not fully correct, need to double check everything. This issue is not well reproducible but I'm still trying to reproduce it again. >> One approach would be to remove this change in the `src/hotspot/share/prims/jvmtiEventController.cpp` from PR and try to address it separately. > > Just to re-iterate what Dan was saying, the TLH is only of use if you are accessing threads known to be included in the TLH. I've spent some time trying to reproduce the original issue to provide some details to Dan but have had no luck yet. I'm sure the issue still exists but it is better to investigate and address it separately from this PR. So, my plan is to remove this line with TLH from the `jvmtiEventController.cpp`. BTW, if it is still needed Alan promised to point to the code where the FJP compensating mechanism adds new carrier threads. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1401283561 From sspitsyn at openjdk.org Tue Nov 21 22:49:09 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 21 Nov 2023 22:49:09 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 20 Nov 2023 19:28:13 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details.
> > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Add a check for a thread is_attaching_via_jni, based on David Holmes' comment. src/hotspot/share/prims/jvmtiExport.cpp line 3144: > 3142: // If the current thread is attaching from native and its thread oop is being > 3143: // allocated, things are not ready for allocation sampling. > 3144: if (thread->is_Java_thread()) { Nit: There is no need for this check at line 3144. There was already check for `!thread->is_Java_thread()` and return with false at line 3138: if (!thread->is_Java_thread() || thread->is_Compiler_thread()) { return false; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1401302311 From duke at openjdk.org Tue Nov 21 23:29:21 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Tue, 21 Nov 2023 23:29:21 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap [v3] In-Reply-To: References: Message-ID: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an option for specifying a file name. > > The help message of `jcmd PID help Compiler.perfmap` will be updated in a separate PR. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Chagne an option to an argument ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15871/files - new: https://git.openjdk.org/jdk/pull/15871/files/861bda74..61d6f6f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15871/head:pull/15871 PR: https://git.openjdk.org/jdk/pull/15871 From sspitsyn at openjdk.org Tue Nov 21 23:35:09 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 21 Nov 2023 23:35:09 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 20 Nov 2023 19:28:13 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Add a check for a thread is_attaching_via_jni, based on David Holmes' comment. src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 100: > 98: assert(state->get_thread_oop() != nullptr, "incomplete state"); > 99: } > 100: #endif Nit: I would suggest to write this assert in the form: // Make sure we don't see an incomplete state. An incomplete state can cause // a duplicate JvmtiThreadState being created below and bound to the 'thread' // incorrectly, which leads to stale JavaThread* from the JvmtiThreadState // after the thread exits. 
assert(state == nullptr || state->get_thread_oop() != nullptr, "incomplete state"); The `#ifdef ASSERT` and `#endif` are not needed then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1401332452 From psandoz at openjdk.org Wed Nov 22 00:08:08 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 22 Nov 2023 00:08:08 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: <97YS_I-DY-Q5agE6mE-iBkoVxtvL7R4Q3NebjTsXMvI=.dac0dc99-84d7-4cbd-ada6-5190564688a9@github.com> On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. 
But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF Have you considered the possibility of copying the sleef source to the OpenJDK repository and thereby it becomes part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. Also it may enable us to use sleef for other platforms where we have gaps (looking at Table 1.1 of https://sleef.org/). Further out it should inspire us to do a Java Vector API port to as indicated in a prior comment. > Yes, libsleef is used to build the binary if found. And at runtime, if the libsleef with right version is not found, `dlopen` to the libvmath.so will fail. And then all the operations will be fall-back to the java default implementation. X86_64 has also bundled the Intel's SVML binary to jdk image at build time. And we use the same way loading/opening the library at runtime. 
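The load-then-fall-back behaviour described in the quoted reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `resolve_or_fallback`, `scalar_sin` and the library and symbol names passed by a caller are hypothetical, and the sketch uses a scalar signature for simplicity where the real stubs operate on vectors.

```cpp
#include <dlfcn.h>
#include <cmath>

// Hypothetical signature; real vector math stubs take vector arguments.
using unary_fn = double (*)(double);

// Scalar fallback used when the native math library cannot be loaded.
static double scalar_sin(double x) { return std::sin(x); }

// Try to resolve `symbol` from `lib_name`; return `fallback` on any failure.
// Both names are placeholders supplied by the caller.
unary_fn resolve_or_fallback(const char* lib_name, const char* symbol,
                             unary_fn fallback) {
  void* handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    return fallback;  // library missing, or its dependencies did not load
  }
  void* addr = dlsym(handle, symbol);
  if (addr == nullptr) {
    dlclose(handle);
    return fallback;  // library present but the entry point is absent
  }
  // The handle is intentionally kept open for the lifetime of the process.
  return reinterpret_cast<unary_fn>(addr);
}
```

A caller would resolve each operation once at startup and then call through the returned pointer, so a missing library costs nothing beyond the failed `dlopen`.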
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821885433 From sspitsyn at openjdk.org Wed Nov 22 01:10:07 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Nov 2023 01:10:07 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Fri, 17 Nov 2023 02:51:03 GMT, Jiangli Zhou wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. > >> Thanks. The latest change to `JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample()` looks OK to me. Skipping a few allocations for JVMTI allocation sampler is better than resulting in a problematic `JvmtiThreadState` instance. >> >> My main question is if we can now change `if (state == nullptr || state->get_thread_oop() != thread_oop) ` to `if (state == nullptr)` in `JvmtiThreadState::state_for_while_locked()`. I suspect we would never run into a case of `state != nullptr && state->get_thread_oop() != thread_oop` with the latest change, even with virtual threads. This is backed up by testing with [00ace66](https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22) not triggering any failure. >> >> If we run into such as a case, it could still be problematic as `JvmtiThreadState::state_for_while_locked()` would allocate a new `JvmtiThreadState` instance pointing to the same JavaThread, and it does not delete the existing instance. 
>> >> Could anyone with deep knowledge on JvmtiThreadState and virtual threads provide some feedback on this change and https://bugs.openjdk.org/browse/JDK-8319935? @AlanBateman, do you know who would be the best reviewer for this? > > @caoman and I discussed about his suggestion on changing `if (state == nullptr || state->get_thread_oop() != thread_oop)` check in person today. Since it may affect vthread, my main concern is that our current testing may not cover that sufficiently. The suggestion could be worked by a separate enhancement bug. > > @jianglizhou - I fixed a typo in the bug's synopsis line. Change this PR's title: s/is create/is created/ > Thanks, @dcubed-ojdk! Now, the PR title needs to be fixed accordingly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1821932355 From dholmes at openjdk.org Wed Nov 22 01:14:02 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 01:14:02 GMT Subject: RFR: 8257076: os::scan_pages is empty on all platforms In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 15:20:33 GMT, Sonia Zaldana Calles wrote: > The function os::scan_pages was only ever implemented in Solaris and then removed in [JDK-8244224](https://bugs.openjdk.org/browse/JDK-8244224) > > All other platforms have empty implementations and the interface is not optimal as os::scan_pages expects the range to have just one page size, while in reality it can have multiple. > > This PR removes this interface, ensuing empty implementations and all dead code related to page scanning. > > Testing: Tier 1. Yes that all looks like dead code. Nice cleanup. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16740#pullrequestreview-1743397861 From sspitsyn at openjdk.org Wed Nov 22 01:27:06 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 22 Nov 2023 01:27:06 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 20 Nov 2023 19:28:13 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Add a check for a thread is_attaching_via_jni, based on David Holmes' comment. Thank you for filing and fixing this issue! I'm kind of late here. Sorry for that. Is it hard to create a JTreg test for an attaching native thread? I can help if you have a standalone prototype. You can look for some examples in the folder: `test/hotspot/jtreg/serviceability/jvmti/vthread`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1821944429 From dholmes at openjdk.org Wed Nov 22 01:30:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 01:30:05 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 21:48:11 GMT, Coleen Phillimore wrote: >> The method holder is an `InstanceKlass` object which can be retrieved as `method->method_holder()` (I apologize if I am using not completely correct terms - this is what I grokked from the sources). 
And incomplete methods created by the `ClassParser` from the class data stream will not have the link to that `InstanceKlass` set up if the `ClassParser` is already having its `_klass` field set to a non-null value. >> >> If we are talking about clearing any jmetbodIDs associated with an `InstanceKlass` instance it is not really possible for old method versions because only the current `InstanceKlass` version has the jmethodID cache associated with it and it contains jmethodIDs pointing to bot the old and current methods. > > I see, holder is the right word and concept. So the parameter means has_method_holder, in that the InstanceKlass has been fully parsed at the point of clearing the jmethodIDs. Can't we just check `method->method_holder()` for null rather than passing in a parameter like this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1401396222 From dholmes at openjdk.org Wed Nov 22 01:30:09 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 01:30:09 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:56:09 GMT, Jaroslav Bachorik wrote: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. 
> > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. > > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/GetStackTraceAndRetransformTest.java line 2: > 1: /* > 2: * Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. An Oracle copyright is not needed here if you wrote this test from scratch. If it is present then we need a comma after the copyright year please. 
test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/GetStackTraceAndRetransformTest.java line 29: > 27: * @bug 8313816 > 28: * @summary Test that a sequence of method retransformation and stacktrace capture while the old method > 29: * version is still on stack does not lead to a crash when that's method jmethodID is used as typo: that's method -> that method's test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/libGetStackTraceAndRetransformTest.cpp line 2: > 1: /* > 2: * Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. Ditto comment about Oracle copyright. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1401393016 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1401393442 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1401393794 From xgong at openjdk.org Wed Nov 22 01:45:11 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Nov 2023 01:45:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: <1UlAcJ17Pe7VbzAHDUBWF0ruGtSMsdhu6_cc7khS4Y8=.1020cfc6-002a-49f5-9628-657df0c9ba0b@github.com> On Tue, 21 Nov 2023 18:14:41 GMT, Paul Sandoz wrote: > > This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec. > > Same here. Sinh/cosh/tanh/expm1 are specified to be within 2.5 ulps of the exact result, but I believe sleef does not offer that option, it's either within 1 or 3.5, so we have to pick the former. AFAICT sleef does not say anything about monotonicity, but we relax semi-monotonicity for all but sqrt (we defer to IEEE 754). Yes, that's why we at least have to use the 1.0 ULP accuracy in sleef here.
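The accuracy choice discussed above (a 2.5 ulp bound in the spec, with 1 or 3.5 ulp on offer) can be illustrated with a small sketch. The helper names `ulp_of`, `ulp_error` and `pick_accuracy` are made up for this example, and the ULP definition used is a simplified one for finite, non-zero values; it is not taken from the PR or from sleef.

```cpp
#include <cmath>
#include <limits>

// Size of one unit in the last place at the magnitude of `x`
// (simplified: assumes `x` is finite and non-zero).
double ulp_of(double x) {
  double a = std::fabs(x);
  return std::nextafter(a, std::numeric_limits<double>::infinity()) - a;
}

// Error of `approx` relative to `exact`, expressed in ULPs of `exact`.
double ulp_error(double approx, double exact) {
  return std::fabs(approx - exact) / ulp_of(exact);
}

// Pick the loosest offered accuracy that still satisfies the required bound.
// Returns a negative value if none of the offered accuracies is tight enough.
double pick_accuracy(double required_ulp, const double* offered, int n) {
  double best = -1.0;
  for (int i = 0; i < n; i++) {
    if (offered[i] <= required_ulp && offered[i] > best) {
      best = offered[i];
    }
  }
  return best;
}
```

With a required bound of 2.5 and offered accuracies of 1.0 and 3.5, only 1.0 qualifies, which matches the reasoning in the thread.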
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821956522 From xgong at openjdk.org Wed Nov 22 01:55:06 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 22 Nov 2023 01:55:06 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: <97YS_I-DY-Q5agE6mE-iBkoVxtvL7R4Q3NebjTsXMvI=.dac0dc99-84d7-4cbd-ada6-5190564688a9@github.com> References: <97YS_I-DY-Q5agE6mE-iBkoVxtvL7R4Q3NebjTsXMvI=.dac0dc99-84d7-4cbd-ada6-5190564688a9@github.com> Message-ID: On Wed, 22 Nov 2023 00:05:26 GMT, Paul Sandoz wrote: > Have you considered the possibility of copying the sleef source to the OpenJDK repository and thereby it becomes part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. > Hi @PaulSandoz! Thanks for the suggestion! Copying the sleef source sounds good. However, I actually have no idea how to handle a third-party licence in the OpenJDK project. Do you have any idea about this area? Suggestions or guidance from the JDK team would be very helpful. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1821965636 From dholmes at openjdk.org Wed Nov 22 02:09:38 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 02:09:38 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the lock-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things.
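For readers unfamiliar with the term, the `cx8` operation mentioned above is a 64-bit compare-and-exchange. A minimal sketch in standard C++ follows, assuming a target where `std::atomic<int64_t>` is available; `has_native_cx8` and `cmpxchg64` are illustrative names and are not HotSpot's `Atomic` API.

```cpp
#include <atomic>
#include <cstdint>

// Reports whether 64-bit atomics are lock-free on this target, which is
// roughly the property a runtime guarantee would check.
bool has_native_cx8() {
  std::atomic<int64_t> probe{0};
  return probe.is_lock_free();
}

// Store `desired` into `dest` only if it currently holds `expected`.
// Returns the value observed in `dest` before the operation.
int64_t cmpxchg64(std::atomic<int64_t>& dest, int64_t expected,
                  int64_t desired) {
  // On failure, compare_exchange_strong writes the observed value into
  // `expected`; on success `expected` already equals the old value.
  dest.compare_exchange_strong(expected, desired);
  return expected;
}
```

When the operation is natively available everywhere, a lock-based fallback path for 64-bit fields has no remaining callers, which is the simplification the PR description refers to.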
Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. > > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks. David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge with master and update Zero code accordingly - Merge branch 'master' into 8318776-supports_cx8 - Remove unnecessary includes of vm_version.hpp. Fix copyright years. 
- Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) - Remove suports_cx8() checks from gtest - Remove test for VMSupportsCX8 - 8318776: Require supports_cx8 to always be true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16625/files - new: https://git.openjdk.org/jdk/pull/16625/files/597cef53..aad0a4c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=03-04 Stats: 621905 lines in 1279 files changed: 89413 ins; 471113 del; 61379 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From dholmes at openjdk.org Wed Nov 22 02:21:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 02:21:05 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v4] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 16:58:21 GMT, Aleksey Shipilev wrote: >> This looks great! > >> Thanks for the review @fisk ! I have to wait for a few Zero related PRs to get integrated then re-merge, before I can integrate. > > Zero patches were pushed, please re-merge. I checked current mainline works well with at least linux-arm-zero-fastdebug, and I would like to re-test it with this patch. @shipilev I have re-merged and updated the Zero changes (ifdef around `_supports_cx8`). @viktorklang-ora and/or @DougLea could I ask you to look at the `java.util.concurrent.AtomicLongFieldUpdater` changes please.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1821983975 From manc at openjdk.org Wed Nov 22 02:22:24 2023 From: manc at openjdk.org (Man Cao) Date: Wed, 22 Nov 2023 02:22:24 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v46] In-Reply-To: <4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> References: <4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> Message-ID: On Tue, 21 Nov 2023 21:42:39 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - Update memory tracking type for CPUTimeCounters > - Fix assertion logic Looks pretty clean. Only a few minor cleanups remain. src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 71: > 69: #include "oops/oop.inline.hpp" > 70: #include "runtime/atomic.hpp" > 71: #include "runtime/cpuTimeCounters.hpp" This include could be removed. src/hotspot/share/gc/g1/g1ConcurrentRefine.cpp line 38: > 36: #include "memory/allocation.inline.hpp" > 37: #include "memory/iterator.hpp" > 38: #include "runtime/cpuTimeCounters.hpp" This could be removed too. src/hotspot/share/gc/g1/g1ServiceThread.cpp line 26: > 24: > 25: #include "precompiled.hpp" > 26: #include "gc/g1/g1CollectedHeap.hpp" Could this include be removed? src/hotspot/share/gc/g1/g1ServiceThread.hpp line 30: > 28: #include "gc/shared/concurrentGCThread.hpp" > 29: #include "runtime/mutex.hpp" > 30: #include "runtime/perfData.hpp" This could be removed. src/hotspot/share/gc/shared/collectedHeap.cpp line 52: > 50: #include "oops/instanceMirrorKlass.hpp" > 51: #include "oops/oop.inline.hpp" > 52: #include "runtime/atomic.hpp" This could be removed. 
src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.cpp line 28: > 26: #include "classfile/javaClasses.inline.hpp" > 27: #include "classfile/stringTable.hpp" > 28: #include "gc/shared/collectedHeap.hpp" This include as well as the include for perfData.hpp (line 45) could be removed. src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.cpp line 70: > 68: void StringDedup::Processor::initialize() { > 69: _processor = new Processor(); > 70: if (UsePerfData && os::is_thread_cpu_time_supported()) { The if and EXCEPTION_MARK could be removed, because `create_counter()` does that internally. src/hotspot/share/gc/shared/stringdedup/stringDedupProcessor.hpp line 30: > 28: #include "gc/shared/stringdedup/stringDedup.hpp" > 29: #include "memory/allocation.hpp" > 30: #include "runtime/perfData.hpp" This could be removed. src/hotspot/share/runtime/cpuTimeCounters.hpp line 97: > 95: class ThreadTotalCPUTimeClosure: public ThreadClosure { > 96: private: > 97: jlong _gc_total; _total is a more appropriate name for this field. src/hotspot/share/runtime/vmThread.cpp line 140: > 138: PerfDataManager::create_counter(SUN_THREADS, "vmOperationTime", > 139: PerfData::U_Ticks, CHECK); > 140: if (os::is_thread_cpu_time_supported()) { The if could be removed, as `create_counter()` checks it internally. ------------- Marked as reviewed by manc (Committer).
PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1743425918 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401413936 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401415033 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401416191 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401416622 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401416969 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401417321 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401418586 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401420582 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401410366 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1401419999 From dholmes at openjdk.org Wed Nov 22 02:56:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 02:56:06 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 21 Nov 2023 22:45:54 GMT, Serguei Spitsyn wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Add a check for a thread is_attaching_via_jni, based on David Holmes' comment. > > src/hotspot/share/prims/jvmtiExport.cpp line 3144: > >> 3142: // If the current thread is attaching from native and its thread oop is being >> 3143: // allocated, things are not ready for allocation sampling. >> 3144: if (thread->is_Java_thread()) { > > Nit: There is no need for this check at line 3144. 
> There was already check for `!thread->is_Java_thread()` and return with false at line 3138: > > if (!thread->is_Java_thread() || thread->is_Compiler_thread()) { > return false; > } +1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1401442209 From rriggs at openjdk.org Wed Nov 22 05:03:41 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 22 Nov 2023 05:03:41 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v12] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. 
> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. > Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Apply StringUTF16.coderFromArrayLen ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/0256b9e0..d201344b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=10-11 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From stuefe at openjdk.org Wed Nov 22 05:57:03 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 05:57:03 GMT Subject: RFR: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() In-Reply-To: References: Message-ID: <6wkqhMDhy56beePfq85I7DN9gwt23l1J99im22vVUqQ=.126899c4-52a5-4e77-bd3f-d54c65ecf4f3@github.com> On Tue, 21 Nov 2023 17:48:06 GMT, Roman Kennke wrote: > I think it's ok. 
I think there is an advantage in checking the base early instead of somewhere deep in MA, at which point I'd probably wonder where the base came from. Maybe there is a way to achieve the simplification that you had in mind while also retaining those checks? Honestly, none that is worth the complexity. These asserts usually fired as a result of metaspace reservation - which is an automatic process - coming up with unsuitable reservation addresses. An assert at that point is every bit as baffling to the end user as the later assert in MA. The only point where an early clear warning makes sense is when we check the user input for SharedBaseAddress, which we do (and, the new message is even better than the old one). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16727#issuecomment-1822157463 From mbaesken at openjdk.org Wed Nov 22 08:34:05 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 22 Nov 2023 08:34:05 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report In-Reply-To: <2nsZJNVtl2FQhb1soV9PD3kaWCyMjb3YQND2N7iCTSU=.d1d49865-6034-4f10-b024-c9c775ba356e@github.com> References: <2nsZJNVtl2FQhb1soV9PD3kaWCyMjb3YQND2N7iCTSU=.d1d49865-6034-4f10-b024-c9c775ba356e@github.com> Message-ID: On Mon, 20 Nov 2023 13:38:00 GMT, Thomas Stuefe wrote: >Then lets abstract this into something like os::prepare_native_symbols() or something similar. Could use this on Windows too to >update the loaded modules list. Hi Thomas, sure we can try that. I covered only AIX in this PR, because we observed issues with bad dll/thread stack trace output on AIX, but so far not on Windows. E.g. 
in some fontconfig related crashes; see https://bugs.openjdk.org/browse/JDK-8314152 (8314152: libfontconfig missing on AIX in many hs_err 'Dynamic libraries' lists and native stack incomplete) for details. The dlopen operations coming from the JDK codebase are just not covered for some time until the next cache update happens, so we have a timeframe with outdated info that leads to bad output. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1822315751 From azafari at openjdk.org Wed Nov 22 08:41:35 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Wed, 22 Nov 2023 08:41:35 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v9] In-Reply-To: References: Message-ID: > The `find` method now is > ```C++ > template<typename T> > int find(T* token, bool f(T*, E)) const { > ... > ``` > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: find methods accepts Function and callers provide lambda.
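As an illustration of the templated `find` described above, here is a minimal sketch using a hypothetical `MiniArray` stand-in rather than HotSpot's `GrowableArray`. The first overload follows the quoted signature. The second, `find_if`, is an assumed shape for "callers provide lambda" (a separate template parameter for the callable) and is not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GrowableArray<E>; not HotSpot code.
template <typename E>
class MiniArray {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }

  // Comparator passed as a plain function: T is deduced from both arguments.
  template <typename T>
  int find(T* token, bool f(T*, E)) const {
    for (std::size_t i = 0; i < _data.size(); i++) {
      if (f(token, _data[i])) return static_cast<int>(i);
    }
    return -1;  // not found
  }

  // Assumed variant: any callable (including a lambda) via a second
  // template parameter. The name find_if is made up for this sketch.
  template <typename T, typename F>
  int find_if(T* token, F f) const {
    for (std::size_t i = 0; i < _data.size(); i++) {
      if (f(token, _data[i])) return static_cast<int>(i);
    }
    return -1;  // not found
  }
};

struct Key { int wanted; };

static bool matches(Key* k, int elem) { return k->wanted == elem; }

// Looks up `wanted` in {10, 20, 30} with a named comparator function.
int demo_function_pointer(int wanted) {
  MiniArray<int> a;
  a.append(10); a.append(20); a.append(30);
  Key k{wanted};
  return a.find(&k, matches);
}

// Same lookup, with the comparator supplied as a lambda.
int demo_lambda(int wanted) {
  MiniArray<int> a;
  a.append(10); a.append(20); a.append(30);
  Key k{wanted};
  return a.find_if(&k, [](Key* key, int elem) { return key->wanted == elem; });
}
```

The token type no longer has to be `void*` or the element type itself, so the comparator can stay type-safe without casts.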
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15418/files - new: https://git.openjdk.org/jdk/pull/15418/files/7665b878..9973f5d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=07-08 Stats: 56 lines in 9 files changed: 21 ins; 17 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From shade at openjdk.org Wed Nov 22 09:00:13 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 09:00:13 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 02:09:38 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. 
>> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge with master and update Zero code accordingly > - Merge branch 'master' into 8318776-supports_cx8 > - Remove unnecessary includes of vm_version.hpp. > Fix copyright years. > - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) > - Remove suports_cx8() checks from gtest > - Remove test for VMSupportsCX8 > - 8318776: Require supports_cx8 to always be true Thanks! Zero tests are running. The PR looks great, except extra safety suggestion in x86 part: src/hotspot/cpu/x86/vm_version_x86.cpp line 819: > 817: } > 818: > 819: _supports_cx8 = supports_cmpxchg8(); I think we should leave the runtime check here (under `ifndef`, like in ARM?). This covers the remaining case of running on legacy x86 without CX8 implemented: the init guarantee would then fire and prevent any other surprises at runtime. 
Sure, it would be hard to come up with such a platform today, but it would be safer to refuse to run there right away on the off-chance someone actually has it :) ------------- PR Review: https://git.openjdk.org/jdk/pull/16625#pullrequestreview-1743847107 PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1401696816 From aph at openjdk.org Wed Nov 22 09:08:09 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Nov 2023 09:08:09 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: <97YS_I-DY-Q5agE6mE-iBkoVxtvL7R4Q3NebjTsXMvI=.dac0dc99-84d7-4cbd-ada6-5190564688a9@github.com> Message-ID: On Wed, 22 Nov 2023 01:52:51 GMT, Xiaohong Gong wrote: > > Have you considered the possibility of copying the sleef source to the OpenJDK repository, thereby making it part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. > > Hi @PaulSandoz ! Thanks for the suggestion! Copying the sleef source sounds good. However, I actually have no idea how to handle the third-party licence in the OpenJDK project. Do you have any idea about this area? Some suggestions/guidance from the JDK team would be very helpful. Thanks! From a legal perspective, we can do this. SLEEF is distributed under the Boost Software License Version 1.0, which is a GPL-compatible free software licence. The only issue is whether we want to do so. It would certainly be convenient.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1822364670 From aph at openjdk.org Wed Nov 22 09:16:05 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Nov 2023 09:16:05 GMT Subject: RFR: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() In-Reply-To: References: Message-ID: <9Niow-ujF4KfXrSUfkju4pqNXzM9kB1a9dsTQ12pzFk=.910c85a2-af3a-410d-a365-45028fab82b0@github.com> On Mon, 20 Nov 2023 07:47:16 GMT, Thomas Stuefe wrote: > `CompressedKlassPointers::is_valid_base(addr)` abstracts away platform-specific requirements that may limit the use of an address as narrow Klass encoding base. It only ever mattered on aarch64, where we cannot use any arbitrary address as 64-bit immediate for the base. > > Experience shows that this is a case where the abstraction does not help much. Hiding a very CPU-specific limitation under a generic function made arguing about it difficult. We therefore decided to scrap that function. > > It is only used for two things: > - asserts at runtime; those are unnecessary since we have an assert in macroAssembler_aarch64.cpp that will fire if the base is not correct. Both asserts fire at VM initialization; neither of these asserts is much clearer than the other, so no reason to keep asserting for is_valid_base() > - the one legitimate use case is checking the user input for -XX:SharedBaseAddress at dump time. We can just express the aarch64 requirement directly, which is clearer to understand. > > Note that the function has also been incorrect, since it ignored aarch64 EOR mode, and required 32GB alignment for addresses beyond 32GB. However, we can make any 4GB aligned address to work with movk, so the requirement can be simplified to "is 4GB-aligned". > > (this is a preparatory patch for [JDK-8320368](https://bugs.openjdk.org/browse/JDK-8320368) and further Lilliput-related changes) Marked as reviewed by aph (Reviewer). 
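The "is 4GB-aligned" requirement quoted above can be sketched as follows. This is a simplified model under stated assumptions: an unshifted 32-bit narrow value, a base below 2^48, and hypothetical helper names. It is not the HotSpot implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t G = 1024ull * 1024 * 1024;

// True if the low 32 bits of addr are all zero.
constexpr bool is_4g_aligned(std::uint64_t addr) {
  return (addr & (4 * G - 1)) == 0;
}

// Models the effect of the A64 instruction MOVK Xd, #imm16, LSL #32:
// bits 32..47 are replaced by imm16, every other bit of reg is kept.
constexpr std::uint64_t movk_lsl32(std::uint64_t reg, std::uint16_t imm16) {
  return (reg & ~(0xffffull << 32)) | (static_cast<std::uint64_t>(imm16) << 32);
}

// Hypothetical decode: with a 4GB-aligned base below 2^48, a single movk
// on the 32-bit narrow value gives the same result as base + narrow.
constexpr std::uint64_t decode_movk(std::uint64_t base, std::uint32_t narrow) {
  return movk_lsl32(narrow, static_cast<std::uint16_t>(base >> 32));
}
```

Because the base has no bits in the low 32, inserting its upper half into the register cannot disturb the narrow value, which is why alignment alone is the interesting check in this model.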
------------- PR Review: https://git.openjdk.org/jdk/pull/16727#pullrequestreview-1743894075 From stuefe at openjdk.org Wed Nov 22 09:23:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 09:23:24 GMT Subject: Integrated: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 07:47:16 GMT, Thomas Stuefe wrote: > `CompressedKlassPointers::is_valid_base(addr)` abstracts away platform-specific requirements that may limit the use of an address as narrow Klass encoding base. It only ever mattered on aarch64, where we cannot use any arbitrary address as 64-bit immediate for the base. > > Experience shows that this is a case where the abstraction does not help much. Hiding a very CPU-specific limitation under a generic function made arguing about it difficult. We therefore decided to scrap that function. > > It is only used for two things: > - asserts at runtime; those are unnecessary since we have an assert in macroAssembler_aarch64.cpp that will fire if the base is not correct. Both asserts fire at VM initialization; neither of these asserts is much clearer than the other, so no reason to keep asserting for is_valid_base() > - the one legitimate use case is checking the user input for -XX:SharedBaseAddress at dump time. We can just express the aarch64 requirement directly, which is clearer to understand. > > Note that the function has also been incorrect, since it ignored aarch64 EOR mode, and required 32GB alignment for addresses beyond 32GB. However, we can make any 4GB aligned address to work with movk, so the requirement can be simplified to "is 4GB-aligned". > > (this is a preparatory patch for [JDK-8320368](https://bugs.openjdk.org/browse/JDK-8320368) and further Lilliput-related changes) This pull request has now been integrated. 
Changeset: 98edb03a Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/98edb03abe1692dcf5c6c463011b895d6e59b8cb Stats: 40 lines in 3 files changed: 5 ins; 30 del; 5 mod 8320382: Remove CompressedKlassPointers::is_valid_base() Reviewed-by: rkennke, aph ------------- PR: https://git.openjdk.org/jdk/pull/16727 From stuefe at openjdk.org Wed Nov 22 09:23:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 09:23:23 GMT Subject: RFR: JDK-8320382: Remove CompressedKlassPointers::is_valid_base() In-Reply-To: <9Niow-ujF4KfXrSUfkju4pqNXzM9kB1a9dsTQ12pzFk=.910c85a2-af3a-410d-a365-45028fab82b0@github.com> References: <9Niow-ujF4KfXrSUfkju4pqNXzM9kB1a9dsTQ12pzFk=.910c85a2-af3a-410d-a365-45028fab82b0@github.com> Message-ID: <6aADE0hBjzxNPAQOqGITXvvTP7QkDhx4psj20xhueSA=.72e58876-9a6e-4717-b0a5-c30f7dd63c1a@github.com> On Wed, 22 Nov 2023 09:12:54 GMT, Andrew Haley wrote: >> `CompressedKlassPointers::is_valid_base(addr)` abstracts away platform-specific requirements that may limit the use of an address as narrow Klass encoding base. It only ever mattered on aarch64, where we cannot use any arbitrary address as 64-bit immediate for the base. >> >> Experience shows that this is a case where the abstraction does not help much. Hiding a very CPU-specific limitation under a generic function made arguing about it difficult. We therefore decided to scrap that function. >> >> It is only used for two things: >> - asserts at runtime; those are unnecessary since we have an assert in macroAssembler_aarch64.cpp that will fire if the base is not correct. Both asserts fire at VM initialization; neither of these asserts is much clearer than the other, so no reason to keep asserting for is_valid_base() >> - the one legitimate use case is checking the user input for -XX:SharedBaseAddress at dump time. We can just express the aarch64 requirement directly, which is clearer to understand. 
>> >> Note that the function has also been incorrect, since it ignored aarch64 EOR mode, and required 32GB alignment for addresses beyond 32GB. However, we can make any 4GB aligned address to work with movk, so the requirement can be simplified to "is 4GB-aligned". >> >> (this is a preparatory patch for [JDK-8320368](https://bugs.openjdk.org/browse/JDK-8320368) and further Lilliput-related changes) > > Marked as reviewed by aph (Reviewer). Thanks @theRealAph and @rkennke ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16727#issuecomment-1822386763 From shade at openjdk.org Wed Nov 22 09:27:11 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 09:27:11 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 02:09:38 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. 
>> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge with master and update Zero code accordingly > - Merge branch 'master' into 8318776-supports_cx8 > - Remove unnecessary includes of vm_version.hpp. > Fix copyright years. > - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) > - Remove suports_cx8() checks from gtest > - Remove test for VMSupportsCX8 > - 8318776: Require supports_cx8 to always be true src/hotspot/share/runtime/vm_version.cpp line 33: > 31: void VM_Version_init() { > 32: VM_Version::initialize(); > 33: guarantee(VM_Version::supports_cx8(), "Support for 64-bit atomic operations in required in this release"); Typo: "in required in". Also, no need to mention "this release" at all? 
Suggestion for message: "JVM requires platform support for 64-bit atomic operations" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1401743607 From aph at openjdk.org Wed Nov 22 09:53:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Nov 2023 09:53:11 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v2] In-Reply-To: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: On Tue, 21 Nov 2023 16:40:33 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
>> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation > - Regression Test src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 49: > 47: } > 48: > 49: // If that failed, attempt to allocate at any 4G-aligned address. Let the system decide where. For ASLR, One small nit here: encoding in MOVK mode may require more instructions than XOR mode because XOR is `eor dst, src, 0x800000000` but MOVK is `mov dst, src; eor dst, src, 0x800000000`. XOR is always the best, and we should perhaps try it first. 
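To illustrate why a single `eor` can stand in for an add, here is a minimal sketch with hypothetical helper names. It only models the bit arithmetic: a real assembler would also need the base to be encodable as a logical immediate, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// True if base has no bits set inside [0, range_pow2), where range_pow2 is
// a power of two covering every possible offset into the Klass range.
constexpr bool bits_disjoint(std::uint64_t base, std::uint64_t range_pow2) {
  return (base & (range_pow2 - 1)) == 0;
}

// XOR-style decode: when base and offset share no set bits, XOR cannot
// produce a carry and therefore equals addition.
constexpr std::uint64_t decode_xor(std::uint64_t base, std::uint64_t offset) {
  return base ^ offset;
}

constexpr std::uint64_t decode_add(std::uint64_t base, std::uint64_t offset) {
  return base + offset;
}
```

With base `0x800000000` and a 4GB range the two decodes agree for every offset; with a 2GB base (`0x80000000`) they differ as soon as the offset reaches bit 31.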
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401779226 From stuefe at openjdk.org Wed Nov 22 10:03:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 10:03:08 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v2] In-Reply-To: References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: On Wed, 22 Nov 2023 09:49:56 GMT, Andrew Haley wrote: >> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation >> - Regression Test > > src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 49: > >> 47: } >> 48: >> 49: // If that failed, attempt to allocate at any 4G-aligned address. Let the system decide where. For ASLR, > > One small nit here: encoding in MOVK mode may require more instructions than XOR mode because XOR is `eor dst, src, 0x800000000` but MOVK is `mov dst, src; eor dst, src, 0x800000000`. XOR is always the best, and we should perhaps try it first. Oh, you are right. Okay, I will add that. Being able to do CPU-specific stuff without ifdef is like a breath of fresh air. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401794854 From shade at openjdk.org Wed Nov 22 10:38:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 10:38:08 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 08:57:11 GMT, Aleksey Shipilev wrote: > Zero tests are running. Caught the `guarantee` on linux-arm-zero-fastdebug! But that is actually the fault in my previous patch: #16779. 
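For readers unfamiliar with the term, `cx8` refers to a 64-bit compare-and-exchange. The sketch below shows the operation in portable C++ with `std::atomic`; it is only an analogy with hypothetical function names, not how HotSpot's `VM_Version::supports_cx8()` is implemented.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Rough analogue of a "supports cx8" query: does this platform provide
// lock-free 64-bit atomics for this object? (Platform dependent.)
bool has_lock_free_64bit_atomics() {
  std::atomic<std::int64_t> probe{0};
  return probe.is_lock_free();
}

// 64-bit compare-and-exchange: stores `desired` only if the current value
// equals `expected`; returns whether the exchange happened.
bool cas64(std::atomic<std::int64_t>& value, std::int64_t expected,
           std::int64_t desired) {
  return value.compare_exchange_strong(expected, desired);
}

// Small driver: one successful and one failing exchange.
std::int64_t demo_cas64() {
  std::atomic<std::int64_t> v{1};
  bool first = cas64(v, 1, 0x100000000LL);  // succeeds: v was 1
  bool second = cas64(v, 1, 7);             // fails: v is no longer 1
  assert(first && !second);
  return v.load();
}
```

A startup check in the spirit of the `guarantee` discussed above would call the query once during initialization and refuse to continue if it returns false.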
------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1822510325 From jbhateja at openjdk.org Wed Nov 22 10:40:08 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 22 Nov 2023 10:40:08 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: <0pYknfG2fhvKS_00IAtIbi39i3uPtsH9hDTiSW7DRRA=.19b12d7e-44a1-46f8-aab9-8198d36d1574@github.com> References: <0pYknfG2fhvKS_00IAtIbi39i3uPtsH9hDTiSW7DRRA=.19b12d7e-44a1-46f8-aab9-8198d36d1574@github.com> Message-ID: On Tue, 21 Nov 2023 21:32:57 GMT, Volodymyr Paprotski wrote: >> It's a member-function pointer, not a regular function pointer.. but `using` function pointer declaration is definitely an improvement I wasn't aware of. Trying to find chapter-and-verse in the C++ spec on declaration of member-function pointers and their use with `using`. Will fix. > > Keyword `using` is a rather poor search-engine word, couldn't find examples for member-function pointer declaration with the `using` syntax. So had to look at the standard itself. [C++17 draft](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf) > > --- > > I think properly, it's called an `alias-declaration` [dcl.dcl]. > > alias-declaration: > using identifier attribute-specifier-seq(opt) = defining-type-id ; > > > The end of section 11.3.3 [dcl.mptr] seems to imply that function pointer declarations and member-function pointer declarations are different > > [ Note: See also 8.3 and 8.5. The type "pointer to member" is distinct from the type "pointer", that is, a > pointer to member is declared only by the pointer to member declarator syntax, and never by the pointer > declarator syntax. There is no "reference-to-member" type in C++. -- end note ] > > That to me seems to mean that function pointers and member pointers will follow different BNF grammar rules, and might not both end up supporting the `using` declaration.
I have been trying to find the BNF for it regardless, but its been many years, standard is big, hours of reading later, no closer to a definitive answer that member-function declaration do not support the using syntax. > > --- > > Trial-and-error, perhaps the compiler would give me some hints as to where to look.. Section 11.1 Type names suggests removing the identifier to get the type: > > It is possible to identify uniquely the location in the abstract-declarator where the identifier would appear if > the construction were a declarator in a declaration. The named type is then the same as the type of the > hypothetical identifier. > > So, I tried syntax similar to what you suggested: > > using vblend = void (MacroAssembler::*)(XMMRegister, XMMRegister, XMMRegister, XMMRegister, int, bool, XMMRegister); > > Does not work: > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:1134:12: error: expected unqualified-id before '=' token > 1134 | vblend = &MacroAssembler::vblendvpd; > | ^ > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:1156:17: error: expected primary-expression before ')' token > 1156 | (this->*vblend)(atmp, a, b, mask, vlen_enc, true, btmp); > ``` > First message is bad, the `&` is required per `8.3.1 Unary operators` section 4 : > > A pointer to member is only formed when an explicit & is used and its operand is a qualified-id not enclosed > in parentheses. > > Second message is also bad, per section `8.5 Pointer-to-member operators`, cast, `.*`, `-... Thanks for giving it a try, let it be like the way it is currently implemented. 
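For reference, an alias-declaration can name a pointer-to-member-function type in C++11 and later. The alias is a type name, so a variable of that type is still declared separately. The sketch below uses a hypothetical `Blender` class in place of `MacroAssembler`, with made-up method names and signatures; it is not the code from this PR.

```cpp
#include <cassert>

// Hypothetical stand-in for an assembler class with two similar methods.
struct Blender {
  int last = 0;
  void blend_ps(int a, int b) { last = a + b; }
  void blend_pd(int a, int b) { last = a * b; }
};

// Alias-declaration naming the pointer-to-member-function *type*.
using BlendFn = void (Blender::*)(int, int);

int run_blend(bool is_double, int a, int b) {
  Blender masm;
  // A variable of the aliased type; the explicit & and the qualified name
  // are both required to form a pointer to member.
  BlendFn fn = is_double ? &Blender::blend_pd : &Blender::blend_ps;
  // Call through the pointer with the .* operator (->* for a pointer).
  (masm.*fn)(a, b);
  return masm.last;
}
```

Whether the alias reads better than the spelled-out declarator is a style call, so keeping the existing form is also reasonable.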
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1401844424 From shade at openjdk.org Wed Nov 22 10:42:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 10:42:14 GMT Subject: RFR: 8320582: Zero: Misplaced CX8 enablement flag Message-ID: When doing [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777), I misplaced the `_supports_cx8 = true` flag setting in the method that is only called when CPU features are polled from perf counter code. We need to move the check to a proper place. [JDK-8318776](https://github.com/openjdk/jdk/pull/16625/files) would catch fire without this. Additional testing (redoing JDK-8319777 testing): - [x] Linux arm Zero fastdebug now builds fine with JDK-8318776 fix - [ ] Linux x86_32 Zero release; jcstress - [ ] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` - [ ] Linux x86_32 Zero fastdebug, bootcycle-images ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/16779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16779&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320582 Stats: 12 lines in 1 file changed: 6 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16779/head:pull/16779 PR: https://git.openjdk.org/jdk/pull/16779 From aboldtch at openjdk.org Wed Nov 22 10:50:15 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 22 Nov 2023 10:50:15 GMT Subject: RFR: 8319700: [AArch64] C2 compilation fails with "Field too big for insn" Message-ID: Not all ZGC C2 BarrierStubs used on aarch64 participate in the laying out of trampoline stubs. (Used to enable as many `tbX` instructions as possible.) This leads to incorrect calculations which may cause the target offset for the `tbX` branch to become too large. This fix changes all the BarrierStubs to stubs which participate in the trampoline logic.
Until more platforms require specialised barrier stub layouts it is not worth adding better support for this pattern. Without a redesign it does make it harder to ensure that this is used correctly. For now the shared code asserts when building for aarch64 that the general shared stubs are not used directly. But care would still have to be taken if any new barrier stubs are introduced. The behaviour was more easily reproducible with large inlining heuristics. This flag combination was used to get somewhat reliable reproducibility `-esa -ea -XX:MaxInlineLevel=300 -XX:MaxInlineSize=1100 -XX:MaxTrivialSize=1000 -XX:LiveNodeCountInliningCutoff=1000000 -XX:MaxNodeLimit=3000000 -XX:NodeLimitFudgeFactor=600000 -XX:+UnlockExperimentalVMOptions -XX:+UseVectorStubs` There was also an observation inside the JBS comments that there were no `tbX` instructions branching to the emitted trampolines. However I was unable to reproduce this. I ran all tests with the following guarantee and could not observe it either.
diff --git a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp index ebaf1829972..b6c40163a6b 100644 --- a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp +++ b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp @@ -36,6 +36,7 @@ #include "runtime/icache.hpp" #include "runtime/jniHandles.hpp" #include "runtime/sharedRuntime.hpp" +#include "utilities/debug.hpp" #include "utilities/macros.hpp" #ifdef COMPILER1 #include "c1/c1_LIRAssembler.hpp" @@ -1358,6 +1359,7 @@ void ZLoadBarrierStubC2Aarch64::emit_code(MacroAssembler& masm) { // Current assumption is that the barrier stubs are the first stubs emitted after the actual code assert(stubs_start_offset() <= output->buffer_sizing_data()->_code, "stubs are assumed to be emitted directly after code and code_size is a hard limit on where it can start"); + guarantee(!_test_and_branch_reachable_entry.is_unused(), "Should be used"); __ bind(_test_and_branch_reachable_entry); // Next branch's offset is unknown, but is > branch_offset - Testing - `linux-aarch64`, `linux-aarch64-debug`,`macosx-aarch64`, `macosx-aarch64-debug` - [x] ZGC tier1-tier7 Test Groups - [x] ZGC tier1-tier7 Test Groups with large C2 inlining heuristics - (With test failures due to incompatible flags filtered out) ------------- Commit messages: - 8319700: [AArch64] C2 compilation fails with "Field too big for insn" Changes: https://git.openjdk.org/jdk/pull/16780/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16780&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319700 Stats: 17 lines in 5 files changed: 15 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16780.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16780/head:pull/16780 PR: https://git.openjdk.org/jdk/pull/16780 From rkennke at openjdk.org Wed Nov 22 10:58:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Nov 2023 10:58:14 GMT Subject: RFR: 
JDK-8320368: Per-CPU optimization of Klass range reservation [v3] In-Reply-To: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: On Wed, 22 Nov 2023 10:57:13 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. 
We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation > - Update compressedKlass_aarch64.cpp src/hotspot/share/cds/metaspaceShared.cpp line 144: > 142: > 143: static bool shared_base_valid(char* shared_base) { > 144: // We check user input for SharedBaseAddress at dump time. We must weed out values Looks like this PR contains #16727 ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401865034 From rkennke at openjdk.org Wed Nov 22 10:58:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Nov 2023 10:58:09 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v2] In-Reply-To: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: <7KBjIWhO2aleeUx2ENkC8pxaX-EQRFaVww7VX2Nf2Yk=.51561a24-bd6d-4723-aa2d-8002ae17a309@github.com> On Tue, 21 Nov 2023 16:40:33 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
>> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation > - Regression Test Looks ok, only very few comments: src/hotspot/os/linux/os_linux.cpp line 4040: > 4038: > 4039: char* os::pd_attempt_reserve_memory_at(char* requested_addr, size_t bytes, bool exec) { > 4040: Remove the stray newline. ------------- Marked as reviewed by rkennke (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16743#pullrequestreview-1744097420 PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401862765 From fjiang at openjdk.org Wed Nov 22 11:02:30 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 22 Nov 2023 11:02:30 GMT Subject: RFR: 8320564: RISC-V: Minimal build failed after JDK-8316592 Message-ID: <7HvDfhxI2RCNDFtoV3PPVXYYNRG6Vgp37YMknKYh2l0=.20badcd4-fb74-46e0-8c2c-bb36277866df@github.com>

Hi, please review this patch that fixes the minimal build failure for riscv.

Error log for minimal build:

ERROR: Build failed for target 'all' in configuration 'linux-riscv64-minimal-fastdebug' (exit code 2)
Stopping javac server

=== Output from failing command(s) repeated here ===
* For target hotspot_variant-minimal_libjvm_objs_macroAssembler_riscv.o:
/home/jiangfeilong/workspace/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp: In member function 'void MacroAssembler::wide_madd(Register, Register, Register, Register, Register, Register)':
/home/jiangfeilong/workspace/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:2064:3: error: 'cad' was not declared in this scope
 2064 |   cad(sum_lo, sum_lo, tmp1, tmp1); // Add tmp1 to sum_lo with carry output to tmp1
      |   ^~~
/home/jiangfeilong/workspace/jdk/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:2065:3: error: 'adc' was not declared in this scope; did you mean 'add'?
 2065 |   adc(sum_hi, sum_hi, tmp2, tmp1); // Add tmp2 with carry to sum_hi
      |   ^~~
      |   add

* All command lines available in /home/jiangfeilong/workspace/jdk/build/linux-riscv64-minimal-fastdebug/make-support/failure-logs.
=== End of repeated output ===

The root cause is that `cad` and `adc` are defined under the `COMPILER2` macro, but the new methods `wide_mul` and `wide_madd` are not.
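
To illustrate this kind of failure with a stand-alone sketch (hypothetical names throughout; this is not the actual macroAssembler_riscv code): a helper that is declared only under `#ifdef COMPILER2` cannot be called from a function that is compiled in every build variant. One way out, assumed here, is to keep the shared helper outside the guard; the other is to put the caller under the same guard.

```cpp
#include <cstdint>

// Hypothetical stand-in for an assembler-like class; all names are illustrative.
struct Asm {
  // Needed by unconditionally compiled code, so it must live OUTSIDE the guard.
  // Returns a + b and writes the carry-out (0 or 1) to 'carry'.
  static uint64_t add_with_carry_out(uint64_t a, uint64_t b, uint64_t& carry) {
    uint64_t sum = a + b;
    carry = (sum < a) ? 1 : 0;  // unsigned wrap-around means a carry occurred
    return sum;
  }

#ifdef COMPILER2
  // Code used only when the C2 compiler is built can stay guarded.
  static uint64_t c2_only_helper(uint64_t x) { return x * 2; }
#endif

  // Compiled in every build variant, so everything it calls must be, too.
  // 128-bit accumulate: (hi:lo) += addend.
  static void wide_add(uint64_t& hi, uint64_t& lo, uint64_t addend) {
    uint64_t carry;
    lo = add_with_carry_out(lo, addend, carry);
    hi += carry;
  }
};
```

If `add_with_carry_out` were moved inside the `#ifdef COMPILER2` block, a build without `COMPILER2` defined would fail with the same "was not declared in this scope" diagnostic as in the log above.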
Testing: - [x] linux-riscv minimal fastdebug cross-compile ------------- Commit messages: - RISC-V: Minimal build failed after JDK-8316592 Changes: https://git.openjdk.org/jdk/pull/16781/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16781&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320564 Stats: 42 lines in 2 files changed: 21 ins; 21 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16781/head:pull/16781 PR: https://git.openjdk.org/jdk/pull/16781 From stuefe at openjdk.org Wed Nov 22 11:12:10 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 11:12:10 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v3] In-Reply-To: References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: <7Bjj5FP8s4cWnUta3SZY6jSXg8awRjpwFSwVS383uLk=.2688f966-92fd-40ef-b4f4-155ef2108c33@github.com> On Wed, 22 Nov 2023 10:52:43 GMT, Roman Kennke wrote: >> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation >> - Update compressedKlass_aarch64.cpp > > src/hotspot/share/cds/metaspaceShared.cpp line 144: > >> 142: >> 143: static bool shared_base_valid(char* shared_base) { >> 144: // We check user input for SharedBaseAddress at dump time. We must weed out values > > Looks like this PR contains #16727 ? It's done as a dependent PR. If everything had worked, the #16727 changes should not have shown up here. Not sure what the problem is.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401885371 From duke at openjdk.org Wed Nov 22 11:16:11 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 22 Nov 2023 11:16:11 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v6] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 12:05:46 GMT, Yuri Gaevsky wrote: >> Hello All, >> >> Please review these changes to support _vectorizedHashCode intrinsic on >> RISC-V platform. The patch adds the "scalar" code for the intrinsic without >> usage of any RVV instruction but provides manual unrolling of the appropriate >> loop. The code with usage of RVV instruction could be added as follow-up of >> the patch or independently. >> >> Thanks, >> -Yuri Gaevsky >> >> P.S. My OCA has been accepted recently (ygaevsky). >> >> ### Correctness checks >> >> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux. >> >> ### Performance results (the numbers for non-ints are similar) >> >> #### StarFive JH7110 board:
>>
>> ArraysHashCode:                     without intrinsic         with intrinsic
>> -------------------------------------------------------------------------------
>> Benchmark  (size)  Mode  Cnt       Score     Error          Score     Error  Units
>> -------------------------------------------------------------------------------
>> multiints       0  avgt   30       2.658 ±   0.001          2.661 ±   0.004  ns/op
>> multiints       1  avgt   30       4.881 ±   0.011          4.892 ±   0.015  ns/op
>> multiints       2  avgt   30      16.109 ±   0.041         10.451 ±   0.075  ns/op
>> multiints       3  avgt   30      14.873 ±   0.068         11.753 ±   0.024  ns/op
>> multiints       4  avgt   30      17.283 ±   0.078         13.176 ±   0.044  ns/op
>> multiints       5  avgt   30      19.691 ±   0.136         14.723 ±   0.046  ns/op
>> multiints       6  avgt   30      21.727 ±   0.166         15.463 ±   0.124  ns/op
>> multiints       7  avgt   30      23.790 ±   0.126         18.298 ±   0.059  ns/op
>> multiints       8  avgt   30      23.527 ±   0.116         18.267 ±   0.046  ns/op
>> multiints       9  avgt   30      27.981 ±   0.303         20.453 ±   0.069  ns/op
>> multiints      10  avgt   30      26.947 ±   0.215         20.541 ±   0.051  ns/op
>> multiints      50  avgt   30      95.373 ±   0.588         69.238 ±   0.208  ns/op
>> multiints     100  avgt   30     177.109 ±   0.525        137.852 ±   0.417  ns/op
>> multiints     200  avgt   30     341.074 ±   1.363        296.832 ±   0.725  ns/op
>> multiints     500  avgt   30     847.993 ±   1.713        752.415 ±   1.918  ns/op
>> multiints    1000  avgt   30    1610.199 ±   5.424       1426.112 ±   3.407  ns/op
>> multiints   10000  avgt   30   16234.260 ±  26.789      14447.936 ±  26.345  ns/op
>> multiints  100000  avgt   30  170726.025 ± 184.003     152587.649 ± 381.964  ns/op
>> ---------------------------------------...
> > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > Changed explicit registers name (iRegP_RXX/iRegI_RXX) for ary, cnt and result to their iRegPNoSp/iRegINoSp counterparts. > Changed iRegINoSp->iRegLNoSp for tmp1/tmp4 as they can contain 64-bit values. > Changed effects USE_KILL->USE for ary, removed effect for cnt. macos-x64/windows-x64 failures look unrelated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16629#issuecomment-1822570624 From aph at openjdk.org Wed Nov 22 11:18:07 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Nov 2023 11:18:07 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v2] In-Reply-To: References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: On Wed, 22 Nov 2023 09:59:59 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 49: >> >>> 47: } >>> 48: >>> 49: // If that failed, attempt to allocate at any 4G-aligned address. Let the system decide where. For ASLR, >> >> One small nit here: encoding in MOVK mode may require more instructions than XOR mode because XOR is `eor dst, src, 0x800000000` but MOVK is `mov dst, src; eor dst, src, 0x800000000`. XOR is always the best, and we should perhaps try it first. > > Oh, you are right. Okay, I will add that.
Being able to do CPU-specific stuff without ifdef is like a breath of fresh air. The coding scheme is rather weird, making it a little bit tricky to find valid XOR encodings. I don't think we cover all of the single-instruction possibilities. They are: 0x000100000000, 0x000200000000, 0x000300000000, 0x000400000000, 0x000600000000, 0x000700000000, 0x000800000000, 0x000c00000000, 0x000e00000000, 0x000f00000000, 0x001000000000, 0x001800000000, 0x001c00000000, 0x001e00000000, 0x001f00000000, 0x002000000000, 0x003000000000, 0x003800000000, 0x003c00000000, 0x003e00000000, 0x003f00000000, 0x004000000000, 0x006000000000, 0x007000000000, 0x007800000000, 0x007c00000000, 0x007e00000000, 0x007f00000000, 0x008000000000, 0x00c000000000, 0x00e000000000, 0x00f000000000, 0x00f800000000, 0x00fc00000000, 0x00fe00000000, 0x00ff00000000, 0x010000000000, 0x018000000000, 0x01c000000000, 0x01e000000000, 0x01f000000000, 0x01f800000000, 0x01fc00000000, 0x01fe00000000, 0x01ff00000000, 0x020000000000, 0x030000000000, 0x038000000000, 0x03c000000000, 0x03e000000000, 0x03f000000000, 0x03f800000000, 0x03fc00000000, 0x03fe00000000, 0x03ff00000000, 0x040000000000, 0x060000000000, 0x070000000000, 0x078000000000, 0x07c000000000, 0x07e000000000, 0x07f000000000, 0x07f800000000, 0x07fc00000000, 0x07fe00000000, 0x07ff00000000, 0x080000000000, 0x0c0000000000, 0x0e0000000000, 0x0f0000000000, 0x0f8000000000, 0x0fc000000000, 0x0fe000000000, 0x0ff000000000, 0x0ff800000000, 0x0ffc00000000, 0x0ffe00000000, 0x0fff00000000, 0x100000000000, 0x180000000000, 0x1c0000000000, 0x1e0000000000, 0x1f0000000000, 0x1f8000000000, 0x1fc000000000, 0x1fe000000000, 0x1ff000000000, 0x1ff800000000, 0x1ffc00000000, 0x1ffe00000000, 0x1fff00000000, 0x200000000000, 0x300000000000, 0x380000000000, 0x3c0000000000, 0x3e0000000000, 0x3f0000000000, 0x3f8000000000, 0x3fc000000000, 0x3fe000000000, 0x3ff000000000, 0x3ff800000000, 0x3ffc00000000, 0x3ffe00000000, 0x3fff00000000, 0x400000000000, 0x600000000000, 0x700000000000, 0x780000000000, 
0x7c0000000000, 0x7e0000000000, 0x7f0000000000, 0x7f8000000000, 0x7fc000000000, 0x7fe000000000, 0x7ff000000000, 0x7ff800000000, 0x7ffc00000000, 0x7ffe00000000, 0x7fff00000000, 0x800000000000, 0xc00000000000, 0xe00000000000, 0xf00000000000, 0xf80000000000, 0xfc0000000000, 0xfe0000000000, 0xff0000000000, 0xff8000000000, 0xffc000000000, 0xffe000000000, 0xfff000000000, 0xfff800000000, 0xfffc00000000, 0xfffe00000000, 0xffff00000000 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401892068 From rkennke at openjdk.org Wed Nov 22 11:23:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 22 Nov 2023 11:23:11 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v3] In-Reply-To: <7Bjj5FP8s4cWnUta3SZY6jSXg8awRjpwFSwVS383uLk=.2688f966-92fd-40ef-b4f4-155ef2108c33@github.com> References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> <7Bjj5FP8s4cWnUta3SZY6jSXg8awRjpwFSwVS383uLk=.2688f966-92fd-40ef-b4f4-155ef2108c33@github.com> Message-ID: On Wed, 22 Nov 2023 11:09:41 GMT, Thomas Stuefe wrote: >> src/hotspot/share/cds/metaspaceShared.cpp line 144: >> >>> 142: >>> 143: static bool shared_base_valid(char* shared_base) { >>> 144: // We check user input for SharedBaseAddress at dump time. We must weed out values >> >> Looks like this PR contains #16727 ? > > Its done as dependent PR. If all things worked, 16272 changes should not have showed up here. Not sure what the problem is. It sometimes requires merging in the latest master branch. 
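
As an aside on the table of single-instruction XOR immediates listed earlier in this message thread: every value in that list appears to be a single contiguous run of one-bits located within bits [32, 48). Under that assumption (it is read off the list itself, not taken from the patch), the values can be generated instead of hard-coded. A minimal sketch; `xor_base_candidates` is a hypothetical name and not part of HotSpot:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns every value whose set bits form one contiguous run inside
// bits [32, 48), in ascending order. Assumption: this matches the list of
// single-instruction XOR immediates quoted in the thread.
std::vector<uint64_t> xor_base_candidates() {
  std::vector<uint64_t> out;
  for (int len = 1; len <= 16; len++) {                // run length in bits
    for (int start = 0; start + len <= 16; start++) {  // lowest bit of the run
      uint64_t run = ((uint64_t(1) << len) - 1) << start;
      out.push_back(run << 32);                        // move run into bits [32, 48)
    }
  }
  std::sort(out.begin(), out.end());
  return out;
}
```

The sketch yields 16 * 17 / 2 = 136 values, starting at 0x000100000000 and ending at 0xffff00000000. A hard-coded table, as used in the patch, avoids the start-up computation and is easier to audit by eye; a generator like this is mainly useful for cross-checking such a table.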
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1401896956 From dholmes at openjdk.org Wed Nov 22 12:36:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 12:36:07 GMT Subject: RFR: 8320582: Zero: Misplaced CX8 enablement flag In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:33:24 GMT, Aleksey Shipilev wrote: > When doing [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777), I misplaced the `_supports_cx8 = true` flag setting in the method that is only called when CPU features are polled from perf counter code. We need to move the check to a proper place. [JDK-8318776](https://github.com/openjdk/jdk/pull/16625/files) would catch fire without this. > > Additional testing (redoing JDK-8319777 testing): > - [x] Linux arm Zero fastdebug now builds fine with JDK-8318776 fix > - [ ] Linux x86_32 Zero release; jcstress > - [ ] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [ ] Linux x86_32 Zero fastdebug, bootcycle-images It is a bit unclear exactly where the right place is in the sense that I'd expect the other variables, like `_no_of_cores`, to be initialized at the same time. But ok. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16779#pullrequestreview-1744276114 From shade at openjdk.org Wed Nov 22 12:47:10 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 12:47:10 GMT Subject: RFR: 8320582: Zero: Misplaced CX8 enablement flag In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 12:33:13 GMT, David Holmes wrote: > It is a bit unclear exactly where the right place is in the sense that I'd expect the other variables, like `_no_of_cores`, to be initialized at the same time. But ok. Yeah. One would expect that after `VM_Version::initialize` is done, all these are set. Which is a proper expectation, and is what the new guarantee in JDK-8318776 apparently relies on. Trivial? 
I would like to integrate as soon as testing comes back clean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16779#issuecomment-1822705056 From mdoerr at openjdk.org Wed Nov 22 13:22:14 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Nov 2023 13:22:14 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: <9BHmCcf-qD9pYgAF3OZxTU_lk04NyAn8ts7wJD6biCo=.1bb09638-e2c1-4d2a-8418-1cf11f365c6a@github.com> On Tue, 21 Nov 2023 11:45:10 GMT, Martin Doerr wrote: >> It is out of this scope. And it is a misnomer that will confuse people. > > Agreed. Thanks for the review! @matias9927: `ResolvedMethodEntry::_number_of_parameters` is a misleading name. It's not the number of parameters, it's the number of parameter stack slots. Not every parameter uses only one slot. You may want to improve this in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16741#discussion_r1402034991 From stuefe at openjdk.org Wed Nov 22 13:40:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 13:40:08 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v2] In-Reply-To: References: <-87BuqlmLl6uuGDwGdHl8IB-cbzyyCWGxlm5M-dsTkA=.76849dd6-f3ea-4b11-9222-5adcde9b4b51@github.com> Message-ID: On Wed, 22 Nov 2023 11:15:45 GMT, Andrew Haley wrote: >> Oh, you are right. Okay, I will add that. Being able to do CPU-specific stuff without ifdef is like a breath of fresh air. > > The coding scheme is rather weird, making it a little bit tricky to find valid XOR encodings. I don't think we cover all of the single-instruction possibilities. 
> > They are: > > 0x000100000000, 0x000200000000, 0x000300000000, 0x000400000000, 0x000600000000, 0x000700000000, > 0x000800000000, 0x000c00000000, 0x000e00000000, 0x000f00000000, 0x001000000000, 0x001800000000, > 0x001c00000000, 0x001e00000000, 0x001f00000000, 0x002000000000, 0x003000000000, 0x003800000000, > 0x003c00000000, 0x003e00000000, 0x003f00000000, 0x004000000000, 0x006000000000, 0x007000000000, > 0x007800000000, 0x007c00000000, 0x007e00000000, 0x007f00000000, 0x008000000000, 0x00c000000000, > 0x00e000000000, 0x00f000000000, 0x00f800000000, 0x00fc00000000, 0x00fe00000000, 0x00ff00000000, > 0x010000000000, 0x018000000000, 0x01c000000000, 0x01e000000000, 0x01f000000000, 0x01f800000000, > 0x01fc00000000, 0x01fe00000000, 0x01ff00000000, 0x020000000000, 0x030000000000, 0x038000000000, > 0x03c000000000, 0x03e000000000, 0x03f000000000, 0x03f800000000, 0x03fc00000000, 0x03fe00000000, > 0x03ff00000000, 0x040000000000, 0x060000000000, 0x070000000000, 0x078000000000, 0x07c000000000, > 0x07e000000000, 0x07f000000000, 0x07f800000000, 0x07fc00000000, 0x07fe00000000, 0x07ff00000000, > 0x080000000000, 0x0c0000000000, 0x0e0000000000, 0x0f0000000000, 0x0f8000000000, 0x0fc000000000, > 0x0fe000000000, 0x0ff000000000, 0x0ff800000000, 0x0ffc00000000, 0x0ffe00000000, 0x0fff00000000, > 0x100000000000, 0x180000000000, 0x1c0000000000, 0x1e0000000000, 0x1f0000000000, 0x1f8000000000, > 0x1fc000000000, 0x1fe000000000, 0x1ff000000000, 0x1ff800000000, 0x1ffc00000000, 0x1ffe00000000, > 0x1fff00000000, 0x200000000000, 0x300000000000, 0x380000000000, 0x3c0000000000, 0x3e0000000000, > 0x3f0000000000, 0x3f8000000000, 0x3fc000000000, 0x3fe000000000, 0x3ff000000000, 0x3ff800000000, > 0x3ffc00000000, 0x3ffe00000000, 0x3fff00000000, 0x400000000000, 0x600000000000, 0x700000000000, > 0x780000000000, 0x7c0000000000, 0x7e0000000000, 0x7f0000000000, 0x7f8000000000, 0x7fc000000000, > 0x7fe000000000, 0x7ff000000000, 0x7ff800000000, 0x7ffc00000000, 0x7ffe00000000, 0x7fff00000000, > 0x800000000000, 
0xc00000000000, 0xe00000000000, 0xf00000000000, 0xf80000000000, 0xfc0000000000, > 0xfe0000000000, 0xff0000000000, 0xff8000000000, 0xffc000000000, 0xffe000000000, 0xfff000000000, > 0xfff800000000, 0xfffc00000000, 0xfffe00000000, 0xffff00000000 Thanks! I already have a solution with a hard coded table. I think by now everyone has a copy of this table as a gist somewhere :) https://gist.github.com/tstuefe/f6022e5973b0f262a30b23702842331a ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1402060956 From stuefe at openjdk.org Wed Nov 22 14:14:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 14:14:35 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v3] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
> > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - EOR mode reservation for aarch64 - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation - Update compressedKlass_aarch64.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/ad702f95..943889b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=01-02 Stats: 61 lines in 3 files changed: 57 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From lucy at openjdk.org Wed Nov 22 14:28:14 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 22 Nov 2023 14:28:14 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 15:40:10 GMT, Martin Doerr wrote: > `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) > In addition, I've cleaned up some inaccurate comments. LGTM. Thanks for cleaning up. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16741#pullrequestreview-1744544156 From stefank at openjdk.org Wed Nov 22 14:36:17 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 22 Nov 2023 14:36:17 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v9] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 08:41:35 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > find methods accepts Function and callers provide lambda. Thanks for making this change. I'd like to suggest the following cleanups, some documentation, and a few tests: https://github.com/openjdk/jdk/commit/20d4502471ba396ae395512cfa3dab3f87555421 I think it might be easier to review by looking at the final diff: https://github.com/openjdk/jdk/compare/master...stefank:jdk:pr_15418 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1822887111 From aph at openjdk.org Wed Nov 22 14:43:07 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Nov 2023 14:43:07 GMT Subject: RFR: 8319700: [AArch64] C2 compilation fails with "Field too big for insn" In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:44:12 GMT, Axel Boldt-Christmas wrote: > Not all ZGC C2 BarrierStubs used on aarch64 participate in the laying out of trampoline stubs (which is used to enable as many `tbX` instructions as possible). This leads to incorrect calculations, which may cause the target offset for the `tbX` branch to become too large. > > This fix changes all the BarrierStubs to stubs which participate in the trampoline logic.
> > Until more platforms require specialised barrier stub layouts, it is not worth adding better support for this pattern. Without a redesign, it is harder to ensure that this is used correctly. For now, when building for aarch64, the shared code asserts that the general shared stubs are not used directly, but care is still needed if any new barrier stubs are introduced. > > The behaviour was more easily reproducible with large inlining heuristics. This flag combination gave somewhat reliable reproducibility: `-esa -ea -XX:MaxInlineLevel=300 -XX:MaxInlineSize=1100 -XX:MaxTrivialSize=1000 -XX:LiveNodeCountInliningCutoff=1000000 -XX:MaxNodeLimit=3000000 -XX:NodeLimitFudgeFactor=600000 -XX:+UnlockExperimentalVMOptions -XX:+UseVectorStubs` > > There was also an observation in the JBS comments that there were no `tbX` instructions branching to the emitted trampolines. However, I was unable to reproduce this. I ran all tests with the following guarantee, which did not observe it either.
> > > diff --git a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > index ebaf1829972..b6c40163a6b 100644 > --- a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > @@ -36,6 +36,7 @@ > #include "runtime/icache.hpp" > #include "runtime/jniHandles.hpp" > #include "runtime/sharedRuntime.hpp" > +#include "utilities/debug.hpp" > #include "utilities/macros.hpp" > #ifdef COMPILER1 > #include "c1/c1_LIRAssembler.hpp" > @@ -1358,6 +1359,7 @@ void ZLoadBarrierStubC2Aarch64::emit_code(MacroAssembler& masm) { > // Current assumption is that the barrier stubs are the first stubs emitted after the actual code > assert(stubs_start_offset() <= output->buffer_sizing_data()->_code, "stubs are assumed to be emitted directly after code and code_size is a hard limit on where it can start"); > > + guarantee(!_test_and_branch_reachable_entry.is_unused(), "Should be used"); > __ bind(_test_and_branch_reachable_entry); > > // Next branch's offset is unknown, but is > branch_offset > > > - T... That looks like a reasonable thing to do. In C2 elsewhere we use assembler relaxation to allow the use of `TB`x even in large methods, but this is good enough for now. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16780#pullrequestreview-1744577807 From aph at openjdk.org Wed Nov 22 14:46:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 22 Nov 2023 14:46:11 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v3] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 14:14:35 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. 
>> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). 
>> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - EOR mode reservation for aarch64 > - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation > - Update compressedKlass_aarch64.cpp src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 60: > 58: 0x7ffc, 0x7ffe, 0x7fff > 59: }; > 60: static constexpr int num_immediates = sizeof(immediates) / sizeof(uint16_t); Suggestion: static constexpr int num_immediates = sizeof immediates / sizeof immediates[0]; ... is more robust against the type of `immediates[]` changing ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1402173171 From shade at openjdk.org Wed Nov 22 14:52:10 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 14:52:10 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v6] In-Reply-To: References: <9a-07LkAZPw_SiOpo1uO8uA9i66AVNi9OkJjNZfSHU0=.8f4feee9-2b76-4006-a5ca-f0ce2d71c6b0@github.com> Message-ID: On Tue, 21 Nov 2023 17:05:46 GMT, Patricio Chilano Mateo wrote: > I'll schedule a round of testing for Tiers1-7 with the latest version. Thanks! Any failures so far? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1822913989 From stuefe at openjdk.org Wed Nov 22 15:04:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 15:04:33 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v4] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range.
We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. 
We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision: - fix mistake - feedback andrew - Merge upstream - EOR mode reservation for aarch64 - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation - Update compressedKlass_aarch64.cpp - Regression Test - JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation - JDK-8320382-Remove-CompressedKlassPointers-is_valid_base ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/943889b0..e4e23388 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=02-03 Stats: 10112 lines in 372 files changed: 6424 ins; 1337 del; 2351 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From stuefe at openjdk.org Wed Nov 22 15:04:36 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 15:04:36 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v3] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 14:14:35 GMT, Thomas Stuefe wrote: >> In 
`Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. 
We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - EOR mode reservation for aarch64 > - Merge branch 'JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation' of github.com:tstuefe/jdk into JDK-8320368-Per-CPU-optimization-of-Klass-range-reservation > - Update compressedKlass_aarch64.cpp waiting on robbin ------------- PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1822932406 From stefank at openjdk.org Wed Nov 22 15:07:34 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 22 Nov 2023 15:07:34 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object Message-ID: In the rewrites made for: [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. The provided tests provoke this assert from: * the JNI thread detach code * thread dumping with locked monitors, and * the JVMTI GetOwnedMonitorInfo API.
While investigating this we've found that the thread detach code became more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. The thread dumping case doesn't tolerate ObjectMonitors with dead objects, so I'm filtering those in the closure that collects ObjectMonitors. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`. For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls. Test: the written tests with and without the fix. Tier1-Tier3, so far.
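As a rough illustration of the filtering described above (not the code in the PR), the sketch below collects the monitors owned by one thread and skips those whose object has already died. `MonitorSketch` and `collect_owned_live` are hypothetical stand-ins invented for this example; only the idea of checking `object_peek()` for null comes from the description.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an ObjectMonitor; it carries only what the sketch needs.
struct MonitorSketch {
  const void* owner;   // owning thread, or nullptr if the monitor is unowned
  const void* object;  // associated object, or nullptr once the object is dead

  // Mirrors the idea of peeking at the object without keeping it alive.
  const void* object_peek() const { return object; }
};

// Collects the monitors owned by `thread`, filtering out owned monitors whose
// object is dead (the case reachable via JNI MonitorEnter followed by dropping
// every reference to the locked object).
std::vector<const MonitorSketch*> collect_owned_live(
    const std::vector<MonitorSketch>& in_use, const void* thread) {
  std::vector<const MonitorSketch*> result;
  for (const MonitorSketch& m : in_use) {
    if (m.owner != thread) {
      continue;  // not owned by the thread of interest
    }
    if (m.object_peek() == nullptr) {
      continue;  // owned, but the object is dead: do not expose it
    }
    result.push_back(&m);
  }
  return result;
}
```

Under this sketch a caller such as a thread dump would never see a null "owned monitor" object, while a detach path that must unlock everything would iterate the unfiltered list instead.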
------------- Commit messages: - 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object Changes: https://git.openjdk.org/jdk/pull/16783/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320515 Stats: 258 lines in 8 files changed: 253 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From pchilanomate at openjdk.org Wed Nov 22 15:11:13 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 22 Nov 2023 15:11:13 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: On Tue, 21 Nov 2023 10:35:14 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. 
The safepoints from the reproducer in the bug improved dramatically on a Mac, see the graph below. The new version gives **orders of magnitude** better safepoint times. This also translates to much more active GC and attainable allocating rate, because GC throughput is not blocked by overly long safepoints. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Do not SpinYield at disarm loop > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - ... 
and 6 more: https://git.openjdk.org/jdk/compare/20538340...e56a2bfa I run Tiers[1-7] and there is one failure in tier5 in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. The output is: #> #> WARNING: switching log to verbose mode, #> because error is complained #> ThreadMonitor> Test mode: DIRECTLY access to MBean ThreadController> number of created threads: 30 ThreadController> depth for all threads: 100 ThreadController> invocation type: mixed Starting threads. ThreadController> locking threads States of the threads are culminated. # ERROR: Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE The following stacktrace is for failure analysis. nsk.share.TestFailure: Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:431) at nsk.share.Log.complain(Log.java:402) at nsk.monitoring.stress.thread.strace010.runIt(strace010.java:148) at nsk.monitoring.stress.thread.strace010.run(strace010.java:99) at nsk.monitoring.stress.thread.strace010.main(strace010.java:95) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1570) Checked 6 BLOCKED threads # ERROR: Expected amount: 7 for BLOCKED threads actual: 6 Checked 7 WAITING threads Checked 8 TIMED_WAITING threads Checked 9 RUNNABLE threads # ERROR: Expected amount: 8 for RUNNABLE threads actual: 9 Test FAILED #> #> SUMMARY: Following errors occured #> during test execution: #> # ERROR: Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE # ERROR: Expected amount: 7 for BLOCKED threads actual: 6 # ERROR: Expected amount: 8 for RUNNABLE threads actual: 9 I re-run tier5 twice and the test alone 100 times but unfortunately couldn't reproduce the issue. 
I checked the history of failures and haven't seen this failed before. But it could also be that there is some race already in the test uncovered by this patch. There are some jobs pending for macos-x64 (there is currently a bottleneck in the pipeline for this platform). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1822949463 From dl at openjdk.org Wed Nov 22 15:53:12 2023 From: dl at openjdk.org (Doug Lea) Date: Wed, 22 Nov 2023 15:53:12 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 02:09:38 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. 
>> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge with master and update Zero code accordingly > - Merge branch 'master' into 8318776-supports_cx8 > - Remove unnecessary includes of vm_version.hpp. > Fix copyright years. > - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) > - Remove suports_cx8() checks from gtest > - Remove test for VMSupportsCX8 > - 8318776: Require supports_cx8 to always be true The deletion of backup code and the check for it in java.util.concurrent.AtomicLongFieldUpdater are clearly OK. We always thought the need for it was transient. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1823026095 From shade at openjdk.org Wed Nov 22 15:56:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 15:56:12 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: On Tue, 21 Nov 2023 10:35:14 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. 
Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac, see the graph below. The new version gives **orders of magnitude** better safepoint times. This also translates to much more active GC and attainable allocating rate, because GC throughput is not blocked by overly long safepoints. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Do not SpinYield at disarm loop > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - ... and 6 more: https://git.openjdk.org/jdk/compare/136b359b...e56a2bfa Thanks for testing! > I run Tiers[1-7] and there is one failure in tier5 in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. I re-run tier5 twice and the test alone 100 times but unfortunately couldn't reproduce the issue. I checked the history of failures and haven't seen this failed before. But it could also be that there is some race already in the test uncovered by this patch. Yes, I think so too. I ran this test hundreds of times without failure. The output implies there is a thread that should be "blocked", but instead it is "runnable". I think the test itself contains the race condition, submitted: https://bugs.openjdk.org/browse/JDK-8320599. I would not treat this failure as integration blocker then. Do you think we should wait for Mac pipeline to complete? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1823032177 From rehn at openjdk.org Wed Nov 22 15:59:14 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 22 Nov 2023 15:59:14 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: On Wed, 22 Nov 2023 15:08:45 GMT, Patricio Chilano Mateo wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Do not SpinYield at disarm loop >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Drop the Linux check in preparation for integration >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - Rework paddings >> - Encode barrier tag into state, resolving another race condition >> - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods >> - Merge branch 'master' into JDK-8318986-generic-wait-barrier >> - ... and 6 more: https://git.openjdk.org/jdk/compare/36c83bd9...e56a2bfa > > I run Tiers[1-7] and there is one failure in tier5 in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. The output is: > > > #> > #> WARNING: switching log to verbose mode, > #> because error is complained > #> > ThreadMonitor> Test mode: DIRECTLY access to MBean > ThreadController> number of created threads: 30 > ThreadController> depth for all threads: 100 > ThreadController> invocation type: mixed > > Starting threads. > > ThreadController> locking threads > > States of the threads are culminated. 
> # ERROR: Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE > The following stacktrace is for failure analysis. > nsk.share.TestFailure: Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE > at nsk.share.Log.logExceptionForFailureAnalysis(Log.java:431) > at nsk.share.Log.complain(Log.java:402) > at nsk.monitoring.stress.thread.strace010.runIt(strace010.java:148) > at nsk.monitoring.stress.thread.strace010.run(strace010.java:99) > at nsk.monitoring.stress.thread.strace010.main(strace010.java:95) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) > at java.base/java.lang.Thread.run(Thread.java:1570) > > Checked 6 BLOCKED threads > # ERROR: Expected amount: 7 for BLOCKED threads actual: 6 > Checked 7 WAITING threads > Checked 8 TIMED_WAITING threads > Checked 9 RUNNABLE threads > # ERROR: Expected amount: 8 for RUNNABLE threads actual: 9 > > > Test FAILED > > > #> > #> SUMMARY: Following errors occured > #> during test execution: > #> > # ERROR: Thread BLOCKED_ThreadMM001 wrong thread state: RUNNABLE > # ERROR: Expected amount: 7 for BLOCKED threads actual: 6 > # ERROR: Expected amount: 8 for RUNNABLE threads actual: 9 > > > I re-run tier5 twice and the test alone 100 times but unfortunately couldn't reproduce the issue. I checked the history of failures and haven't seen this failed before. But it could also be that there is some race already in the test uncovered by this patch. > > There are some jobs pending for macos-x64 (there is currently a bottleneck in the pipeline for this platform). @pchilano you can change the waitBarrier.hpp so Linux also uses the generic one, as @shipilev did when he tested: #if defined(LINUX) #include "waitBarrier_linux.hpp" And just use "typedef GenericWaitBarrier WaitBarrierDefault;" For better coverage. 
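The suggestion above amounts to changing which class the `WaitBarrierDefault` typedef names. A minimal, self-contained sketch of that selection follows; the two classes are empty stand-ins, `FORCE_GENERIC_WAIT_BARRIER` is a hypothetical switch that exists only in this example, and `__linux__` is used here in place of HotSpot's own `LINUX` macro.

```cpp
#include <cassert>
#include <type_traits>

// Empty stand-ins; in HotSpot these are the real barrier implementations.
class GenericWaitBarrier {};
class LinuxWaitBarrier {};

// Hypothetical switch for this sketch: set to 1 to test the generic
// implementation even when building on Linux.
#define FORCE_GENERIC_WAIT_BARRIER 1

#if defined(__linux__) && !FORCE_GENERIC_WAIT_BARRIER
typedef LinuxWaitBarrier WaitBarrierDefault;    // futex-based default on Linux
#else
typedef GenericWaitBarrier WaitBarrierDefault;  // semaphore-based fallback everywhere else
#endif

// Reports at compile time which implementation was selected.
constexpr bool uses_generic_wait_barrier() {
  return std::is_same<WaitBarrierDefault, GenericWaitBarrier>::value;
}
```

With the switch set, Linux test runs exercise the same code path as macOS and Windows, which is the extra coverage being asked for.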
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1823035846 From stuefe at openjdk.org Wed Nov 22 16:04:13 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 16:04:13 GMT Subject: RFR: 8257076: os::scan_pages is empty on all platforms In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 15:20:33 GMT, Sonia Zaldana Calles wrote: > The function os::scan_pages was only ever implemented in Solaris and then removed in [JDK-8244224](https://bugs.openjdk.org/browse/JDK-8244224) > > All other platforms have empty implementations and the interface is not optimal as os::scan_pages expects the range to have just one page size, while in reality it can have multiple. > > This PR removes this interface, ensuing empty implementations and all dead code related to page scanning. > > Testing: Tier 1. Good! ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16740#pullrequestreview-1744797838 From szaldana at openjdk.org Wed Nov 22 16:04:15 2023 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 22 Nov 2023 16:04:15 GMT Subject: Integrated: 8257076: os::scan_pages is empty on all platforms In-Reply-To: References: Message-ID: <85BMQ3juD7jSC3xhKFTdMsaqz98HGGbdj2UJ4mXZOvI=.195aa1db-098f-4404-b60f-c5f52369d5b3@github.com> On Mon, 20 Nov 2023 15:20:33 GMT, Sonia Zaldana Calles wrote: > The function os::scan_pages was only ever implemented in Solaris and then removed in [JDK-8244224](https://bugs.openjdk.org/browse/JDK-8244224) > > All other platforms have empty implementations and the interface is not optimal as os::scan_pages expects the range to have just one page size, while in reality it can have multiple. > > This PR removes this interface, ensuing empty implementations and all dead code related to page scanning. > > Testing: Tier 1. This pull request has now been integrated. 
Changeset: 35526d02 Author: Sonia Zaldana Calles Committer: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/35526d02c3fc6c31112a97a510d000c357b7e308 Stats: 83 lines in 7 files changed: 0 ins; 82 del; 1 mod 8257076: os::scan_pages is empty on all platforms Reviewed-by: dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/16740 From pchilanomate at openjdk.org Wed Nov 22 16:05:11 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 22 Nov 2023 16:05:11 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: On Wed, 22 Nov 2023 15:53:45 GMT, Aleksey Shipilev wrote: > Thanks for testing! > > > I run Tiers[1-7] and there is one failure in tier5 in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. I re-run tier5 twice and the test alone 100 times but unfortunately couldn't reproduce the issue. I checked the history of failures and haven't seen this failed before. But it could also be that there is some race already in the test uncovered by this patch. > > Yes, I think so too. I ran this test hundreds of times without failure. The output implies there is a thread that should be "blocked", but instead it is "runnable". I think the test itself contains the race condition, submitted: https://bugs.openjdk.org/browse/JDK-8320599. I would not treat this failure as integration blocker then. > Yes, I think the issue is in ThreadController.java with Blocker.block(). I'll keep investigating to see if I can reproduce it. > Do you think we should wait for Mac pipeline to complete? > I'm not sure when these tasks will finish. I think we should be good with all the testing done so far.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1823047768 From tschatzl at openjdk.org Wed Nov 22 16:09:27 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 22 Nov 2023 16:09:27 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v2] In-Reply-To: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: > Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) > > Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). > > The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. > > Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). > > Upcoming changes will > * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. > * better name the second stage of unlinking (called "cleaning" throughout, e.g. 
the work done in `G1CollectedHeap::complete_cleaning`) > * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism > * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) > * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. > > Please also first looking into the (small) PR this depends on. > > The crash on linux-x86 is fixed by PR#16766 which I split out for quicker reviews. > > Testing: tier1-7 > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review, naming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16759/files - new: https://git.openjdk.org/jdk/pull/16759/files/d63ff4a4..448232df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16759&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16759&range=00-01 Stats: 14 lines in 2 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/16759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16759/head:pull/16759 PR: https://git.openjdk.org/jdk/pull/16759 From pchilanomate at openjdk.org Wed Nov 22 16:10:18 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 22 Nov 2023 16:10:18 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: On Wed, 22 Nov 2023 16:02:28 GMT, Patricio Chilano Mateo wrote: >> Thanks for testing! >> >>> I run Tiers[1-7] and there is one failure in tier5 in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. 
I re-run tier5 twice and the test alone 100 times but unfortunately couldn't reproduce the issue. I checked the history of failures and haven't seen this failed before. But it could also be that there is some race already in the test uncovered by this patch. >> >> Yes, I think so too. I ran this test hundreds of times without failure. The output implies there is a thread that should be "blocked", but instead it is "runnable". I think the test itself contains the race condition, submitted: https://bugs.openjdk.org/browse/JDK-8320599. I would not treat this failure as integration blocker then. >> >> Do you think we should wait for Mac pipeline to complete? > >> Thanks for testing! >> >> > I run Tiers[1-7] and there is one failure in tier5 in test vmTestbase/nsk/monitoring/stress/thread/strace016/TestDescription.java on windows-x64-debug. I re-run tier5 twice and the test alone 100 times but unfortunately couldn't reproduce the issue. I checked the history of failures and haven't seen this failed before. But it could also be that there is some race already in the test uncovered by this patch. >> >> Yes, I think so too. I ran this test hundreds of times without failure. The output implies there is a thread that should be "blocked", but instead it is "runnable". I think the test itself contains the race condition, submitted: https://bugs.openjdk.org/browse/JDK-8320599. I would not treat this failure as integration blocker then. >> > Yes, I think the issue is in ThreadController.java with Blocker.block(). I'll keep investigating to see if I can reproduce it. > >> Do you think we should wait for Mac pipeline to complete? >> > I'm not sure when this tasks will finish. I think we should be good with all the testing done so far. 
> @pchilano you can change the waitBarrier.hpp so Linux also uses the generic one, as @shipilev did when he tested: > > ``` > #if defined(LINUX) > #include "waitBarrier_linux.hpp" > ``` > > And just use "typedef GenericWaitBarrier WaitBarrierDefault;" > > For better coverage. > I actually realized this yesterday after the jobs had been running for a while. So I submitted extra runs with that change to test Linux too. I ran Tiers[4-7]. Tier7 completed successfully, and Tiers[4-6] are almost done too with no failures. There are again some macos-x64 jobs that are pending. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1823057946 From mdoerr at openjdk.org Wed Nov 22 16:12:25 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Nov 2023 16:12:25 GMT Subject: RFR: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 15:40:10 GMT, Martin Doerr wrote: > `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) > In addition, I've cleaned up some inaccurate comments. Thanks! GHA Pre-submit test errors are unrelated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16741#issuecomment-1823056610 From stuefe at openjdk.org Wed Nov 22 16:12:30 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 16:12:30 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v5] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme.
> > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
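The riscv rule quoted above (a "good" base has bits set only in [12..32), only in [32, 44), or only in [44, 64)) can be written down as a small predicate. The following is a minimal stand-alone sketch, not the code from the PR: the names `bit_range_mask`, `only_bits_in` and `is_good_riscv_base` are hypothetical, and treating a zero base as "not good" is an assumption made here for illustration only.

```cpp
#include <cstdint>

// Mask with bits [lo, hi) set. Assumes 0 <= lo < hi <= 64 (hypothetical helper).
static constexpr uint64_t bit_range_mask(unsigned lo, unsigned hi) {
  return ((hi == 64) ? ~uint64_t(0) : ((uint64_t(1) << hi) - 1)) &
         ~((uint64_t(1) << lo) - 1);
}

// True if v is non-zero and all of its set bits fall inside [lo, hi).
// Rejecting zero is an assumption of this sketch, not taken from the PR.
static constexpr bool only_bits_in(uint64_t v, unsigned lo, unsigned hi) {
  return v != 0 && (v & ~bit_range_mask(lo, hi)) == 0;
}

// Models the quoted rule: the base uses exactly one of the three bit ranges.
static constexpr bool is_good_riscv_base(uint64_t base) {
  return only_bits_in(base, 12, 32) ||
         only_bits_in(base, 32, 44) ||
         only_bits_in(base, 44, 64);
}

// Compile-time spot checks of the predicate.
static_assert(is_good_riscv_base(uint64_t(1) << 28), "bit 28 lies in [12, 32)");
static_assert(is_good_riscv_base(uint64_t(1) << 32), "bit 32 lies in [32, 44)");
static_assert(!is_good_riscv_base((uint64_t(1) << 32) | (uint64_t(1) << 12)),
              "bits in two different ranges are rejected");
```

A reservation loop could use such a predicate to filter candidate addresses before attempting to map them; how the actual patch picks candidates is described in the PR itself and is not reproduced here.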
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: fix macos ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/e4e23388..c15f361a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From mdoerr at openjdk.org Wed Nov 22 16:12:27 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 22 Nov 2023 16:12:27 GMT Subject: Integrated: 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry In-Reply-To: References: Message-ID: <5NE1On7ESfyLMmqN0kLRlgqG7V6QQt0IYELihQscnGU=.9d563bed-a588-4cff-9cdd-7b7832d47217@github.com> On Mon, 20 Nov 2023 15:40:10 GMT, Martin Doerr wrote: > `TemplateTable::invokevfinal_helper` should use `TemplateTable::prepare_invoke`. `TemplateInterpreter::invoke_return_entry_table_for` needs to support `_fast_invokevfinal` bytecode for that which is only used by PPC64. (It is probably still beneficial for AIX which doesn't support CDS.) > In addition, I've cleaned up some inaccurate comments. This pull request has now been integrated. 
Changeset: 524da141 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/524da141e7976cb136fa1769714a01235cd39508 Stats: 38 lines in 3 files changed: 5 ins; 16 del; 17 mod 8320418: PPC64: invokevfinal_helper duplicates code to handle ResolvedMethodEntry Reviewed-by: rrich, lucy ------------- PR: https://git.openjdk.org/jdk/pull/16741 From duke at openjdk.org Wed Nov 22 16:19:30 2023 From: duke at openjdk.org (suchismith1993) Date: Wed, 22 Nov 2023 16:19:30 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX Message-ID: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> JDK-8320005 : Native library suffix impact on hotspot code in AIX ------------- Commit messages: - Adapt hotspot coding style - Improve comments and coding style. - Remove macro for file extension. - Move mapping function to aix specific file. - Introduce new macro for AIX archives. - Add support for .a extension in jvm agent. Changes: https://git.openjdk.org/jdk/pull/16604/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16604&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320005 Stats: 24 lines in 3 files changed: 24 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16604/head:pull/16604 PR: https://git.openjdk.org/jdk/pull/16604 From duke at openjdk.org Wed Nov 22 16:24:24 2023 From: duke at openjdk.org (suchismith1993) Date: Wed, 22 Nov 2023 16:24:24 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> Message-ID: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> > JDK-8320005 : Native library suffix impact on hotspot code in AIX 
suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: change macro position ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16604/files - new: https://git.openjdk.org/jdk/pull/16604/files/6fdfba81..077083d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16604&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16604&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16604/head:pull/16604 PR: https://git.openjdk.org/jdk/pull/16604 From stuefe at openjdk.org Wed Nov 22 16:26:28 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 16:26:28 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v6] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. 
> > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: remove stray newline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/c15f361a..354bd0c1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From stuefe at openjdk.org Wed Nov 22 16:38:11 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 22 Nov 2023 16:38:11 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> JDK-8320005 : Native library suffix impact on hotspot code in AIX > > suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: > > change macro position Hi, is this patch meant for review already? If yes, could you please describe the problem you fix, and how you fix it? If no, I suggest working on it in draft state until it is ready for review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1823105203 From matsaave at openjdk.org Wed Nov 22 16:48:16 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 22 Nov 2023 16:48:16 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry Message-ID: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. This patch restores the flag together with the resolved references index. ------------- Commit messages: - 8320530: has_resolved_ref_index flag not restored after resetting entry Changes: https://git.openjdk.org/jdk/pull/16769/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320530 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16769/head:pull/16769 PR: https://git.openjdk.org/jdk/pull/16769 From tschatzl at openjdk.org Wed Nov 22 16:55:08 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 22 Nov 2023 16:55:08 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v2] In-Reply-To: References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: On Wed, 22 Nov 2023 16:09:27 GMT, Thomas Schatzl wrote: >> Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) >> >> Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading.
GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). >> >> The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. >> >> Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). >> >> Upcoming changes will >> * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. >> * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) >> * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism >> * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) >> * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. >> >> These are the reason for the class hierarchy for `ClassUnloadingContext`: the goal is to ultimately have about this phasing (for G1): >> 1. collect all dead CLDs, using the `register_unloading_class_loader_data` method *only* >> 2. parallelize the stuff in `ClassLoaderData::unload()` in one way or another, adding them to the `complete_cleaning` (parallel) phase. >> 3. 
`purge_nmethods`, `free_code_blobs` and the `remove_unlinked_nmethods_from_code_root_set` (from JDK-8317007) will be concurrent. >> >> Particularly the split of `SystemDictionary::do_unloading` into "only" traversing the CLDs to find the dead ones and then in parallel process them in 2. a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review, naming I added some explanation of why there is a class hierarchy for `ClassUnloadingContext` in the description (and some further background). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16759#issuecomment-1823139932 From adinn at openjdk.org Wed Nov 22 16:58:05 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 22 Nov 2023 16:58:05 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry In-Reply-To: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Tue, 21 Nov 2023 16:38:14 GMT, Matias Saavedra Silva wrote: > ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. > > This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. Nice catch! ------------- Marked as reviewed by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1744927229 From shade at openjdk.org Wed Nov 22 16:59:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 16:59:08 GMT Subject: RFR: 8320582: Zero: Misplaced CX8 enablement flag In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:33:24 GMT, Aleksey Shipilev wrote: > When doing [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777), I misplaced the `_supports_cx8 = true` flag setting in the method that is only called when CPU features are polled from perf counter code. We need to move the check to a proper place. [JDK-8318776](https://github.com/openjdk/jdk/pull/16625/files) would catch fire without this. > > Additional testing (redoing JDK-8319777 testing): > - [x] Linux arm Zero fastdebug now builds fine with JDK-8318776 fix > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images Testing passes. I would like to get this in now, if we agree this is trivial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16779#issuecomment-1823145366 From shade at openjdk.org Wed Nov 22 17:05:13 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 17:05:13 GMT Subject: RFR: 8318986: Improve GenericWaitBarrier performance [v8] In-Reply-To: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> References: <4JD2RhqD5U439s7se-vjGnNIglIAvr9g91J2ZOtyPvk=.9b7ceb98-0772-4ff1-ad05-f48a3e09bb4b@github.com> Message-ID: On Tue, 21 Nov 2023 10:35:14 GMT, Aleksey Shipilev wrote: >> See the symptoms, reproducer and analysis in the bug. >> >> Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. 
Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. >> >> This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) >> >> This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac, see the graph below. The new version gives **orders of magnitude** better safepoint times. This also translates to much more active GC and attainable allocating rate, because GC throughput is not blocked by overly long safepoints. >> >> ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) >> >> Additional testing: >> - [x] MacOS AArch64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) >> - [x] MacOS AArch64 server fastdebug, `tier2 tier3` >> - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) >> - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Do not SpinYield at disarm loop > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Drop the Linux check in preparation for integration > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - Rework paddings > - Encode barrier tag into state, resolving another race condition > - Simple review feedback fixes: tracking wakeup numbers, reflowing some methods > - Merge branch 'master' into JDK-8318986-generic-wait-barrier > - ... and 6 more: https://git.openjdk.org/jdk/compare/c3fed00a...e56a2bfa All right then. I will integrate today, hopefully within an hour. Thank you all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16404#issuecomment-1823154852 From duke at openjdk.org Wed Nov 22 17:30:09 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 22 Nov 2023 17:30:09 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 18:24:54 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'jdk/master' into vp-ecore2 >> - review comments >> - emulate vblend on ecores > > test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java line 112: > >> 110: if (fout[i] != 1.0) throw new RuntimeException("Expected positive numbers in second half of array: " + java.util.Arrays.toString(fout)); >> 111: } >> 112: } > > It's OK to add a correctness check here, but the test is only intended to perform IR validation; there are detailed functional tests in the following files > test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java > test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java > test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java I am OK with removing this change to the test. I didn't know where the other tests were, and by the time I found them, I had already added this. (Figured "more test === good", but it's just a duplicate) > test/hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java line 119: > >> 117: } >> 118: } >> 119: > > Test performs IR validation, you can also update the existing functional test with more test values. > test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.java Same as above, I am OK with removing this change to the test too. I didn't know where the other tests were, and by the time I found them, I had already added this.
(Figured "more test === good", but its just duplicate) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402452121 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402452663 From mbaesken at openjdk.org Wed Nov 22 17:46:22 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 22 Nov 2023 17:46:22 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v2] In-Reply-To: References: Message-ID: > VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. > This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . > The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: add prepare_native_symbols ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16730/files - new: https://git.openjdk.org/jdk/pull/16730/files/25f0b037..1df7a280 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16730&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16730&range=00-01 Stats: 18 lines in 6 files changed: 15 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16730.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16730/head:pull/16730 PR: https://git.openjdk.org/jdk/pull/16730 From dcubed at openjdk.org Wed Nov 22 17:48:13 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 22 Nov 2023 17:48:13 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> Message-ID: <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> On Tue, 21 Nov 2023 14:45:23 GMT, Axel Boldt-Christmas wrote: >> Implements the runtime part of JDK-8319796. >> The different CPU implementations are/will be created as dependent pull requests. >> >> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enters. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit. >> >> A high-level overview: >> * Locking is still performed on the mark word >> * Unlocked (0b01) <=> Locked (0b00) >> * Monitor enter on Obj with mark word Unlocked (0b01) is the same >> * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) >> * Push Obj onto the lock stack >> * Success >> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack >> * If top entry is Obj >> * Push Obj on the lock stack >> * Success >> * If top entry is not Obj >> * Inflate and call ObjectMonitor::enter >> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack >> * If just the top entry is Obj >> * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) >> * Pop the entry >> * Success >> * If both entries are Obj >> * Pop the top entry >> * Success >> * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit >> * If the monitor has been inflated for object Obj which is owned by the
current thread >> * All corresponding entries for Obj is removed from the lock stack >> * The monitor recursions is set to the number of removed entries - 1 >> * The owner is changed from anonymous to the thread >> * The regular ObjectMonitor::action is called. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 > - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 > - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 > - Fix nit > - Fix comment typos > - 8319797: Recursive lightweight locking: Runtime implementation I've finished my first read-thru of this patch. I've made a few comments and I need to mull on the changes some and will do another pass. src/hotspot/share/runtime/lockStack.inline.hpp line 127: > 125: int end = to_index(_top); > 126: if (end <= 1 || _base[end - 1] != o || _base[end - 2] != o) { > 127: // The two topmost oop does not match o. nit typo: s/The two topmost oop does not match o./The two topmost oops do not match o. src/hotspot/share/runtime/lockStack.inline.hpp line 144: > 142: for (int i = 0; i < end; i++) { > 143: if (_base[i] != o) { > 144: _base[inserted++] = _base[i]; This version of the removal algorithm stores into every `base[inserted]` memory location even when `inserted == i` before the first instance of `o` is found and logically removed. Granted the lock stack is only 8 elements, but storing into every memory location when you don't need to is wasteful. src/hotspot/share/runtime/synchronizer.cpp line 498: > 496: p2i(monitor->owner()), p2i(current), monitor->object()->mark_acquire().value()); > 497: assert(!lock_stack.is_full(), "must have made room here"); > 498: } This is an interesting idea. I'll have to see how you test this code later on... 
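To illustrate the lockStack.inline.hpp line 144 comment above, here is a minimal sketch of a removal loop that performs no stores below the first match. `ToyLockStack` is a hypothetical stand-in (plain pointers and an entry count rather than oops and a byte offset); it is an assumption for illustration, not the code in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the lock stack: 8 slots plus a live-entry count.
struct ToyLockStack {
  static constexpr int CAPACITY = 8;
  std::array<const void*, CAPACITY> base{};
  int top = 0;  // number of live entries

  void push(const void* o) {
    assert(top < CAPACITY);
    base[top++] = o;
  }

  // Remove every entry equal to o and return how many were removed.
  std::size_t remove(const void* o) {
    int i = 0;
    // Skip the prefix that contains no match; nothing is written here.
    while (i < top && base[i] != o) {
      i++;
    }
    int inserted = i;  // first slot that may need overwriting
    for (; i < top; i++) {
      if (base[i] != o) {
        base[inserted++] = base[i];  // compact surviving entries downwards
      }
    }
    std::size_t removed = static_cast<std::size_t>(top - inserted);
    for (int j = inserted; j < top; j++) {
      base[j] = nullptr;  // clear the vacated tail
    }
    top = inserted;
    return removed;
  }
};
```

With only 8 slots the saving is small, as the comment itself concedes; the sketch only shows that the extra stores are avoidable with one additional scan loop.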
src/hotspot/share/runtime/synchronizer.cpp line 504: > 502: // Retry until a lock state change has been observed. cas_set_mark() may collide with non lock bits modifications. > 503: // Try to swing into 'fast-locked' state. > 504: assert(!lock_stack.contains(obj()), "thread must not already hold the lock"); It looks like the indent from L502 -> L515 is too much by two spaces. This could be a GitHub view glitch... test/hotspot/gtest/runtime/test_lockStack.cpp line 36: > 34: ls._base[ls.to_index(ls._top)] = obj; > 35: ls._top += oopSize; > 36: nit - please delete extra blank line. ------------- PR Review: https://git.openjdk.org/jdk/pull/16606#pullrequestreview-1743218757 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1401262103 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1401280804 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402448765 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402421724 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402461377 From shade at openjdk.org Wed Nov 22 17:58:23 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 17:58:23 GMT Subject: Integrated: 8318986: Improve GenericWaitBarrier performance In-Reply-To: References: Message-ID: <8CU_dt_DNR_X-pGhemC_NDm7btjP52A5TDaF6ntOdSw=.af82ea2f-daa9-4068-bcf7-dc2e191d46d8@github.com> On Fri, 27 Oct 2023 15:40:11 GMT, Aleksey Shipilev wrote: > See the symptoms, reproducer and analysis in the bug. > > Current code waits on `disarm()`, which effectively stalls leaving the safepoint if some threads lag behind. Having more runnable threads than CPUs nearly guarantees that we would wait for quite some time, but it also reproduces well if you have enough threads near the CPU count. > > This PR implements a more efficient `GenericWaitBarrier` to recover the performance. Most of the implementation discussion is in the code comments. 
The key observation that drives this work is that we want to reuse `Semaphore` and related counters without being stuck waiting for threads to leave. (AFAICS, futex-based `LinuxWaitBarrier` does roughly the same, but handles this reuse on futex side, by assigning the "address" per futex.) > > This issue affects everything except Linux. I initially found this on my M1 Mac, but pretty sure it is easy to reproduce on Windows as well. The safepoints from the reproducer in the bug improved dramatically on a Mac, see the graph below. The new version gives **orders of magnitude** better safepoint times. This also translates to much more active GC and attainable allocating rate, because GC throughput is not blocked by overly long safepoints. > > ![plot-generic-wait-barrier-macos](https://github.com/openjdk/jdk/assets/1858943/3440f200-2360-482a-b8b2-71385cca35b7) > > Additional testing: > - [x] MacOS AArch64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier1 tier2 tier3` (generic wait barrier enabled explicitly) > - [x] MacOS AArch64 server fastdebug, `tier2 tier3` > - [x] Linux x86_64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) > - [x] Linux AArch64 server fastdebug, `tier4` (generic wait barrier enabled explicitly) This pull request has now been integrated. Changeset: 30462f9d Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/30462f9da40d3a7ec18fcf46e2154fabb5fd4753 Stats: 286 lines in 2 files changed: 225 ins; 12 del; 49 mod 8318986: Improve GenericWaitBarrier performance Reviewed-by: rehn, iwalulya, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/16404 From dcubed at openjdk.org Wed Nov 22 18:39:10 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 22 Nov 2023 18:39:10 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: <6TkAoQRK_49MtT6wjb_JAwFzUATAvAx_rX9yxMa9Vfs=.7d1d5bf6-f088-44df-9d01-cc336bdeecaa@github.com> On Wed, 22 Nov 2023 02:09:38 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the lock-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for a lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPUs. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. >> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: > > - Merge with master and update Zero code accordingly > - Merge branch 'master' into 8318776-supports_cx8 > - Remove unnecessary includes of vm_version.hpp. > Fix copyright years. > - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) > - Remove suports_cx8() checks from gtest > - Remove test for VMSupportsCX8 > - 8318776: Require supports_cx8 to always be true Wow! This PR is much larger than I expected. Thumbs up! ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16625#pullrequestreview-1745089631 From dcubed at openjdk.org Wed Nov 22 18:39:15 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 22 Nov 2023 18:39:15 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 08:48:09 GMT, Aleksey Shipilev wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge with master and update Zero code accordingly >> - Merge branch 'master' into 8318776-supports_cx8 >> - Remove unnecessary includes of vm_version.hpp. >> Fix copyright years. >> - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) >> - Remove suports_cx8() checks from gtest >> - Remove test for VMSupportsCX8 >> - 8318776: Require supports_cx8 to always be true > > src/hotspot/cpu/x86/vm_version_x86.cpp line 819: > >> 817: } >> 818: >> 819: _supports_cx8 = supports_cmpxchg8(); > > I think we should leave the runtime check here (under `ifndef`, like in ARM?). This covers the remaining case of running on legacy x86 without CX8 implemented: the init guarantee would then fire and prevent any other surprises at runtime. 
Sure, it would be hard to come up with such a platform today, but it would be safer to refuse to run there right away on the off-chance someone actually has it :) @shipilev - Do you have a particular legacy x86 in mind? > src/hotspot/share/runtime/vm_version.cpp line 33: > >> 31: void VM_Version_init() { >> 32: VM_Version::initialize(); >> 33: guarantee(VM_Version::supports_cx8(), "Support for 64-bit atomic operations in required in this release"); > > Typo: "in required in". Also, no need to mention "this release" at all? > Suggestion for message: "JVM requires platform support for 64-bit atomic operations" Or the simpler change: s/in required/is required/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1402515036 PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1402528045 From psandoz at openjdk.org Wed Nov 22 19:09:06 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 22 Nov 2023 19:09:06 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: <97YS_I-DY-Q5agE6mE-iBkoVxtvL7R4Q3NebjTsXMvI=.dac0dc99-84d7-4cbd-ada6-5190564688a9@github.com> Message-ID: On Wed, 22 Nov 2023 09:05:31 GMT, Andrew Haley wrote: > > Have you considered the possibility of copying the sleef source to the OpenJDK repository and thereby it becomes part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. > > Hi @PaulSandoz ! Thanks for the suggestion! Copying the sleef source sounds good. However, I actually have no idea about how to handle the third-party licence in OpenJDK project. Do you have any idea about this area? Some suggestions/guidance from the JDK team would be very helpful. Thanks! We (Oracle Java Platform Group) can handle the required "paperwork" on any third party dependencies and attribution of copyright before any PR can be integrated, if you can help detail what those are.
First i think we need to determine if this is feasible e.g., copying a subset and integrating it into the build system, since it does not make sense to bring in the support for quad floats and DFT, which IIUC brings in a dependency on compiler support for OpenMP. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1823335443 From ihse at openjdk.org Wed Nov 22 19:30:13 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 22 Nov 2023 19:30:13 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. 
But a warning is printed out if a user specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedback from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Add a bundled native lib in jdk as a bridge to libsleef > - Merge 'jdk:master' into JDK-8312425 > - Disable sleef by default > - Merge 'jdk:master' into JDK-8312425 > - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF I think it can be a good idea to start with using an external library, as is done in this PR. For all our other bundled libraries, we also always support the option of building with the "system" library instead of the bundled one, so this code will still be required. Whether we should push this PR first and then add the source in a separate PR, or bake everything together (both external library and bundled sources) into a single PR, I cannot say. But I expect it is easier to take it piece-wise. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1823387045 From dcubed at openjdk.org Wed Nov 22 21:36:09 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Wed, 22 Nov 2023 21:36:09 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> Message-ID: On Tue, 21 Nov 2023 14:45:23 GMT, Axel Boldt-Christmas wrote: >> Implements the runtime part of JDK-8319796. >> The different CPU implementations are/will be created as dependent pull requests. >> >> This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enter. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two top most entires on the lock stack to determine what to do in a monitor exit. >> >> A high level overview: >> * Locking is still performed on the mark word >> * Unlocked (0b01) <=> Locked (0b00) >> * Monitor enter on Obj with mark word Unlocked (0b01) is the same >> * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) >> * Push Obj onto the lock stack >> * Success >> * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack >> * If top entry is Obj >> * Push Obj on the lock stack >> * Success >> * If top entry is not Obj >> * Inflate and call ObjectMonitor::enter >> * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack >> * If just the top entry is Obj >> * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) >> * Pop the entry >> * Success >> * If both entries are Obj >> * Pop the top entry >> * Success >> * Any other case only occurs for unstructured locking, then just inflate and call ObjectMonitor::exit >> * If the monitor has been inflated for object Obj which is owned by the current thread >> * All corresponding entries for Obj is removed from the lock stack >> * The 
monitor recursions is set to the number of removed entries - 1 >> * The owner is changed from anonymous to the thread >> * The regular ObjectMonitor::action is called. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 > - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 > - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 > - Fix nit > - Fix comment typos > - 8319797: Recursive lightweight locking: Runtime implementation I've done another review pass and added a few more comments. I think I made only minor comments in both passes. Of course, when I review one or more of the platform specific fixes, I may have more to say about this "Runtime" portion. src/hotspot/share/runtime/synchronizer.cpp line 1320: > 1318: inf->set_owner_from_anonymous(current); > 1319: size_t removed = JavaThread::cast(current)->lock_stack().remove(object); > 1320: inf->set_recursions(removed - 1); Hmmmm... so now I'm wondering how non-lightweight locking gets the recursions count correct? IIRC, with LM_LEGACY we count the BasicLocks on the stack in order to get a proper recursion count, but I could be wrong... I don't see anything in the inflation code that's updating the recursion count based on BasicLocks on the stack. Of course that would be impossible for a Thread-A to do safely while inflating an ObjectMonitor for a lock held by a Thread-B. ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16606#pullrequestreview-1745354175 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402720113 From dcubed at openjdk.org Wed Nov 22 21:36:12 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Wed, 22 Nov 2023 21:36:12 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> Message-ID: On Wed, 22 Nov 2023 17:24:22 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Fix nit >> - Fix comment typos >> - 8319797: Recursive lightweight locking: Runtime implementation > > src/hotspot/share/runtime/synchronizer.cpp line 498: > >> 496: p2i(monitor->owner()), p2i(current), monitor->object()->mark_acquire().value()); >> 497: assert(!lock_stack.is_full(), "must have made room here"); >> 498: } > > This is an interesting idea. I'll have to see how you test this code later on... Update: I didn't see an explicit test case for overflowing the lock stack. Do you plan to add one? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402697309 From dcubed at openjdk.org Wed Nov 22 21:36:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 22 Nov 2023 21:36:13 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: References: <2dVPtwS-M9xk4yHIZcFr3y_d1xSgGFqkfW3ABZvvb8M=.529435cb-d62d-4a5d-a545-5ee446457e5d@github.com> Message-ID: On Thu, 16 Nov 2023 08:51:47 GMT, David Holmes wrote: >> First of let us note that when reaching this code the unstructured exit is the common case. 
The normal exit and recursive exit are usually handled in the emitted code (this includes the interpreter). We reach this because either a CAS failed somewhere due to a concurrent hashCode instalment, or the exit was unstructured. An inflated monitor's exit just jumps past this code (everything is conditioned on `mark.is_fast_locked()`). >> >> Is this motivated by the structure/layout of the C++ code, or by an optimisation? >> >> If it is motivated by the structure/layout, then we can lay it out as you described. It would add some code duplication. >> >> If it is motivated as an optimisation, then after the recursive exit fails we should just call remove and act based on the return value. > > I would not go so far as to say the unstructured locking case is common. Sure we are on the slow-path, which may be due to unstructured locking, or we may be here through deopt (also a slow path) or through the native method wrapper, or ... but yes this is not really performance critical. Perhaps changing this comment will help. Please consider: // This lock is recursive but is not at the top of the lock stack so we're // doing an unstructured exit. We have to fall thru to inflation below and // let ObjectMonitor::exit() do the unlock. This check on L570 has to be done before the block that begins on L572 because that block assumes that `object` is not a recursive lock and it will remove all instances of `object` from the lock stack on L578. That would unexpectedly unlock `object` too many times. Just as a side note: The block from L572 -> L584 also handles both a structured exit of a non-recursive lock AND an unstructured exit of a non-recursive lock. This is true because we use `remove()` on L578 instead of a `pop()` which assumes top-of-stack/structured locking.
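The fast-path rules from the PR overview quoted earlier in this thread can be sketched as a small single-threaded toy model. `ToyObj`, `ToyThread` and `Result` are hypothetical names, a `bool` stands in for the mark word lock bits, and `NeedsInflation` stands in for "inflate and call ObjectMonitor::enter/exit". This is an illustration under those assumptions, not the HotSpot implementation, and it ignores contention and the lock stack capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyObj {
  bool fast_locked = false;  // stands in for mark word Locked (0b00) vs Unlocked (0b01)
};

enum class Result { Success, NeedsInflation };

struct ToyThread {
  std::vector<ToyObj*> lock_stack;  // hypothetical stand-in for the per-thread lock stack

  Result enter(ToyObj* o) {
    if (!o->fast_locked) {
      o->fast_locked = true;        // Unlocked => Locked
      lock_stack.push_back(o);
      return Result::Success;
    }
    if (!lock_stack.empty() && lock_stack.back() == o) {
      lock_stack.push_back(o);      // consecutive recursive enter
      return Result::Success;
    }
    return Result::NeedsInflation;  // locked, but not by our topmost entry
  }

  Result exit(ToyObj* o) {
    std::size_t n = lock_stack.size();
    // The recursive case is checked first: both topmost entries are o.
    if (n >= 2 && lock_stack[n - 1] == o && lock_stack[n - 2] == o) {
      lock_stack.pop_back();
      return Result::Success;
    }
    // Only the topmost entry is o: this is the final exit.
    if (n >= 1 && lock_stack[n - 1] == o) {
      o->fast_locked = false;       // Locked => Unlocked
      lock_stack.pop_back();
      return Result::Success;
    }
    return Result::NeedsInflation;  // unstructured exit
  }
};
```

Checking the two-entry case before the one-entry case mirrors the ordering argument above: if the non-recursive branch ran first on a recursively held lock, it would release the lock while an entry for it was still on the stack.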
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402712313 From shade at openjdk.org Wed Nov 22 21:46:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 22 Nov 2023 21:46:09 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 18:26:12 GMT, Daniel D. Daugherty wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 819: >> >>> 817: } >>> 818: >>> 819: _supports_cx8 = supports_cmpxchg8(); >> >> I think we should leave the runtime check here (under `ifndef`, like in ARM?). This covers the remaining case of running on legacy x86 without CX8 implemented: the init guarantee would then fire and prevent any other surprises at runtime. Sure, it would be hard to come up with such a platform today, but it would be safer to refuse to run there right away on the off-chance someone actually has it :) > > @shipilev - Do you have a particular legacy x86 in mind? My point is that it is such an easy thing to do: leave the "cx8" flag sensing code in, and keep setting up `_supports_cx8` based on it. This both provides more safety by failing cleanly on non-CX8 platform, and gives other platforms some guidance: if you can check something is supported, check it. But now that you nerd-sniped me into this... I think non-CX8 platforms would probably predate Pentium. The oldest real machine my lab has is Z530, which already has CX8. 
But it was easy to also go to my QEMU-driven build-test server, ask for `i486` as platform there, and et voila, no `cx8` in CPU flags: buildworker-debian12-32:~$ lscpu Architecture: i486 CPU op-mode(s): 32-bit Address sizes: 36 bits physical, 32 bits virtual Byte Order: Little Endian CPU(s): 4 On-line CPU(s) list: 0-3 Vendor ID: GenuineIntel Model name: 486 DX/4 CPU family: 4 Model: 8 Thread(s) per core: 4 Core(s) per socket: 1 Socket(s): 1 Stepping: 0 BogoMIPS: 5699.99 Flags: fpu vme pse apic ht cpuid tsc_known_freq x2apic hypervisor cpuid_fault And mainline JDK even starts there! (with interpreter, there are some asserts firing in compiler code, having to do with odd instruction selection on some paths): $ jdk/bin/java -Xint -version openjdk version "22-testing" 2024-03-19 OpenJDK Runtime Environment (fastdebug build 22-testing-builds.shipilev.net-openjdk-jdk-b627-20231121) OpenJDK Server VM (fastdebug build 22-testing-builds.shipilev.net-openjdk-jdk-b627-20231121, interpreted mode, sharing) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1402738580 From dcubed at openjdk.org Wed Nov 22 22:01:10 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 22 Nov 2023 22:01:10 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 21:41:50 GMT, Aleksey Shipilev wrote: >> @shipilev - Do you have a particular legacy x86 in mind? > > My point is that it is such an easy thing to do: leave the "cx8" flag sensing code in, and keep setting up `_supports_cx8` based on it for `!_LP64` paths. This both provides more safety by failing cleanly on non-CX8 platform, and gives other platforms some guidance: if you can check something is supported, check it. I think we are generally trying to fail cleanly on unsupported configs, if that is easy to achieve. > > But now that you nerd-sniped me into this... I think non-CX8 platforms would probably predate Pentium. 
The oldest real machine my lab has is Z530, which already has CX8. But it was easy to also go to my QEMU-driven build-test server, ask for `i486` as platform there, and et voila, no `cx8` in CPU flags: > > > buildworker-debian12-32:~$ lscpu > Architecture: i486 > CPU op-mode(s): 32-bit > Address sizes: 36 bits physical, 32 bits virtual > Byte Order: Little Endian > CPU(s): 4 > On-line CPU(s) list: 0-3 > Vendor ID: GenuineIntel > Model name: 486 DX/4 > CPU family: 4 > Model: 8 > Thread(s) per core: 4 > Core(s) per socket: 1 > Socket(s): 1 > Stepping: 0 > BogoMIPS: 5699.99 > Flags: fpu vme pse apic ht cpuid tsc_known_freq x2apic hypervisor cpuid_fault > > > And mainline JDK even starts there! (with interpreter, there are some asserts firing in compiler code, having to do with odd instruction selection on some paths): > > > $ jdk/bin/java -Xint -version > openjdk version "22-testing" 2024-03-19 > OpenJDK Runtime Environment (fastdebug build 22-testing-builds.shipilev.net-openjdk-jdk-b627-20231121) > OpenJDK Server VM (fastdebug build 22-testing-builds.shipilev.net-openjdk-jdk-b627-20231121, interpreted mode, sharing) Nice spelunking... I was wondering if it was something that old. I wasn't trying to nerd-snipe... I was in the dev lab at Intel when Xenix on the i386 first came up and sent its "Hello World!" email... I left Intel for Sun in 1987 while i486 was still in development, but I still had periodic lunches with folks that worked on those teams. Life was simpler back then... 
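As a small illustration of the "if you can check something is supported, check it" point, the sketch below looks for `cx8` in a space-separated flags string such as the `Flags:` line of the lscpu output quoted above. `has_cpu_flag` is a hypothetical helper written for this note; HotSpot senses the feature via CPUID in vm_version_x86.cpp, not by parsing text.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `flag` appears as a whole word in a
// space-separated CPU flags string (for example the lscpu "Flags:" value).
bool has_cpu_flag(const std::string& flags, const std::string& flag) {
  assert(!flag.empty());
  std::istringstream in(flags);
  std::string word;
  while (in >> word) {
    if (word == flag) {
      return true;  // exact match only, so "cx8" does not match "cx86"
    }
  }
  return false;
}
```

Applied to the i486 flags quoted above, the helper reports that `cx8` is absent, which is the situation where an init-time guarantee would refuse to start the VM.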
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1402748121 From dholmes at openjdk.org Wed Nov 22 22:09:05 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 22 Nov 2023 22:09:05 GMT Subject: RFR: 8320582: Zero: Misplaced CX8 enablement flag In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:33:24 GMT, Aleksey Shipilev wrote: > When doing [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777), I misplaced the `_supports_cx8 = true` flag setting in the method that is only called when CPU features are polled from perf counter code. We need to move the check to a proper place. [JDK-8318776](https://github.com/openjdk/jdk/pull/16625/files) would catch fire without this. > > Additional testing (redoing JDK-8319777 testing): > - [x] Linux arm Zero fastdebug now builds fine with JDK-8318776 fix > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images Yes trivial. Sorry bad timing ------------- PR Comment: https://git.openjdk.org/jdk/pull/16779#issuecomment-1823567173 From jiangli at openjdk.org Wed Nov 22 22:18:23 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 22 Nov 2023 22:18:23 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v4] In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. Jiangli Zhou has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8319935 - Add a check for a thread is_attaching_via_jni, based on David Holmes' comment. - Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. - 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16642/files - new: https://git.openjdk.org/jdk/pull/16642/files/7c0214e2..de7fac6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=02-03 Stats: 30285 lines in 831 files changed: 17825 ins; 7190 del; 5270 mod Patch: https://git.openjdk.org/jdk/pull/16642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16642/head:pull/16642 PR: https://git.openjdk.org/jdk/pull/16642 From jiangli at openjdk.org Wed Nov 22 22:40:20 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 22 Nov 2023 22:40:20 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v5] In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. 
Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: Address Serguei Spitsyn's comments/suggestions: - Remove the redundant thread->is_Java_thread() check from JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample(). - Change the assert in JvmtiThreadState::state_for_while_locked to avoid #ifdef ASSERT. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16642/files - new: https://git.openjdk.org/jdk/pull/16642/files/de7fac6d..7c366df0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16642&range=03-04 Stats: 12 lines in 2 files changed: 0 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16642.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16642/head:pull/16642 PR: https://git.openjdk.org/jdk/pull/16642 From jiangli at openjdk.org Wed Nov 22 22:40:22 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 22 Nov 2023 22:40:22 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: <-_KrlwZh4w1qg_mnKcb4OZYNuLf3syK0nrPhZhTXF9I=.1bae1a31-defa-4eac-857f-ae9d28b16b38@github.com> On Wed, 22 Nov 2023 02:53:23 GMT, David Holmes wrote: >> src/hotspot/share/prims/jvmtiExport.cpp line 3144: >> >>> 3142: // If the current thread is attaching from native and its thread oop is being >>> 3143: // allocated, things are not ready for allocation sampling. >>> 3144: if (thread->is_Java_thread()) { >> >> Nit: There is no need for this check at line 3144. >> There was already check for `!thread->is_Java_thread()` and return with false at line 3138: >> >> if (!thread->is_Java_thread() || thread->is_Compiler_thread()) { >> return false; >> } > > +1 Indeed, removed. Thanks. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1402770901

From jiangli at openjdk.org Wed Nov 22 22:40:25 2023
From: jiangli at openjdk.org (Jiangli Zhou)
Date: Wed, 22 Nov 2023 22:40:25 GMT
Subject: RFR: 8319935: Ensure only one JvmtiThreadState is create for one JavaThread associated with attached native thread [v3]
In-Reply-To:
References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com>
Message-ID:

On Tue, 21 Nov 2023 23:32:13 GMT, Serguei Spitsyn wrote:

>> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Add a check for a thread is_attaching_via_jni, based on David Holmes' comment.
>
> src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 100:
>
>> 98: assert(state->get_thread_oop() != nullptr, "incomplete state");
>> 99: }
>> 100: #endif
>
> Nit: I would suggest to write this assert in the form:
>
> // Make sure we don't see an incomplete state. An incomplete state can cause
> // a duplicate JvmtiThreadState being created below and bound to the 'thread'
> // incorrectly, which leads to stale JavaThread* from the JvmtiThreadState
> // after the thread exits.
> assert(state == nullptr || state->get_thread_oop() != nullptr, "incomplete state");
>
> The `#ifdef ASSERT` and `#endif` are not needed then.

Changed as suggested.
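The equivalence behind the suggested assert can be checked in isolation. The following is a minimal sketch, not HotSpot code: `FakeState` and both helper functions are hypothetical stand-ins for `JvmtiThreadState` and the two assert forms, and the thread oop is modeled as a plain pointer.

```cpp
// Hypothetical stand-in for JvmtiThreadState; it only models "has a thread oop".
struct FakeState {
  const void* thread_oop;
};

// Guarded form: mirrors `if (state != nullptr) { assert(oop != nullptr); }`.
// Returns the truth value the assert would test (true when nothing is tested).
inline bool guarded_form_holds(const FakeState* state) {
  if (state != nullptr) {
    return state->thread_oop != nullptr;
  }
  return true;
}

// Single-expression form: `state == nullptr || oop != nullptr`.
// Short-circuit evaluation keeps the dereference safe when state is null.
inline bool single_expression_holds(const FakeState* state) {
  return state == nullptr || state->thread_oop != nullptr;
}

// Walks the three possible inputs and reports whether both forms agree.
inline bool forms_agree() {
  int dummy = 0;
  FakeState complete{&dummy};
  FakeState incomplete{nullptr};
  const FakeState* inputs[] = {nullptr, &complete, &incomplete};
  for (const FakeState* s : inputs) {
    if (guarded_form_holds(s) != single_expression_holds(s)) {
      return false;
    }
  }
  return true;
}
```

Since the condition is a single expression with no surrounding `if`, it can sit directly inside the assert, which is why the review comment says the `#ifdef ASSERT` wrapper is no longer needed.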
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1402771379 From jiangli at openjdk.org Wed Nov 22 22:51:07 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 22 Nov 2023 22:51:07 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Fri, 17 Nov 2023 02:51:03 GMT, Jiangli Zhou wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. > >> Thanks. The latest change to `JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample()` looks OK to me. Skipping a few allocations for JVMTI allocation sampler is better than resulting in a problematic `JvmtiThreadState` instance. >> >> My main question is if we can now change `if (state == nullptr || state->get_thread_oop() != thread_oop) ` to `if (state == nullptr)` in `JvmtiThreadState::state_for_while_locked()`. I suspect we would never run into a case of `state != nullptr && state->get_thread_oop() != thread_oop` with the latest change, even with virtual threads. This is backed up by testing with [00ace66](https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22) not triggering any failure. >> >> If we run into such as a case, it could still be problematic as `JvmtiThreadState::state_for_while_locked()` would allocate a new `JvmtiThreadState` instance pointing to the same JavaThread, and it does not delete the existing instance. 
>> >> Could anyone with deep knowledge on JvmtiThreadState and virtual threads provide some feedback on this change and https://bugs.openjdk.org/browse/JDK-8319935? @AlanBateman, do you know who would be the best reviewer for this? > > @caoman and I discussed about his suggestion on changing `if (state == nullptr || state->get_thread_oop() != thread_oop)` check in person today. Since it may affect vthread, my main concern is that our current testing may not cover that sufficiently. The suggestion could be worked by a separate enhancement bug. > > > @jianglizhou - I fixed a typo in the bug's synopsis line. Change this PR's title: s/is create/is created/ > > > Thanks, @dcubed-ojdk! > > Now, the PR title needs to be fixed accordingly. Done, thanks for the reminder! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1823599300 From jjoo at openjdk.org Wed Nov 22 23:08:36 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 22 Nov 2023 23:08:36 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Cleanup and address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/4ca30f32..fcc7e471 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=45-46 Stats: 19 lines in 10 files changed: 0 ins; 13 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Wed Nov 22 23:08:38 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Wed, 22 Nov 2023 23:08:38 GMT Subject: RFR: 8315149: Add hsperf counters 
for CPU time of internal GC threads [v46] In-Reply-To: <4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> References: <4S_iyhdkwxpMar7tdNxHobR6vaRcqKcikQQrrcNBwX0=.29697f18-b7cc-44a9-8900-90f7a3a1e780@github.com> Message-ID: On Tue, 21 Nov 2023 21:42:39 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - Update memory tracking type for CPUTimeCounters > - Fix assertion logic Addressed cleanups, PR should be RFR! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15082#issuecomment-1823609497 From jiangli at openjdk.org Thu Nov 23 00:02:08 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 23 Nov 2023 00:02:08 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 22 Nov 2023 01:24:46 GMT, Serguei Spitsyn wrote: > Thank you for filing and fixing this issue! I'm kind of late here. Sorry for that. Is it hard to create a JTreg test for an attaching native thread? I can help if you have a standalone prototype. You can look for some examples in the folder: `test/hotspot/jtreg/serviceability/jvmti/vthread`. Hi @sspitsyn we don't have an extracted standalone test case (yet) to demonstrate the crashes. The crashes could not reproduce consistently. Outside the debugger (lldb), I ran the test (one of the affected ones) 10 times/per-iteration in order to reproduce. I found the crashes could be affected by both timing and memory layout. 
During the investigation, I noticed the problem became hidden when I increased allocation size for ThreadsList::_threads (as one of the experiments that I did, I wanted to mprotect the memory to be read-only in order to find who trashed the memory, so was trying to allocate memory up to page boundary). That's the reason why I added noreg-hard tag earlier.

I gave some more thoughts today. Perhaps, we could write a whitebox test to check the JvmtiThreadState, without being able to consistently trigger crashes. We could add a WhiteBox api to iterate the JvmtiThreadState list and validate if all the JavaThread pointers were valid after detaching. The test would need to create native threads to attach and detach before the check. That could more reliably test the 1-1 mapping of JvmtiThreadState and JavaThread. What do you think?

Thanks for volunteering to help with the test. I created https://bugs.openjdk.org/browse/JDK-8320614 today. Should I assign it to you?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1823672341

From duke at openjdk.org Thu Nov 23 00:03:30 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Thu, 23 Nov 2023 00:03:30 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v3]
In-Reply-To:
References:
Message-ID:

> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain
>
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision:

 - further review comments
 - Merge remote-tracking branch 'jdk/master' into vp-ecore2
 - Merge remote-tracking branch 'jdk/master' into vp-ecore2
 - review comments
 - emulate vblend on ecores

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/16716/files
 - new: https://git.openjdk.org/jdk/pull/16716/files/74c68fe6..8d8d0d45

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=01-02

Stats: 8461 lines in 347 files changed: 6354 ins; 1279 del; 828 mod
Patch: https://git.openjdk.org/jdk/pull/16716.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716

PR: https://git.openjdk.org/jdk/pull/16716

From duke at openjdk.org Thu Nov 23 00:09:20 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Thu, 23 Nov 2023 00:09:20 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v4]
In-Reply-To:
References:
Message-ID: <0BMdsgH9_y28twT_UkpN8gOZROSgONZC5FvPj50d5M4=.19526c86-2572-409d-91ad-9035dd792de8@github.com>

> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain
>
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

 remove whitespace

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/16716/files
 - new: https://git.openjdk.org/jdk/pull/16716/files/8d8d0d45..707bea50

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=02-03

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/16716.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716

PR: https://git.openjdk.org/jdk/pull/16716

From duke at openjdk.org Thu Nov 23 00:19:07 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Thu, 23 Nov 2023 00:19:07 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Nov 2023 17:48:30 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3577:
>>
>>> 3575: if (EnableX86ECoreOpts && scratch_available && dst_available) {
>>> 3576: XMMRegister full_mask = mask;
>>> 3577: if (!fully_masked) {
>>
>> name change suggestion for better understanding. fully_masked -> compute_mask
>
> We can also remove full_mask register and directly update mask if compute mask is true.

> name change suggestion for better understanding. fully_masked -> compute_mask

Done. Also had to flip the boolean to default true and update all the call-sites (I like the new name, but don't like that the default is true..)

> We can also remove full_mask register and directly update mask if compute mask is true.

Good point, fixed (I keep on thinking of the parameters as pass-by-reference, don't touch... but it's a copy.. lots of copy-construction..)
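The idea of replacing a variable blend with boolean operations, and the role of the `compute_mask` flag discussed above, can be illustrated on a single 32-bit lane. This is a scalar sketch under stated assumptions, not the code from the PR: all function names are hypothetical, and it assumes the documented `vblendvps` behavior of selecting the second source when the sign bit of the mask lane is set.

```cpp
#include <cstdint>

// Expands the sign bit of a lane into an all-ones or all-zeros mask.
// This models the extra step needed when the caller's mask is not already "full".
inline std::uint32_t expand_sign_bit(std::uint32_t mask_lane) {
  return (mask_lane & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

// Reference behavior for one lane of a variable blend: the sign bit selects src2.
inline std::uint32_t blend_reference(std::uint32_t src1, std::uint32_t src2,
                                     std::uint32_t mask_lane) {
  return (mask_lane & 0x80000000u) ? src2 : src1;
}

// Emulation with boolean operations: (mask & src2) | (~mask & src1).
// compute_mask is a hypothetical flag that mirrors the name used in the review.
inline std::uint32_t blend_emulated(std::uint32_t src1, std::uint32_t src2,
                                    std::uint32_t mask_lane,
                                    bool compute_mask = true) {
  // mask_lane is passed by value, so overwriting it does not affect the caller.
  if (compute_mask) {
    mask_lane = expand_sign_bit(mask_lane);
  }
  return (mask_lane & src2) | (~mask_lane & src1);
}
```

With `compute_mask` set to false, this emulation only matches the reference when every mask lane is already all ones or all zeros, for example when the mask comes straight from a vector compare.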
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402816131 From duke at openjdk.org Thu Nov 23 00:19:14 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 23 Nov 2023 00:19:14 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 18:12:15 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge remote-tracking branch 'jdk/master' into vp-ecore2 >> - review comments >> - emulate vblend on ecores > > src/hotspot/cpu/x86/x86_64.ad line 4519: > >> 4517: __ vcmpps($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); >> 4518: __ vblendvps($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); >> 4519: } > > Please move into a new macro assembly routine. moved to existing routine > src/hotspot/cpu/x86/x86_64.ad line 4568: > >> 4566: __ vcmppd($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); >> 4567: __ vblendvpd($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); >> 4568: } > > Please move to a new macro assembly routine. moved to existing routine > src/hotspot/cpu/x86/x86_64.ad line 4645: > >> 4643: "vcmppd.unordered $btmp,$atmp,$atmp \n\t" >> 4644: "vblendvpd $dst,$tmp,$atmp,$btmp \n\t" >> 4645: %} > > Format block may not be valid for e-cores, you can replace it with following to be consistent on both the cores. > ` minD $dst, $a, $b \t! 
using %tmp, %atmp and %btmp as TEMP ` Done > src/hotspot/cpu/x86/x86_64.ad line 4665: > >> 4663: __ vcmppd($btmp$$XMMRegister, $atmp$$XMMRegister, $atmp$$XMMRegister, Assembler::_false, vector_len); >> 4664: __ vblendvpd($dst$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, vector_len, true, $btmp$$XMMRegister); >> 4665: } > > Please move this logic into a new macro assembly routine. of-course, should had used the other c2_MarcoAssembler_x86 routine to begin with! thanks, fixed: __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402816984 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402816912 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402816036 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1402816017 From dholmes at openjdk.org Thu Nov 23 00:51:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 00:51:04 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 15:00:29 GMT, Stefan Karlsson wrote: > Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code I had not realized that. It explains some confusion in a separate issue I had been looking into! It is important that these monitors are exposed and unlocked at detach time, otherwise it also messes up the `held_monitor_count`. > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure I think we may need to make that code tolerate the absence of an object. 
> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors.

I think we probably should expose this to be accurate, but I think this needs investigation on the JVMTI side to ensure that the null entry is tolerated okay. So a separate RFE to handle this would be fine.

Thanks

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1823702358

From xgong at openjdk.org Thu Nov 23 01:31:05 2023
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 23 Nov 2023 01:31:05 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 21 Nov 2023 14:12:13 GMT, Magnus Ihse Bursie wrote:

>> This is just used to print the result of `AC_MSG_CHECKING[if ARM SVE feature is supported]` in configure.
>
> Ah, now I see what you are trying to do here. First of all, in the detection part, only set `SVE_FEATURE_SUPPORT`. Then you can handle the `SVE_CFLAGS` addition elsewhere/later.
>
> Secondly, you should not mix these `SVE_CFLAGS` with the sleef C flags. Keeping them separate will allow for LIBSLEEF_CFLAGS to be named just that.
>
> Thirdly, I do not like at all how you just come crashing in setting `-march` like that. The `-march` flag is handled by `FLAGS_SETUP_ABI_PROFILE`.
>
> Actually, now that I think of it, this is just completely wrong! You are checking on features on the build machine, to determine what target machine code to generate, with no way to override.
>
> You need to break out the -march handling separately. It should be moved to FLAGS_SETUP_ABI_PROFILE. I'm guessing you will need to make something like a `aarch64-sve` profile, and possibly try to auto-select it based on the result of the sve test program above. But changing `OPENJDK_TARGET_ABI_PROFILE` can have further consequences; I do not know the full extent off the top of my head.

Thanks for the advice! I will take it into consideration.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1402843737 From xgong at openjdk.org Thu Nov 23 01:44:08 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Nov 2023 01:44:08 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: <_CHm262chkVi3EMvai4A5T-dal0pdCySL8aF0kXj_uU=.9d49baad-9de9-45e0-915b-9525feb8d610@github.com> On Tue, 21 Nov 2023 14:13:19 GMT, Magnus Ihse Bursie wrote: >> Yes, it seems weird. But the library we want to built out is `libvmath.so` instead of `libsleef.so`. And we not only check the sleef library, but also the ARM SVE feature inside it. So using `VMATH` suffix is more reasonable to me. WDYT? > > As I said above, you should not mix the two together. Keep the library handling for libsleef. Move the march setting to where it belongs. And rename the files, functions and variables after this. OK, I see. It makes sense that the suffix name should be choosed mainly based on the real module name that is searched/checked in configure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1402847627 From dholmes at openjdk.org Thu Nov 23 02:43:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 02:43:15 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 15:00:29 GMT, Stefan Karlsson wrote: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. 
It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert form: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. Functional fix is simple and fine but quite a lot of commentary on the tests. Thanks src/hotspot/share/runtime/vmOperations.cpp line 354: > 352: // alive. Filter out monitors with dead objects. > 353: return; > 354: } I don't think we need to do this, but even without this filtering I ran a number of tests and was unable to demonstrate any problem. 
The JNI locked monitor seems to be "invisible" to the frame that locked it and so the thread dump never encounters it. Were you able to provoke a failure here or is this defensive programming? test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 28: > 26: * @test IterateMonitorWithDeadObjectTest > 27: * @summary This locks a monitor, GCs the object, and iterate and perform > 28: * various iteration and operations over this monitor. This doesn't read right with "iterate" and "iteration". Not sure exactly what you were trying to say. test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 29: > 27: * @summary This locks a monitor, GCs the object, and iterate and perform > 28: * various iteration and operations over this monitor. > 29: * @requires os.family == "linux" I know the test this was copied from had this but I'm not sure it is actually a necessary restriction - any Posix platform should work. Though maybe perror is linux only ... test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 31: > 29: * @requires os.family == "linux" > 30: * @library /testlibrary /test/lib > 31: * @build IterateMonitorWithDeadObjectTest You don't need an explicit `@build` step test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 40: > 38: public class IterateMonitorWithDeadObjectTest { > 39: public static native void runTestAndDetachThread(); > 40: public static native void joinTestThread(); I don't think this form of the test needs to separate out the `pthread_join()`, it can just be done in `runTestAndDetachThread` AFAICS. I originally split it out to allow the Java code to do the GC while the native thread was sleeping prior to detaching. 
test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 57: > 55: // - Drop the last reference to the object > 56: // - GC to clear the weak reference to the object in the monitor > 57: // - Detach the thread - provoke previous bug It also does a thread dump while the lock is held test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 66: > 64: // dead object. The thread dumping code didn't tolerate such a monitor, > 65: // so run a thread dump and make sure that it doesn't crash/assert. > 66: dumpThreadsWithLockedMonitors(); But you've already detached the thread so there is no locked monitor any longer. And `runTestAndDetachThread()` also did a thread dump. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 43: > 41: static jobject create_object(JNIEnv* env) { > 42: jclass clazz = (*env)->FindClass(env, "java/lang/Object"); > 43: if (clazz == 0) die("No class"); The `die` method is for errors with system calls. It won't show useful information for JNI calls that leave exceptions pending. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 76: > 74: if (dumpAllThreadsMethod == 0) die("No dumpAllThreads method"); > 75: > 76: // The 'lockedMonitors == true' is what triggers the collection of the monitor with the dead object. "triggers the collection" sounds like a GC interaction but that is not what you mean. Suggestion: // The 'lockedMonitors == true' is what causes the monitor with a dead object to be examined. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 94: > 92: > 93: // Let the GC clear the weak reference to the object. > 94: system_gc(env); AFAIK there is no guarantee that one call to `System.gc()` will suffice to clear the weakRef. We tend use a loop with a few iterations in other tests, or use a WhiteBox method to achieve it. 
In my testing I used the finalizer to observe that the objects had been finalized but even then, and with a loop, I did not always see them collected with G1. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 104: > 102: // source of at least two bugs: > 103: // - When the object reference in the monitor was made weak, the code > 104: // didn't unlock the monitor, leaving it lingering in the system. Suggestion: // - When the object reference in the monitor was cleared, the monitor // iterator code would skip it, preventing it from being unlocked when // the owner thread detached, leaving it lingering in the system. the original made it sound to me like the code that cleared the reference (i.e. the GC) was expected to do the unlocking. test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 107: > 105: // - When the monitor iterator API was rewritten the code was changed to > 106: // assert that we didn't have "owned" monitors with dead objects. This > 107: // test provokes that situation and those asserts. nit: s/those asserts/that assert/ test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 54: > 52: > 53: private static void jniMonitorEnterAndLetObjectDie() { > 54: // The monitor iterator used GetOwnedMonitorInfo used to s/iterator used/iterator used by/ test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 59: > 57: // GetOwnedMonitorInfo testing. > 58: Object obj = new Object() { public String toString() {return "";} }; > 59: jniMonitorEnter(obj); I would add a check for `Thread.holdsLock(obj);` after this just to be sure it worked. test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 61: > 59: jniMonitorEnter(obj); > 60: obj = null; > 61: System.gc(); Again one gc() is generally not sufficient. How can this test tell that the object in the monitor was actually cleared? 
I think `monitorinflation` logging may be the only way to tell. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1745590548 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843318 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843678 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402843923 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402844118 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402846787 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402846942 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402845852 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402857972 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402859246 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402847898 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402848741 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402848946 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402851705 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402857110 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1402852983 From vlivanov at openjdk.org Thu Nov 23 03:05:14 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 23 Nov 2023 03:05:14 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v9] In-Reply-To: <0A-M6LwxHmiYUfunlz_qgeFiPJoWcmzElMOD6RtxWmc=.da64f93c-9db9-4d19-aaa2-c204857f3595@github.com> References: <0A-M6LwxHmiYUfunlz_qgeFiPJoWcmzElMOD6RtxWmc=.da64f93c-9db9-4d19-aaa2-c204857f3595@github.com> Message-ID: On Fri, 17 Nov 2023 13:41:58 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates 
multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and cannot be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future.
Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets. >> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - fix linux compile > - Revert "add too_many_traps check" > > This reverts commit bee05534777dc2caf10362f66fea90a06705a144. Overall, looks very good. I have been thinking about the following choices made in this PR: * amount of profiling data: binary (seen vs not seen) vs integral (branch count) * deoptimization action: `reinterpret` vs `made_not_entrant` * place where uncommon trap is inserted (`Parse` vs `ciTypeFlow`) I haven't come up with strong arguments to change any of these choices, so I'm fine with the patch as it is now. We can adjust them later as follow-up enhancements if we decide to do so. On naming: `ex_handler` is used only once - `GraphKit::has_ex_handler()`. Everywhere else in the code base `exception_handler` is used. Please, align the naming. Feel free to adjust `GraphKit::has_ex_handler()`. The tests are very nice! Can you, please, point me to the test case which covers profiling in the interpreter? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16416#pullrequestreview-1745639528 From dholmes at openjdk.org Thu Nov 23 03:14:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 03:14:27 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v6] In-Reply-To: References: Message-ID: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e.
64-bit compare-and-exchange, so we can strip out the lock-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for a lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPUs. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. > > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks.
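For readers less familiar with the term: `cx8` is what portable C++ exposes as a 64-bit compare-and-exchange. The following is a minimal sketch using only standard `std::atomic`, not HotSpot's own `Atomic` API; the helper names `has_lock_free_cas64` and `cas64` are invented for illustration and do not appear in the patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: reports whether 64-bit atomics on this platform
// are implemented without a hidden lock (the property the PR requires).
static bool has_lock_free_cas64() {
  std::atomic<int64_t> probe{0};
  return probe.is_lock_free();
}

// Hypothetical helper: stores `desired` into `slot` only if `slot` still
// holds `expected`; returns true when the exchange took place.
static bool cas64(std::atomic<int64_t>& slot, int64_t expected, int64_t desired) {
  return slot.compare_exchange_strong(expected, desired);
}
```

A runtime check along the lines of `has_lock_free_cas64()` is roughly the role the startup guarantee plays; once it holds, no lock-based fallback path is needed.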
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16625/files - new: https://git.openjdk.org/jdk/pull/16625/files/aad0a4c4..2393b9d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From dholmes at openjdk.org Thu Nov 23 03:14:32 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 03:14:32 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 15:50:17 GMT, Doug Lea
wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge with master and update Zero code accordingly >> - Merge branch 'master' into 8318776-supports_cx8 >> - Remove unnecessary includes of vm_version.hpp. >> Fix copyright years. >> - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) >> - Remove suports_cx8() checks from gtest >> - Remove test for VMSupportsCX8 >> - 8318776: Require supports_cx8 to always be true > > The deletion of backup code and the check for it in java.util.concurrent.AtomicLongFieldUpdater are clearly OK. We always thought the need for it was transient. Thanks for looking at this @DougLea ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1823770525 From dholmes at openjdk.org Thu Nov 23 03:14:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 03:14:34 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: <6TkAoQRK_49MtT6wjb_JAwFzUATAvAx_rX9yxMa9Vfs=.7d1d5bf6-f088-44df-9d01-cc336bdeecaa@github.com> References: <6TkAoQRK_49MtT6wjb_JAwFzUATAvAx_rX9yxMa9Vfs=.7d1d5bf6-f088-44df-9d01-cc336bdeecaa@github.com> Message-ID: On Wed, 22 Nov 2023 18:35:59 GMT, Daniel D. Daugherty wrote: > Wow! This PR is much larger than I expected. > > Thumbs up! Thanks for the Review Dan! Yes lots of code deletion engineering in this one - and even better I got to delete template code with meta-programming stuff! 
:D ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1823771180 From dholmes at openjdk.org Thu Nov 23 03:14:36 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 03:14:36 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v6] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 21:57:57 GMT, Daniel D. Daugherty wrote: >> My point is that it is such an easy thing to do: leave the "cx8" flag sensing code in, and keep setting up `_supports_cx8` based on it for `!_LP64` paths. This both provides more safety by failing cleanly on non-CX8 platform, and gives other platforms some guidance: if you can check something is supported, check it. I think we are generally trying to fail cleanly on unsupported configs, if that is easy to achieve. >> >> But now that you nerd-sniped me into this... I think non-CX8 platforms would probably predate Pentium. The oldest real machine my lab has is Z530, which already has CX8. But it was easy to also go to my QEMU-driven build-test server, ask for `i486` as platform there, and et voila, no `cx8` in CPU flags: >> >> >> buildworker-debian12-32:~$ lscpu >> Architecture: i486 >> CPU op-mode(s): 32-bit >> Address sizes: 36 bits physical, 32 bits virtual >> Byte Order: Little Endian >> CPU(s): 4 >> On-line CPU(s) list: 0-3 >> Vendor ID: GenuineIntel >> Model name: 486 DX/4 >> CPU family: 4 >> Model: 8 >> Thread(s) per core: 4 >> Core(s) per socket: 1 >> Socket(s): 1 >> Stepping: 0 >> BogoMIPS: 5699.99 >> Flags: fpu vme pse apic ht cpuid tsc_known_freq x2apic hypervisor cpuid_fault >> >> >> And mainline JDK even starts there! 
(with interpreter, there are some asserts firing in compiler code, having to do with odd instruction selection on some paths): >> >> >> $ jdk/bin/java -Xint -version >> openjdk version "22-testing" 2024-03-19 >> OpenJDK Runtime Environment (fastdebug build 22-testing-builds.shipilev.net-openjdk-jdk-b627-20231121) >> OpenJDK Server VM (fastdebug build 22-testing-builds.shipilev.net-openjdk-jdk-b627-20231121, interpreted mode, sharing) > > Nice spelunking... I was wondering if it was something that old. I wasn't trying to nerd-snipe... > > I was in the dev lab at Intel when Xenix on the i386 first came up and sent its "Hello World!" email... > I left Intel for Sun in 1987 while i486 was still in development, but I still had periodic lunches with > folks that worked on those teams. Life was simpler back then... I politely disagree. The whole point here is to leave the past behind as much as possible. We made a concession for ARM32 as there may still be old ARMv5 and ARMv6 systems in use. IIUC you need to go back to the i486 chip to not have cmpxchg8 support and I'd bet money on it that we can't run on such a chip any more for a whole swag of reasons. In any case I don't have an issue telling i486 machine owners they are stuck with JDK 21! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1402877044 From dholmes at openjdk.org Thu Nov 23 03:14:37 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 03:14:37 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 18:35:33 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/vm_version.cpp line 33: >> >>> 31: void VM_Version_init() { >>> 32: VM_Version::initialize(); >>> 33: guarantee(VM_Version::supports_cx8(), "Support for 64-bit atomic operations in required in this release"); >> >> Typo: "in required in". Also, no need to mention "this release" at all? 
>> Suggestion for message: "JVM requires platform support for 64-bit atomic operations" > > Or the simpler change: > s/in required/is required/ Message tweaked: Support for 64-bit atomic operations is required ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16625#discussion_r1402877940 From amitkumar at openjdk.org Thu Nov 23 05:10:08 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 Nov 2023 05:10:08 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v2] In-Reply-To: References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: On Wed, 22 Nov 2023 16:09:27 GMT, Thomas Schatzl wrote: >> Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) >> >> Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). >> >> The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. >> >> Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). 
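To illustrate the complexity argument, here is a generic sketch of an address-ordered free list with a search "finger". It is not the code heap implementation; `FreeBlock`, `FreeList` and the `steps` counter are invented for this example. When blocks arrive already sorted, each insertion resumes from the previous insertion point, so the total work stays linear.

```cpp
#include <cstddef>

// Illustration only: one free block, identified by its start address.
struct FreeBlock {
  size_t addr;
  FreeBlock* next;
};

// Illustration only: singly linked list kept sorted by address.
struct FreeList {
  FreeBlock* head = nullptr;
  FreeBlock* finger = nullptr;  // block inserted most recently (search hint)
  size_t steps = 0;             // links followed while searching

  void insert(FreeBlock* b) {
    // Resume from the finger if the new block belongs after it,
    // otherwise fall back to searching from the head.
    FreeBlock* prev =
        (finger != nullptr && finger->addr < b->addr) ? finger : nullptr;
    FreeBlock* cur = (prev != nullptr) ? prev->next : head;
    while (cur != nullptr && cur->addr < b->addr) {
      prev = cur;
      cur = cur->next;
      steps++;
    }
    b->next = cur;
    if (prev != nullptr) {
      prev->next = b;
    } else {
      head = b;
    }
    finger = b;
  }
};
```

With sorted input the search loop never advances, whereas inserting in arbitrary order without a usable hint can walk the list on every call, which is where the quadratic behaviour comes from.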
>> >> Upcoming changes will >> * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. >> * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) >> * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism >> * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) >> * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. >> >> These are the reason for the class hierarchy for `ClassUnloadingContext`: the goal is to ultimately have about this phasing (for G1): >> 1. collect all dead CLDs, using the `register_unloading_class_loader_data` method *only* >> 2. parallelize the stuff in `ClassLoaderData::unload()` in one way or another, adding them to the `complete_cleaning` (parallel) phase. >> 3. `purge_nmethods`, `free_code_blobs` and the `remove_unlinked_nmethods_from_code_root_set` (from JDK-8317007) will be concurrent. >> >> Particularly the split of `SystemDictionary::do_unloading` into "only" traversing the CLDs to find the dead ones and then in parallel process them in 2. a... > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review, naming src/hotspot/share/classfile/classLoaderData.cpp line 602: > 600: > 601: // Clean up class dependencies and tell serviceability tools > 602: // these classes are unloading. This must be called Suggestion: // these classes are unloading. 
This must be called ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1402920337 From fyang at openjdk.org Thu Nov 23 05:10:12 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 23 Nov 2023 05:10:12 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v6] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 16:26:28 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. 
>> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > remove stray newline src/hotspot/cpu/riscv/compressedKlass_riscv.cpp line 41: > 39: // > 40: // A "good" base is, in this order: > 41: // 1) only bits in A; this would be an address < 2KB, which is unrealistic on normal Linux boxes since @tstuefe : Hi Thomas, thanks for considering riscv. A small question: is the`2KB` in the code comment a typo? Seems that it should be `4KB` instead :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1402920607 From duke at openjdk.org Thu Nov 23 05:27:08 2023 From: duke at openjdk.org (suchismith1993) Date: Thu, 23 Nov 2023 05:27:08 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Wed, 22 Nov 2023 16:35:36 GMT, Thomas Stuefe wrote: > Hi, is this patch meant for review already? If yes, could you please describe the problem you fix, and how you fix it? 
If no, I suggest working on it in draft state till its ready for review. I have updated the description. Let me know if anything is missing ------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1823834982 From amitkumar at openjdk.org Thu Nov 23 05:47:10 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 Nov 2023 05:47:10 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> J2SE agent does not start and throws error when it tries to find the shared library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load fails.It fails to identify the shared library ibm_16_am.a shared archive file on AIX. >> Hence we are providing a function which will additionally search for .a file on AIX ,when the search for .so file fails. > > suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: > > change macro position some nits you might want to consider. src/hotspot/os/aix/os_aix.cpp line 3064: > 3062: > 3063: //Replaces provided path with alternate path for the given file,if it doesnt exist. > 3064: //For AIX,this replaces .so with .a. Suggestion: // Replaces the specified path with an alternative path for the given file if the original path doesn't exist. // For AIX, this replaces extension from ".so" to ".a". src/hotspot/os/aix/os_aix.cpp line 3065: > 3063: //Replaces provided path with alternate path for the given file,if it doesnt exist. > 3064: //For AIX,this replaces .so with .a. 
> 3065: void os::Aix::mapAlternateName(char* buffer, const char *extension) { Suggestion: void os::Aix::map_alternate_name(char* buffer, const char *extension) { src/hotspot/os/aix/os_aix.hpp line 181: > 179: static int stat64x_via_LIBPATH(const char* path, struct stat64x* stat); > 180: // Provide alternate path name,if file does not exist. > 181: static void mapAlternateName(char* buffer, const char *extension); Suggestion: // Provides alternate path name, if file does not exist. static void map_alternate_name(char* buffer, const char *extension); ------------- Changes requested by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1745734586 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402935976 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402936171 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402936497 From dholmes at openjdk.org Thu Nov 23 05:49:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 05:49:04 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry In-Reply-To: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Tue, 21 Nov 2023 16:38:14 GMT, Matias Saavedra Silva wrote: > ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. > > This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. Change seems fine but what was the effect of not restoring the flag? Does this cause failures or just unnecessary re-resolution, or? 
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1745736845 From amitkumar at openjdk.org Thu Nov 23 05:52:05 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 Nov 2023 05:52:05 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: <2X-eefx1BKO-hqncC3LF0It1wUxHRqvrX0qm2TsCbd0=.2626d8a4-d549-4a82-a2de-40003af36539@github.com> On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> J2SE agent does not start and throws error when it tries to find the shared library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load fails.It fails to identify the shared library ibm_16_am.a shared archive file on AIX. >> Hence we are providing a function which will additionally search for .a file on AIX ,when the search for .so file fails. > > suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: > > change macro position Also are you planning to close this one : https://github.com/openjdk/jdk/pull/16490 ? JBS issue is already closed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1823849323 From dholmes at openjdk.org Thu Nov 23 06:04:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 06:04:06 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> J2SE agent does not start and throws error when it tries to find the shared library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load fails.It fails to identify the shared library ibm_16_am.a shared archive file on AIX. >> Hence we are providing a function which will additionally search for .a file on AIX ,when the search for .so file fails. > > suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: > > change macro position A couple of comments. First, if you always want to look for `libfoo.a` if you can't find `libfoo.so` then maybe that should be handled at the `os::dll_load` level? Otherwise it is not clear why this is something you only do for agents. Second, the amount of AIX-specific code in the shared `jvmtiAgent.cpp` now seems unreasonable. As I think @tstuefe mentioned in another PR perhaps it is time to find better abstractions here to hide all these AIX specific quirks? src/hotspot/os/aix/os_aix.cpp line 3071: > 3069: } > 3070: buffer[end] = '\0'; > 3071: strcat(buffer, extension); At some point you need to check that the length of extension won't overflow buffer.
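As a minimal sketch of what such a check could look like (plain C++, not HotSpot code; the function name `replace_extension` and its signature are hypothetical and not part of the PR), the helper below writes into a caller-supplied output buffer, leaves the input untouched, and refuses to write if the result would not fit or if the file name has no extension:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `path` into `out` with its final extension
// replaced by `new_ext` (for example ".a"). Returns false and leaves `out`
// empty if the file name has no extension or the result would not fit
// into `out_size` bytes including the terminating NUL.
static bool replace_extension(const char* path, const char* new_ext,
                              char* out, size_t out_size) {
  if (out == nullptr || out_size == 0) return false;
  out[0] = '\0';
  if (path == nullptr || new_ext == nullptr) return false;

  // Only look for the dot in the last path component, so that a dot in a
  // directory name is not mistaken for the extension separator.
  const char* slash = strrchr(path, '/');
  const char* base = (slash != nullptr) ? slash + 1 : path;
  const char* dot = strrchr(base, '.');
  if (dot == nullptr) return false;

  size_t stem_len = (size_t)(dot - path);
  size_t ext_len = strlen(new_ext);
  if (stem_len + ext_len + 1 > out_size) return false;  // would overflow

  memcpy(out, path, stem_len);
  memcpy(out + stem_len, new_ext, ext_len + 1);  // copies the NUL as well
  return true;
}
```

For instance, `replace_extension("/usr/lib/libfoo.so", ".a", buf, sizeof(buf))` would yield `/usr/lib/libfoo.a` when `buf` is large enough, and return false otherwise instead of writing past the end.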
------------- PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1745744207 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941846 From stuefe at openjdk.org Thu Nov 23 06:20:11 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 06:20:11 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Thu, 23 Nov 2023 05:55:28 GMT, Thomas Stuefe wrote: >> suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: >> >> change macro position > > src/hotspot/os/aix/os_aix.cpp line 3069: > >> 3067: while (end > 0 && buffer[end] != '.') { >> 3068: end = end - 1; >> 3069: } > > Use strrchr. Pls handle the case where the string contains no dot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941387 From stuefe at openjdk.org Thu Nov 23 06:20:09 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 06:20:09 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> J2SE agent does not start and throws error when it tries to find the shared library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load fails.It fails to identify the shared library ibm_16_am.a shared archive file on AIX. 
>> Hence we are providing a function which will additionally search for .a file on AIX ,when the search for .so file fails. > > suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: > > change macro position I'm not a big fan of this approach. We accumulate more and more "#ifdef AIX" in shared code because of many recent AIX additions. No other platform has such a large ifdef footprint in shared code. I argue that all of this should be handled inside os_aix.cpp and not leak out into the external space: If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases? *If* this is needed in shared code, the string replacement function should be a generic utility function for all platforms, and it should be tested with a small gtest. A gtest would have likely uncovered the buffer overflow too. src/hotspot/os/aix/os_aix.cpp line 3065: > 3063: //Replaces provided path with alternate path for the given file,if it doesnt exist. > 3064: //For AIX,this replaces .so with .a. > 3065: void os::Aix::mapAlternateName(char* buffer, const char *extension) { The documentation is wrong: // Replaces the specified path with an alternative path for the given file if the original path doesn't exist It does no such thing, it replaces the extension unconditionally. The comment sounds like it does a file system check. That does not happen here. The whole function is not well named - "map alternate name" does not really tell me anything, I need to look at the implementation and the caller to understand what it is doing. There is no mapping here, this is just a string utility function. The function should not modify the original buffer but instead assemble a copy. That is the conventional way to do these things. You can work with immutable strings as input, e.g. 
literals, and don't risk buffer overflows. All of this should be handled inside os_aix.cpp; see my other comment. This should not live in the external os::aix interface, since it has nothing to do with AIX. *If* this is needed in generic code, which I don't think, then this should be made a generic utility API, available on all platforms, and with a small gtest. But I think all of this should be confined to os_aix.cpp. Proposal for a clearer name, comment, and pseudocode: // Given a filename with an extension, return a new string containing the filename with the new extension. // New string is allocated in resource area. static char* replace_extension_in_filename(const char* filename, const char* new_extension) { - allocate buffer in RA - assemble new path by concatenating old path - old extension + new extension - return new path } src/hotspot/os/aix/os_aix.cpp line 3069: > 3067: while (end > 0 && buffer[end] != '.') { > 3068: end = end - 1; > 3069: } Use strrchr. src/hotspot/os/aix/os_aix.cpp line 3072: > 3070: buffer[end] = '\0'; > 3071: strcat(buffer, extension); > 3072: } This is a buffer overrun waiting to happen if replacement is larger than original extension. ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16604#pullrequestreview-1745743102 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402944936 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941240 PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1402941730 From stuefe at openjdk.org Thu Nov 23 06:28:27 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 06:28:27 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v7] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range.
We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. 
We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Feedback Felix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/354bd0c1..a449751c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From stuefe at openjdk.org Thu Nov 23 06:28:29 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 06:28:29 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v6] In-Reply-To: References: Message-ID: <8qW3I3Y-X6TSbDEZq9sRtgDxLv7gaWwBzVUhUYj8dW0=.8a1f24cb-7b90-4157-b0d1-e47e619c5bae@github.com> On Thu, 23 Nov 2023 05:07:45 GMT, Fei Yang wrote: >> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: >> >> remove stray newline > > src/hotspot/cpu/riscv/compressedKlass_riscv.cpp line 41: > >> 39: // >> 40: // A "good" base is, in this order: >> 41: // 1) only bits in A; this would be an address < 2KB, which is unrealistic on normal Linux boxes since > > @tstuefe : Hi Thomas, thanks for considering riscv. A small question: is the`2KB` in the code comment a typo? 
Seems that it should be `4KB` instead :-) Oh, you are right, thanks for catching that! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16743#discussion_r1402954055 From thartmann at openjdk.org Thu Nov 23 06:45:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 23 Nov 2023 06:45:05 GMT Subject: RFR: 8319700: [AArch64] C2 compilation fails with "Field too big for insn" In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:44:12 GMT, Axel Boldt-Christmas wrote: > Not all ZGC C2 BarrierStubs used on aarch64 participate in the laying out of trampoline stubs. (Used to enable as many `tbX` instructions as possible.) This leads to incorrect calculations which may cause the target offset for the `tbX` branch to become too large. > > This fix changes all the BarrierStubs to stubs which participate in the trampoline logic. > > Until more platforms require specialised barrier stub layouts it is not worth adding better support for this pattern. Without a redesign it does make it harder to ensure that this is used correctly. For now the shared code asserts when building for aarch64 that the general shared stubs are not used directly. But care would still have to be taken if any new barrier stubs are introduced. > > The behaviour was more easily reproducible with large inlining heuristics. This flag combination was used to get somewhat reliable reproducibility `-esa -ea -XX:MaxInlineLevel=300 -XX:MaxInlineSize=1100 -XX:MaxTrivialSize=1000 -XX:LiveNodeCountInliningCutoff=1000000 -XX:MaxNodeLimit=3000000 -XX:NodeLimitFudgeFactor=600000 -XX:+UnlockExperimentalVMOptions -XX:+UseVectorStubs` > > There was also an observation inside the JBS comments that there were no `tbX` instructions branching to the emitted trampolines. However I was unable to reproduce this. I ran all tests with the following guarantee; this could not observe it either.
> > > diff --git a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > index ebaf1829972..b6c40163a6b 100644 > --- a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > @@ -36,6 +36,7 @@ > #include "runtime/icache.hpp" > #include "runtime/jniHandles.hpp" > #include "runtime/sharedRuntime.hpp" > +#include "utilities/debug.hpp" > #include "utilities/macros.hpp" > #ifdef COMPILER1 > #include "c1/c1_LIRAssembler.hpp" > @@ -1358,6 +1359,7 @@ void ZLoadBarrierStubC2Aarch64::emit_code(MacroAssembler& masm) { > // Current assumption is that the barrier stubs are the first stubs emitted after the actual code > assert(stubs_start_offset() <= output->buffer_sizing_data()->_code, "stubs are assumed to be emitted directly after code and code_size is a hard limit on where it can start"); > > + guarantee(!_test_and_branch_reachable_entry.is_unused(), "Should be used"); > __ bind(_test_and_branch_reachable_entry); > > // Next branch's offset is unknown, but is > branch_offset > > > - T... Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16780#pullrequestreview-1745780239 From aboldtch at openjdk.org Thu Nov 23 07:35:08 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 07:35:08 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> Message-ID: On Wed, 22 Nov 2023 21:17:55 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Fix nit >> - Fix comment typos >> - 8319797: Recursive lightweight locking: Runtime implementation > > src/hotspot/share/runtime/synchronizer.cpp line 1320: > >> 1318: inf->set_owner_from_anonymous(current); >> 1319: size_t removed = JavaThread::cast(current)->lock_stack().remove(object); >> 1320: inf->set_recursions(removed - 1); > > Hmmmm... so now I'm wondering how non-lightweight locking gets the > recursions count correct? IIRC, with LM_LEGACY we count the BasicLocks > on the stack in order to get a proper recursion count, but I could be wrong... > > I don't see anything in the inflation code that's updating the recursion count > based on BasicLocks on the stack. Of course that would be impossible for a > Thread-A to do safely while inflating an ObjectMonitor for a lock held by a > Thread-B. In LM_LEGACY the recursion is implicit through the virtual stack created by the BasicObjectLocks in thread frames + the `ObjectMonitor::_recursions` value (in case inflation occurred during the critical section). In LM_LEGACY we do not need to count, because when unwinding stack frames or when monitorexit / synchronised return occurs we always first check if the associated BasicObjectLock contains `0` in its displaced mark word. In this case we know it was a recursive `LEGACY` monitorenter / synchronised method entry and we can simply do nothing but clear the BasicObjectLock (regardless of whether the monitor is inflated).
On the last exit the displaced mark word will not be `0`, so if the monitor was inflated we will do an `ObjectMonitor::exit`. The owner might be a pointer to the BasicObjectLock (or at least into the current thread's stack in the case of OSR) or the current thread; either way the recursion count will be zero and we unlock. There are two places where we need to patch this up to make it work. One is OSR: if the OSR contains the initial monitor enter, it inflates the monitor so we no longer have a stale stack pointer in the mark word. The other is when we need to re-lock elided locks, where we may now be locking in an earlier frame than the current initial enter. So we find that BasicObjectLock (trivial, as the mark word points to it), write 0 to its displaced mark word (transforming it from an initial enter to a recursive one) and then update the mark word to point at the new BasicObjectLock. So the recursions are handled implicitly in the algorithm. (Except that inappropriate use of JNI MonitorExit can break the VM's assumptions and we can end up in very strange states. I have ideas on how to restrict this so that at least it cannot invalidate our assumptions, as well as give a much stronger guarantee when it comes to CheckJNICalls than we have today.) One interesting point here is that we could do something similar for LM_LIGHTWEIGHT. We still need the lock stack to map Objects to Threads (in Lilliput we do not have space to write thread-identifying information in the mark word in a safe and non-restrictive way). But we could handle any type of recursion (that is, interleaved enters, not only consecutive ones like the current implementation). To avoid a linear scan of the lock stack in the fast lock path, we would only do the fast recursive path for consecutive enters, and then have a medium path where we linearly scan for interleaved recursive enters.
This is very similar to what legacy does, in that it has a fast path if the initial enter's frame is within one page size of the recursive enter's frame, and otherwise takes a medium path which checks more exactly whether it is the same thread stack. I believe this would be worthwhile investigating. However, this can be done as an enhancement to this implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402993813 From aboldtch at openjdk.org Thu Nov 23 07:41:09 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 07:41:09 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> Message-ID: On Wed, 22 Nov 2023 20:46:26 GMT, Daniel D. Daugherty wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 498: >> >>> 496: p2i(monitor->owner()), p2i(current), monitor->object()->mark_acquire().value()); >>> 497: assert(!lock_stack.is_full(), "must have made room here"); >>> 498: } >> >> This is an interesting idea. I'll have to see how you test this code later on... > > Update: I didn't see an explicit test case for overflowing the lock stack. > Do you plan to add one? Sure. They are generally tricky; you usually need to tell the compiler not to do stuff like eliding locks and inlining methods. I will attempt to write tests both for -Xint and the compiler that test both recursive and normal full lock stack.
(The recursive tests requires the platform to implement recursive lightweight for it to effectively test anything) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1402998083 From stuefe at openjdk.org Thu Nov 23 07:46:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 07:46:33 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v8] In-Reply-To: References: Message-ID: <3z5l_WMFbTjpM9cD2UjqaJ2hwE3Wbc82g7pdDMLLNQs=.602c866c-50c4-401e-b503-26c1ff15249e@github.com> > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
> > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
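As an illustration of the riscv rule quoted above, here is a minimal sketch of how one might test whether a candidate base only has bits set inside one of the listed ranges. This is not the code from the patch: `bit_range_mask` and `is_good_riscv_base` are hypothetical names, and the ranges are taken literally from the description as half-open bit intervals.

```cpp
#include <cstdint>

// Mask with bits [lo, hi) set. Assumes 0 <= lo < hi <= 64.
inline uint64_t bit_range_mask(unsigned lo, unsigned hi) {
  const uint64_t below_hi = (hi == 64) ? ~uint64_t(0) : ((uint64_t(1) << hi) - 1);
  const uint64_t below_lo = (uint64_t(1) << lo) - 1;
  return below_hi & ~below_lo;
}

// Hypothetical check: true if every set bit of 'base' lies within exactly one
// of the ranges [12, 32), [32, 44) or [44, 64) named in the PR description.
// A zero base is rejected here, since it is not an address one could reserve.
inline bool is_good_riscv_base(uint64_t base) {
  if (base == 0) {
    return false;
  }
  const uint64_t ranges[] = {
    bit_range_mask(12, 32), bit_range_mask(32, 44), bit_range_mask(44, 64)
  };
  for (uint64_t mask : ranges) {
    if ((base & ~mask) == 0) {
      return true;  // no bits outside this one range
    }
  }
  return false;
}
```

A reservation loop could probe candidate addresses and keep the first one for which the check returns true; how the candidates are chosen is outside the scope of this sketch.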
Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision: - Adapt test; exclude on windows - Correctly name test flag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/a449751c..ecaf743c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=06-07 Stats: 18 lines in 3 files changed: 13 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From aboldtch at openjdk.org Thu Nov 23 07:51:39 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 07:51:39 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v7] In-Reply-To: References: Message-ID: > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. > > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enters. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit.
> > A high-level overview: > * Locking is still performed on the mark word > * Unlocked (0b01) <=> Locked (0b00) > * Monitor enter on Obj with mark word Unlocked (0b01) is the same > * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) > * Push Obj onto the lock stack > * Success > * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack > * If top entry is Obj > * Push Obj on the lock stack > * Success > * If top entry is not Obj > * Inflate and call ObjectMonitor::enter > * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack > * If just the top entry is Obj > * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) > * Pop the entry > * Success > * If both entries are Obj > * Pop the top entry > * Success > * Any other case only occurs for unstructured locking; then just inflate and call ObjectMonitor::exit > * If the monitor has been inflated for object Obj which is owned by the current thread > * All corresponding entries for Obj are removed from the lock stack > * The monitor's recursion count is set to the number of removed entries - 1 > * The owner is changed from anonymous to the thread > * The regular ObjectMonitor::action is called.
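The enter and exit rules in the overview above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions, not the HotSpot code: `ObjSketch`, `LockStackSketch` and their members are hypothetical, the mark word is reduced to a boolean, the atomic update of the mark word is replaced by a plain store, and the capacity of 8 is the figure mentioned in the review discussion.

```cpp
#include <cstddef>

// Hypothetical stand-in for a Java object; 'locked' models the mark word
// states Locked (0b00) and Unlocked (0b01) from the overview.
struct ObjSketch {
  bool locked = false;
};

// Hypothetical single-threaded model of one thread's lock stack.
class LockStackSketch {
  static constexpr size_t CAPACITY = 8;
  ObjSketch* _base[CAPACITY];
  size_t _top = 0;

 public:
  size_t depth() const { return _top; }

  // Returns true if the fast path handled the enter. False means the caller
  // would have to inflate and call ObjectMonitor::enter.
  bool enter(ObjSketch* obj) {
    if (_top == CAPACITY) {
      return false;            // in this sketch a full stack takes the slow path
    }
    if (!obj->locked) {
      obj->locked = true;      // Unlocked => Locked
      _base[_top++] = obj;     // push Obj
      return true;
    }
    if (_top > 0 && _base[_top - 1] == obj) {
      _base[_top++] = obj;     // consecutive recursive enter: push again
      return true;
    }
    return false;              // locked, but Obj is not the top entry
  }

  // Returns true if the fast path handled the exit. False means the caller
  // would have to inflate and call ObjectMonitor::exit (unstructured locking).
  bool exit(ObjSketch* obj) {
    if (!obj->locked || _top == 0 || _base[_top - 1] != obj) {
      return false;
    }
    if (_top >= 2 && _base[_top - 2] == obj) {
      _top--;                  // both top entries are Obj: recursive exit
      return true;
    }
    obj->locked = false;       // only the top entry is Obj: Locked => Unlocked
    _top--;
    return true;
  }
};
```

Because only the two topmost entries are inspected, an interleaved sequence such as enter(a), enter(b), enter(a) leaves the fast path on the third call, which matches the "consecutive only" restriction described in the PR text.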
Axel Boldt-Christmas has updated the pull request incrementally with three additional commits since the last revision: - Update unstructured unlock comment - Fix bad indent after merge - Remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16606/files - new: https://git.openjdk.org/jdk/pull/16606/files/dde4f957..a10471c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=05-06 Stats: 18 lines in 2 files changed: 4 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Thu Nov 23 07:51:42 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 07:51:42 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> Message-ID: On Tue, 21 Nov 2023 22:04:39 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Fix nit >> - Fix comment typos >> - 8319797: Recursive lightweight locking: Runtime implementation > > src/hotspot/share/runtime/lockStack.inline.hpp line 127: > >> 125: int end = to_index(_top); >> 126: if (end <= 1 || _base[end - 1] != o || _base[end - 2] != o) { >> 127: // The two topmost oop does not match o. > > nit typo: s/The two topmost oop does not match o./The two topmost oops do not match o. Done. > src/hotspot/share/runtime/synchronizer.cpp line 504: > >> 502: // Retry until a lock state change has been observed. cas_set_mark() may collide with non lock bits modifications. >> 503: // Try to swing into 'fast-locked' state. >> 504: assert(!lock_stack.contains(obj()), "thread must not already hold the lock"); > > It looks like the indent from L502 -> L515 is too much by two spaces. > This could be a GitHub view glitch... Done. (Occurred when merging). > test/hotspot/gtest/runtime/test_lockStack.cpp line 36: > >> 34: ls._base[ls.to_index(ls._top)] = obj; >> 35: ls._top += oopSize; >> 36: > > nit - please delete extra blank line. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1403005792 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1403004885 PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1403004988 From aboldtch at openjdk.org Thu Nov 23 07:55:35 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 07:55:35 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v8] In-Reply-To: References: Message-ID: > Implements the runtime part of JDK-8319796. > The different CPU implementations are/will be created as dependent pull requests. 
> > This enhancement proposes introducing the ability for LM_LIGHTWEIGHT to handle consecutive recursive monitor enters. Limiting the implementation to only consecutive monitor enters allows for more efficient emitted code which only needs to look at the two topmost entries on the lock stack to determine what to do in a monitor exit. > > A high-level overview: > * Locking is still performed on the mark word > * Unlocked (0b01) <=> Locked (0b00) > * Monitor enter on Obj with mark word Unlocked (0b01) is the same > * Transition Obj's mark word Unlocked (0b01) => Locked (0b00) > * Push Obj onto the lock stack > * Success > * Monitor enter on Obj with mark word Locked (0b00) will check the top entry on the lock stack > * If top entry is Obj > * Push Obj on the lock stack > * Success > * If top entry is not Obj > * Inflate and call ObjectMonitor::enter > * Monitor exit on Obj with mark word Locked (0b00) will check the two top entries on the lock stack > * If just the top entry is Obj > * Transition Obj's mark word Locked (0b00) => Unlocked (0b01) > * Pop the entry > * Success > * If both entries are Obj > * Pop the top entry > * Success > * Any other case only occurs for unstructured locking; then just inflate and call ObjectMonitor::exit > * If the monitor has been inflated for object Obj which is owned by the current thread > * All corresponding entries for Obj are removed from the lock stack > * The monitor's recursion count is set to the number of removed entries - 1 > * The owner is changed from anonymous to the thread > * The regular ObjectMonitor::action is called.
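The inflation step at the end of the overview (remove every entry for Obj, then set the recursions to the number removed minus one) can be sketched as follows. This is a standalone illustration, not the code in `lockStack.inline.hpp`: `remove_all` and its signature are hypothetical, and the sketch skips the store when source and destination slot coincide.

```cpp
#include <cstddef>

// Hypothetical helper: compacts base[0..top) in place, dropping every entry
// equal to o. Updates top and returns the number of removed entries.
size_t remove_all(const void* base[], size_t& top, const void* o) {
  size_t inserted = 0;
  for (size_t i = 0; i < top; i++) {
    if (base[i] != o) {
      if (inserted != i) {
        base[inserted] = base[i];  // only store once a gap has opened up
      }
      inserted++;
    }
  }
  const size_t removed = top - inserted;
  top = inserted;
  return removed;
}
```

With a stack holding a, b, a, a, c, a call for a returns 3 and leaves b, c on the stack; the monitor's recursion count would then be set to 3 - 1 = 2.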
Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Avoid copy from and to the same location - Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16606/files - new: https://git.openjdk.org/jdk/pull/16606/files/a10471c7..0fe2561a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16606&range=06-07 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16606/head:pull/16606 PR: https://git.openjdk.org/jdk/pull/16606 From aboldtch at openjdk.org Thu Nov 23 07:55:39 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 07:55:39 GMT Subject: RFR: 8319797: Recursive lightweight locking: Runtime implementation [v6] In-Reply-To: <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> References: <9qMIC_BQk5i5MmbQLovTmNsla_qMxlgCCZhyK8eHHSc=.c7d958c4-deaa-4264-a3f4-1907240d26d2@github.com> <25xmq9JFWlz_2L3NXWK8ghR5N2UlPE-uELjIZoyDvyg=.40c3316f-2580-4b2b-ad65-ec649e6a1c0a@github.com> Message-ID: On Tue, 21 Nov 2023 22:22:07 GMT, Daniel D. Daugherty wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: >> >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Merge remote-tracking branch 'upstream_jdk/pr/16603' into JDK-8319797 >> - Fix nit >> - Fix comment typos >> - 8319797: Recursive lightweight locking: Runtime implementation > > src/hotspot/share/runtime/lockStack.inline.hpp line 144: > >> 142: for (int i = 0; i < end; i++) { >> 143: if (_base[i] != o) { >> 144: _base[inserted++] = _base[i]; > > This version of the removal algorithm stores into every `base[inserted]` memory location > even when `inserted == i` before the first instance of `o` is found and logically removed. > Granted the lock stack is only 8 elements, but storing into every memory location when > you don't need to is wasteful. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16606#discussion_r1403008973 From duke at openjdk.org Thu Nov 23 08:11:05 2023 From: duke at openjdk.org (suchismith1993) Date: Thu, 23 Nov 2023 08:11:05 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <2X-eefx1BKO-hqncC3LF0It1wUxHRqvrX0qm2TsCbd0=.2626d8a4-d549-4a82-a2de-40003af36539@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> <2X-eefx1BKO-hqncC3LF0It1wUxHRqvrX0qm2TsCbd0=.2626d8a4-d549-4a82-a2de-40003af36539@github.com> Message-ID: On Thu, 23 Nov 2023 05:49:41 GMT, Amit Kumar wrote: > #16490 The JBS issue with respect to that has been closed. Need to check if that PR is required. Currently putting it on hold. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1823952398 From vklang at openjdk.org Thu Nov 23 08:12:11 2023 From: vklang at openjdk.org (Viktor Klang) Date: Thu, 23 Nov 2023 08:12:11 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: <6TkAoQRK_49MtT6wjb_JAwFzUATAvAx_rX9yxMa9Vfs=.7d1d5bf6-f088-44df-9d01-cc336bdeecaa@github.com> Message-ID: On Thu, 23 Nov 2023 03:11:15 GMT, David Holmes wrote: >> Wow! This PR is much larger than I expected. >> >> Thumbs up! > >> Wow! This PR is much larger than I expected. >> >> Thumbs up! > > Thanks for the Review Dan! Yes lots of code deletion engineering in this one - and even better I got to delete template code with meta-programming stuff! :D @dholmes-ora Just passing by -- impressed by the thorough update! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1823953312 From dholmes at openjdk.org Thu Nov 23 08:16:11 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 08:16:11 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v5] In-Reply-To: References: <6TkAoQRK_49MtT6wjb_JAwFzUATAvAx_rX9yxMa9Vfs=.7d1d5bf6-f088-44df-9d01-cc336bdeecaa@github.com> Message-ID: On Thu, 23 Nov 2023 08:09:44 GMT, Viktor Klang wrote: >>> Wow! This PR is much larger than I expected. >>> >>> Thumbs up! >> >> Thanks for the Review Dan! Yes lots of code deletion engineering in this one - and even better I got to delete template code with meta-programming stuff! :D > > @dholmes-ora Just passing by -- impressed by the thorough update! Thanks for taking a look @viktorklang-ora ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1823957279 From fyang at openjdk.org Thu Nov 23 08:22:08 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 23 Nov 2023 08:22:08 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Mon, 20 Nov 2023 12:07:36 GMT, Yuri Gaevsky wrote: >> Sure, let me check . > > Done in [this commit](https://github.com/openjdk/jdk/pull/16629/commits/af940acd365677ec3c29a8f066b68b753ad362e4). I've tried the usage of iRegP/iRegI but that caused the related failure (the JVM didn't even start). I guess it might be a performance consideration (maybe saving some register-register moves?). I see the x86_64 counterpart also specifies certain registers [1]. You might want to give it a try on x86_64 to find out how it may make a difference on the JIT code. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L11225 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1403030947 From shade at openjdk.org Thu Nov 23 08:23:14 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 Nov 2023 08:23:14 GMT Subject: RFR: 8320582: Zero: Misplaced CX8 enablement flag In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:33:24 GMT, Aleksey Shipilev wrote: > When doing [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777), I misplaced the `_supports_cx8 = true` flag setting in the method that is only called when CPU features are polled from perf counter code. We need to move the check to a proper place. [JDK-8318776](https://github.com/openjdk/jdk/pull/16625/files) would catch fire without this.
> > Additional testing (redoing JDK-8319777 testing): > - [x] Linux arm Zero fastdebug now builds fine with JDK-8318776 fix > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images No problem! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16779#issuecomment-1823963927 From shade at openjdk.org Thu Nov 23 08:23:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 Nov 2023 08:23:15 GMT Subject: Integrated: 8320582: Zero: Misplaced CX8 enablement flag In-Reply-To: References: Message-ID: <6S5AMSqiFR2K34J_U4_ds2X-aSe9vM1ytz56rkYqlX8=.540ef94d-4dd0-4c55-a74e-ce050a433876@github.com> On Wed, 22 Nov 2023 10:33:24 GMT, Aleksey Shipilev wrote: > When doing [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777), I misplaced the `_supports_cx8 = true` flag setting in the method that is only called when CPU features are polled from perf counter code. We need to move the check to a proper place. [JDK-8318776](https://github.com/openjdk/jdk/pull/16625/files) would catch fire without this. > > Additional testing (redoing JDK-8319777 testing): > - [x] Linux arm Zero fastdebug now builds fine with JDK-8318776 fix > - [x] Linux x86_32 Zero release; jcstress > - [x] Linux x86_32 Zero fastdebug, `compiler/unsafe java/lang/invoke/VarHandles` > - [x] Linux x86_32 Zero fastdebug, bootcycle-images This pull request has now been integrated. 
Changeset: 06d957fd Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/06d957fd7c1ff59f314315969a545a8f4a5137be Stats: 12 lines in 1 file changed: 6 ins; 6 del; 0 mod 8320582: Zero: Misplaced CX8 enablement flag Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/16779 From shade at openjdk.org Thu Nov 23 08:28:12 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 23 Nov 2023 08:28:12 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v6] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 03:14:27 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. >> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. 
>> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Ran full jcstress on linux-arm-zero-release on RPi 4 without problem. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16625#pullrequestreview-1745896266 From jbachorik at openjdk.org Thu Nov 23 08:32:36 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 08:32:36 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. 
> > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. > > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with three additional commits since the last revision: - Clean up imports - Simplify Method::clear_jmethod_id() - Use correct copyrights ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/b2ca5e69..f47e9499 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=00-01 Stats: 25 lines in 6 files changed: 5 ins; 13 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From xgong at openjdk.org Thu Nov 23 08:44:08 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Nov 2023 08:44:08 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 01:28:40 GMT, Xiaohong Gong wrote: >> Ah, now I se what you are trying to do here. First of all, in the detection part, only set `SVE_FEATURE_SUPPORT`. 
Then you can handle the `SVE_CFLAGS` addition elsewhere/later.
>>
>> Secondly, you should not mix these `SVE_CFLAGS` with the SLEEF C flags. Keeping them separate will allow for LIBSLEEF_CFLAGS to be named just that.
>>
>> Thirdly, I do not like at all how you just come crashing in setting `-march` like that. The `-march` flag is handled by `FLAGS_SETUP_ABI_PROFILE`.
>>
>> Actually, now that I think of it, this is just completely wrong! You are checking on features on the build machine, to determine what target machine code to generate, with no way to override.
>>
>> You need to break out the -march handling separately. It should be moved to FLAGS_SETUP_ABI_PROFILE. I'm guessing you will need to make something like an `aarch64-sve` profile, and possibly try to auto-select it based on the result of the sve test program above. But changing `OPENJDK_TARGET_ABI_PROFILE` can have further consequences; I do not know the full extent off the top of my head.
>
> Thanks for the advice! I will take it into consideration.

> Thirdly, I do not like at all how you just come crashing in setting -march like that. The -march flag is handled by FLAGS_SETUP_ABI_PROFILE.

`-march=armv8-a+sve` is used only in this newly added module and is not expected to be used for other libraries. As I understand it, the flags handled by `FLAGS_SETUP_ABI_PROFILE` are not applied to one specific module?

> Actually, now that I think of it, this is just completely wrong! You are checking on features on the build machine, to determine what target machine code to generate, with no way to override.

Yes, that is a risk, although use of the SVE functions is also gated on the SVE feature at runtime. I need time to find a better solution.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403052964 From stefank at openjdk.org Thu Nov 23 08:44:10 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 08:44:10 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 01:27:24 GMT, David Holmes wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. 
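The filtering described for the thread dumping case can be pictured with a toy model. This is only a sketch under stated assumptions, not the HotSpot code: `ToyObject`, `ToyMonitor`, `CollectForDump` and `monitors_for_dump` are invented names, and a `std::weak_ptr` stands in for the monitor's weak reference to its object.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins, not the HotSpot classes. The monitor refers to its object
// only weakly, so the object can die while the monitor is still owned.
struct ToyObject {
  std::string name;
};

struct ToyMonitor {
  int owner_tid;                    // thread that entered the monitor
  std::weak_ptr<ToyObject> object;  // weak: does not keep the object alive

  // Models the idea behind object_peek(): null once the object is dead.
  std::shared_ptr<ToyObject> object_peek() const { return object.lock(); }
};

// Closure applied to every in-use monitor when collecting for a dump.
struct CollectForDump {
  std::vector<const ToyMonitor*> collected;

  void do_monitor(const ToyMonitor& m) {
    if (m.object_peek() == nullptr) {
      // Owned monitor whose object is dead: the dump cannot report it.
      return;
    }
    collected.push_back(&m);
  }
};

// Walks all in-use monitors and returns those that are safe to report.
inline std::vector<const ToyMonitor*> monitors_for_dump(
    const std::vector<ToyMonitor>& in_use) {
  CollectForDump closure;
  for (const ToyMonitor& m : in_use) {
    closure.do_monitor(m);
  }
  return closure.collected;
}
```

In this model a detach path would deliberately not use the filtering closure, so that monitors with dead objects are still visited and unlocked.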
>>
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls.
>>
>> Test: the written tests with and without the fix. Tier1-Tier3, so far.

> src/hotspot/share/runtime/vmOperations.cpp line 354:
>
>> 352:     // alive. Filter out monitors with dead objects.
>> 353:     return;
>> 354:   }
>
> I don't think we need to do this, but even without this filtering I ran a number of tests and was unable to demonstrate any problem. The JNI locked monitor seems to be "invisible" to the frame that locked it and so the thread dump never encounters it. Were you able to provoke a failure here or is this defensive programming?

I provoked test failures for all paths I filtered. If I remove this check and run:

    make -C ../build/fastdebug test TEST=test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java JTREG="JAVA_OPTIONS=-XX:+UseZGC"

I hit this assert:

    #  Internal Error (/home/stefank/git/jdk/open/src/hotspot/share/services/management.cpp:1274), pid=1546709, tid=1546754
    #  assert(object != nullptr) failed: must be a Java object
    ...
    V  [libjvm.so+0x1330ce8]  jmm_DumpThreads+0x1a48  (management.cpp:1274)
    j  sun.management.ThreadImpl.dumpThreads0([JZZI)[Ljava/lang/management/ThreadInfo;+0 java.management@22-internal
    j  sun.management.ThreadImpl.dumpAllThreads(ZZI)[Ljava/lang/management/ThreadInfo;+28 java.management@22-internal
    j  sun.management.ThreadImpl.dumpAllThreads(ZZ)[Ljava/lang/management/ThreadInfo;+5 java.management@22-internal
    j  IterateMonitorWithDeadObjectTest.dumpThreadsWithLockedMonitors()V+7
    j  IterateMonitorWithDeadObjectTest.main([Ljava/lang/String;)V+11

If I remove that assert I hit an NPE in the Java layer:

    java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "lock" is null
        at java.management/java.lang.management.ThreadInfo.<init>(ThreadInfo.java:172)
        at java.management/sun.management.ThreadImpl.dumpThreads0(Native Method)
        at java.management/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:518)
        at java.management/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:506)
        at IterateMonitorWithDeadObjectTest.dumpThreadsWithLockedMonitors(IterateMonitorWithDeadObjectTest.java:44)
        at IterateMonitorWithDeadObjectTest.main(IterateMonitorWithDeadObjectTest.java:66)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
        at java.base/java.lang.Thread.run(Thread.java:1570)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403052907

From jbachorik at openjdk.org  Thu Nov 23 08:49:30 2023
From: jbachorik at openjdk.org (Jaroslav Bachorik)
Date: Thu, 23 Nov 2023 08:49:30 GMT
Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v3]
In-Reply-To:
References:
Message-ID:

> Please, review this fix for a corner case handling of `jmethodID` values.
> > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. 
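To make the conditional clean-up concrete, here is a toy C++ model. It is a sketch under stated assumptions and not the patch itself: `ToyMethod`, `ToyMethodId`, `ToyIdCache` and `clear_id_if_points_to` are invented for illustration, and a `jmethodID` is modelled simply as a pointer to a slot that holds a method pointer.

```cpp
#include <cstddef>
#include <vector>

struct ToyMethod;                 // stand-in for HotSpot's Method
using ToyMethodId = ToyMethod**;  // a jmethodID modelled as a pointer to a slot

struct ToyMethod {
  int idnum;  // index of the slot this method version expects to own
};

// One cache per class, shared by the old and current method versions.
struct ToyIdCache {
  std::vector<ToyMethod*> slots;

  ToyMethodId id_at(int idnum) {
    return &slots[static_cast<std::size_t>(idnum)];
  }
};

// Called just before `m` is deallocated. Looks up the slot for `m` and
// clears it only if it still points at `m`. A slot that was redirected to
// a newer method version is left untouched. Returns true if it was cleared.
inline bool clear_id_if_points_to(ToyIdCache& cache, ToyMethod* m) {
  if (m->idnum < 0 ||
      static_cast<std::size_t>(m->idnum) >= cache.slots.size()) {
    return false;  // no slot was ever created for this method
  }
  ToyMethodId id = cache.id_at(m->idnum);
  if (*id != m) {
    return false;  // the id now refers to another version: keep it
  }
  *id = nullptr;   // no dangling pointer once `m` is gone
  return true;
}
```

The point of the identity check is that a bulk reset of every slot in the cache would also wipe ids that belong to the current method versions.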
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Rewerite the test to use RedefineClassHelper ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/f47e9499..967813b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=01-02 Stats: 124 lines in 2 files changed: 15 ins; 93 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From jbachorik at openjdk.org Thu Nov 23 08:49:32 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 08:49:32 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2] In-Reply-To: References: Message-ID: <4m-e190SIpRd0-fkjZ4ja7mhVaQQsTwfqmWd8_82Els=.7b198822-3637-4109-b1e9-2451f882ce4b@github.com> On Thu, 23 Nov 2023 08:32:36 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. 
>> >> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with three additional commits since the last revision: > > - Clean up imports > - Simplify Method::clear_jmethod_id() > - Use correct copyrights @dholmes-ora > Can't we just check method->method_holder() for null rather than passing in a parameter like this? I have removed the argument and I am now performing the check for `method_holder() != nullptr` as recommended. 
The code is a bit simpler and the cost of resolving the method holder for each method is probably quite low so we should be ok here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1823997622 From jbachorik at openjdk.org Thu Nov 23 08:49:35 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 08:49:35 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v3] In-Reply-To: References: Message-ID: <-NQyZq95fCsl2kAwAjrQXuxJoKqAdnoI3Y0ccQ9CNsk=.5fe65c5d-c326-45a3-8777-13c5b3f8cf06@github.com> On Mon, 20 Nov 2023 22:08:49 GMT, Coleen Phillimore wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Rewerite the test to use RedefineClassHelper > > src/hotspot/share/classfile/classFileParser.cpp line 5579: > >> 5577: >> 5578: if (_methods != nullptr) { >> 5579: // Free methods - those methods are not fully wired and miss the method holder > > How about saying: for methods whose InstanceKlass as method holder is not yet created? This comment went away with the change @dholmes-ora recommended > test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/GetStackTraceAndRetransformTest.java line 53: > >> 51: import java.util.List; >> 52: import java.util.concurrent.CyclicBarrier; >> 53: import java.util.concurrent.locks.LockSupport; > > Do you need all these imports? > > There's a simple RedefineClassHelper class that does most of the work, but maybe you need the explicit agent code to reproduce the crash? See test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineRunningMethodsWithBacktrace.java as an example. I did clean up the imports and switched to `RedefineClassHelper`. Thanks for pointing that out! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403064332 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403054332 From jbachorik at openjdk.org Thu Nov 23 08:49:37 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 08:49:37 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v3] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 01:27:25 GMT, David Holmes wrote: >> I see, holder is the right word and concept. So the parameter means has_method_holder, in that the InstanceKlass has been fully parsed at the point of clearing the jmethodIDs. > > Can't we just check `method->method_holder()` for null rather than passing in a parameter like this? Yes. I changed the code to do that. We will attempt to resolve the method holder for each method instead of just using one single boolean argument but I believe the resolution should be fast enough not to matter in this context. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403068144 From stefank at openjdk.org Thu Nov 23 08:50:09 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 08:50:09 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 01:28:32 GMT, David Holmes wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. 
>> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 28: > >> 26: * @test IterateMonitorWithDeadObjectTest >> 27: * @summary This locks a monitor, GCs the object, and iterate and perform >> 28: * various iteration and operations over this monitor. > > This doesn't read right with "iterate" and "iteration". Not sure exactly what you were trying to say. I agree. I'll reword that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403073457 From jbachorik at openjdk.org Thu Nov 23 08:49:39 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 08:49:39 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v3] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 01:21:15 GMT, David Holmes wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Rewerite the test to use RedefineClassHelper > > test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/GetStackTraceAndRetransformTest.java line 2: > >> 1: /* >> 2: * Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. > > An Oracle copyright is not needed here if you wrote this test from scratch. If it is present then we need a comma after the copyright year please. Fixed > test/hotspot/jtreg/serviceability/jvmti/thread/GetStackTrace/GetStackTraceAndRetransformTest/libGetStackTraceAndRetransformTest.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2023 Oracle and/or its affiliates. All rights reserved. > > Ditto comment about Oracle copyright. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403054811 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403055037 From sjohanss at openjdk.org Thu Nov 23 08:51:20 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 23 Nov 2023 08:51:20 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 23:08:36 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup and address comments I think this looks good now. 
I've started some additional testing to make sure we don't run into anything unexpected with the newly added test on any platforms. Will approve once the testing is green. src/hotspot/share/runtime/cpuTimeCounters.hpp line 2: > 1: /* > 2: * Copyright (c) 2002, 2023, Oracle and/or its affiliates. All rights reserved. This is probably on me, but this 2002 is wrong: Suggestion: * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. ------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1745951725 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403070166 From xgong at openjdk.org Thu Nov 23 08:52:09 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 23 Nov 2023 08:52:09 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 01:32:00 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. 
People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it at runtime fails, the math vector ops will fall back to the default scalar version without error. But a warning is printed if a nonexistent library is specified explicitly.
>>
>> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedback from more hotspot experts.
>>
>> [1] https://github.com/openjdk/jdk/pull/3638
>> [2] https://sleef.org/
>> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
>> [4] https://packages.debian.org/bookworm/libsleef3
>> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html
>
> Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Add a bundled native lib in jdk as a bridge to libsleef
>  - Merge 'jdk:master' into JDK-8312425
>  - Disable sleef by default
>  - Merge 'jdk:master' into JDK-8312425
>  - 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF

Agreed: we use the external library as the first step, and then add the SLEEF sources to the JDK if needed.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1824007585 From amitkumar at openjdk.org Thu Nov 23 08:53:02 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 23 Nov 2023 08:53:02 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> <2X-eefx1BKO-hqncC3LF0It1wUxHRqvrX0qm2TsCbd0=.2626d8a4-d549-4a82-a2de-40003af36539@github.com> Message-ID: On Thu, 23 Nov 2023 08:08:51 GMT, suchismith1993 wrote: > The JBS issue with respect to that has been closed. Need to check if that PR is required. Currently putting it on hold. This response on the issue suggest otherwise: : The JDK does not support dynamically loaded archive files (.a files) and there are no plans to add this support. Closing as will not fix ------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1824009424 From stefank at openjdk.org Thu Nov 23 08:55:11 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 08:55:11 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 01:29:24 GMT, David Holmes wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. 
>> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 29: > >> 27: * @summary This locks a monitor, GCs the object, and iterate and perform >> 28: * various iteration and operations over this monitor. >> 29: * @requires os.family == "linux" > > I know the test this was copied from had this but I'm not sure it is actually a necessary restriction - any Posix platform should work. Though maybe perror is linux only ... 
I couldn't find any test filtering for posix, but some tests have:

    @requires os.family != "windows"

That seems to work on my Mac.

> test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 31:
>
>> 29:  * @requires os.family == "linux"
>> 30:  * @library /testlibrary /test/lib
>> 31:  * @build IterateMonitorWithDeadObjectTest
>
> You don't need an explicit `@build` step

Sure. This was just brought over from your original reproducer.

> test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 66:
>
>> 64:         // dead object. The thread dumping code didn't tolerate such a monitor,
>> 65:         // so run a thread dump and make sure that it doesn't crash/assert.
>> 66:         dumpThreadsWithLockedMonitors();
>
> But you've already detached the thread so there is no locked monitor any longer. And `runTestAndDetachThread()` also did a thread dump.

Right. This is a leftover from my earlier attempt. I'll remove it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403077397
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403077988
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403078851

From xgong at openjdk.org  Thu Nov 23 08:57:23 2023
From: xgong at openjdk.org (Xiaohong Gong)
Date: Thu, 23 Nov 2023 08:57:23 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4]
In-Reply-To:
References:
Message-ID: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com>

> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on the AArch64 platform, which causes a large performance gap on AArch64. Note that those APIs are optimized by the C2 compiler on X86 platforms by calling Intel's SVML code [1].
> To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which is available in mainstream Linux distros (e.g. [3] [4]).
>
> SLEEF supports multiple accuracies. To match the Vector API's requirement and implement the math ops on AArch64, we 1) call the libsleef stubs that provide 1.0 ULP accuracy and use FMA instructions for most of the operations by default, and 2) add a vector calling convention for the runtime calls to the stub code in libsleef. Note that for those APIs for which libsleef does not support 1.0 ULP, we choose 0.5 ULP instead.
>
> To help load the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to specify the libsleef path/name explicitly. By default, it points to the system-installed library. If the library does not exist or dynamically loading it at runtime fails, the vector math ops fall back to the default scalar version without error. A warning is printed, however, if a nonexistent library is specified explicitly.
>
> Note that this is a part of the originally proposed patch in panama-dev [5], just with some initial review comments addressed. We would now like to get wider feedback from more hotspot experts.
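
The fallback behavior described above can be sketched roughly as follows. This is only an illustration, not code from the patch: `MathImpl`, `select_math_impl`, and the injected `loader` callback are hypothetical names, and the callback merely stands in for whatever dynamic-loading call the VM actually uses.

```cpp
#include <functional>
#include <iostream>
#include <string>

// Hypothetical names throughout; this does not mirror the patch's code.
enum class MathImpl { VectorLibrary, ScalarFallback };

// `loader` stands in for the runtime's dynamic-loading call and is assumed
// to return nullptr when the library cannot be loaded.
inline MathImpl select_math_impl(
    const std::string& lib_name,
    bool user_specified,
    const std::function<void*(const std::string&)>& loader) {
  if (loader(lib_name) != nullptr) {
    return MathImpl::VectorLibrary;
  }
  // Loading failed: fall back silently, unless the user named the library
  // explicitly, in which case a warning is printed first.
  if (user_specified) {
    std::cerr << "warning: could not load " << lib_name
              << ", using scalar math" << std::endl;
  }
  return MathImpl::ScalarFallback;
}
```

Injecting the loader keeps the sketch independent of any particular platform API; a failed load is never an error, it only changes which implementation is selected.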
> > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Address review comments in build system ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16234/files - new: https://git.openjdk.org/jdk/pull/16234/files/b29df846..2c3c4a64 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=02-03 Stats: 126 lines in 7 files changed: 72 ins; 22 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/16234.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16234/head:pull/16234 PR: https://git.openjdk.org/jdk/pull/16234 From stefank at openjdk.org Thu Nov 23 08:59:07 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 08:59:07 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 01:38:57 GMT, David Holmes wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. 
>> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 40: > >> 38: public class IterateMonitorWithDeadObjectTest { >> 39: public static native void runTestAndDetachThread(); >> 40: public static native void joinTestThread(); > > I don't think this form of the test needs to separate out the `pthread_join()`, it can just be done in `runTestAndDetachThread` AFAICS. I originally split it out to allow the Java code to do the GC while the native thread was sleeping prior to detaching. All this is left-overs that I thought I had removed. I'm removing this. 
> test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java line 57: > >> 55: // - Drop the last reference to the object >> 56: // - GC to clear the weak reference to the object in the monitor >> 57: // - Detach the thread - provoke previous bug > > It also does a thread dump while the lock is held Updated ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403082283 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403082528 From stuefe at openjdk.org Thu Nov 23 09:35:30 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 09:35:30 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v9] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
> > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
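
As an illustration of the riscv rule above, the check "only bits set in one of [12..32), [32, 44) or [44, 64)" could be written as below. This is a minimal sketch, not the patch's code: `bit_range_mask` and `has_bits_in_single_field` are hypothetical names, and treating a zero base as "not good" is an assumption made here for simplicity.

```cpp
#include <cstdint>

// Mask with bits [lo, hi) set; assumes lo < hi, lo < 64 and hi <= 64.
inline std::uint64_t bit_range_mask(unsigned lo, unsigned hi) {
  const std::uint64_t below_hi =
      (hi >= 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << hi) - 1);
  const std::uint64_t below_lo = (std::uint64_t{1} << lo) - 1;
  return below_hi & ~below_lo;
}

// True if `base` is non-zero and all of its set bits fall inside exactly one
// of the bit ranges [12, 32), [32, 44) or [44, 64).
inline bool has_bits_in_single_field(std::uint64_t base) {
  if (base == 0) {
    return false;  // assumption of this sketch: zero is handled elsewhere
  }
  const std::uint64_t fields[] = {
      bit_range_mask(12, 32), bit_range_mask(32, 44), bit_range_mask(44, 64)};
  for (std::uint64_t field : fields) {
    if ((base & ~field) == 0) {
      return true;
    }
  }
  return false;
}
```

A base of `1 << 31` or `1 << 40` passes the check, while a base with bits in two different ranges, or with any of the low 12 bits set, does not.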
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Fix test for riscv

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16743/files
  - new: https://git.openjdk.org/jdk/pull/16743/files/ecaf743c..edc19d65

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=07-08

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16743.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743

PR: https://git.openjdk.org/jdk/pull/16743

From duke at openjdk.org Thu Nov 23 09:51:06 2023
From: duke at openjdk.org (suchismith1993)
Date: Thu, 23 Nov 2023 09:51:06 GMT
Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
In-Reply-To:
References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com>
Message-ID:

On Thu, 23 Nov 2023 06:16:54 GMT, Thomas Stuefe wrote:

> If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases?

On AIX, we have shared objects with the .a extension and also static archives with the .a extension. I think on other platforms the format for shared objects is fixed? In that case, does this become specific to AIX?
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1824086655 From stuefe at openjdk.org Thu Nov 23 09:54:06 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 09:54:06 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v2] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 17:46:22 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . >> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > add prepare_native_symbols You can now replace the code in VMError::get_vm_info with the generic function and remove the AIX specific include. ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16730#pullrequestreview-1746072673 From stefank at openjdk.org Thu Nov 23 10:14:09 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 10:14:09 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 00:48:19 GMT, David Holmes wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. 
It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > >> Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code > > I had not realized that. It explains some confusion in a separate issue I had been looking into! It is important that these monitors are exposed and unlocked at detach time, otherwise it also messes up the `held_monitor_count`. 
>
>> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure
>
> I think we may need to make that code tolerate the absence of an object.
>
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor.
>
> I think we probably should expose this to be accurate, but I think this needs investigation on the JVMTI side to ensure that the null entry is tolerated okay. So a separate RFE to handle this would be fine.
>
> Thanks

While addressing @dholmes-ora's comments about the tests, I found that the JNI implementation was incorrect and caused a failure that seems to prevent the thread detach, and then the Java-layer thread dumping caught the thread dumping bug. I'm going to see if I can split the various test cases so that they can be tested individually. Don't look too closely at the current test until it has been updated.

> test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 94:
>
>> 92:
>> 93: // Let the GC clear the weak reference to the object.
>> 94: system_gc(env);
>
> AFAIK there is no guarantee that one call to `System.gc()` will suffice to clear the weakRef. We tend to use a loop with a few iterations in other tests, or use a WhiteBox method to achieve it. In my testing I used the finalizer to observe that the objects had been finalized but even then, and with a loop, I did not always see them collected with G1.

ZGC will clear the weak reference. G1 clears the weak reference, except in a few specific situations. I've verified that G1 clears this weak reference, so I don't think it would be worth making this test longer or more complicated.

> test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 104:
>
>> 102: // source of at least two bugs:
>> 103: // - When the object reference in the monitor was made weak, the code
>> 104: // didn't unlock the monitor, leaving it lingering in the system.
>
> Suggestion:
>
> // - When the object reference in the monitor was cleared, the monitor
> // iterator code would skip it, preventing it from being unlocked when
> // the owner thread detached, leaving it lingering in the system.
>
> the original made it sound to me like the code that cleared the reference (i.e. the GC) was expected to do the unlocking.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824116115
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403168662
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403169323

From azafari at openjdk.org Thu Nov 23 10:52:35 2023
From: azafari at openjdk.org (Afshin Zafari)
Date: Thu, 23 Nov 2023 10:52:35 GMT
Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v10]
In-Reply-To:
References:
Message-ID:

> The `find` method now is
> ```C++
> template<typename T>
> int find(T* token, bool f(T*, E)) const {
> ...
> ```
>
> Any other functions which use this are also changed.
> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed.

Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:

  Some cleanups, documentaions and tests.
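
For readers unfamiliar with the shape of such a comparator-taking `find`, here is a small standalone sketch. It is not HotSpot code: `SimpleArray` is a hypothetical stand-in for `GrowableArray<E>`, and `Entry` and `by_name` are illustrative names. Only the `find` signature follows the one quoted in the description.

```cpp
#include <cstring>
#include <vector>

// Hypothetical stand-in for GrowableArray<E>; only the comparator-taking
// find is sketched here.
template <typename E>
class SimpleArray {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }

  // Returns the index of the first element for which f(token, element) is
  // true, or -1 if there is none. T is deduced from the token and from the
  // comparator's first parameter, so the two must agree.
  template <typename T>
  int find(T* token, bool f(T*, E)) const {
    for (int i = 0; i < static_cast<int>(_data.size()); i++) {
      if (f(token, _data[i])) {
        return i;
      }
    }
    return -1;
  }
};

struct Entry {
  const char* name;
};

// Comparator: matches an entry by its name.
static bool by_name(const char* name, Entry* e) {
  return std::strcmp(name, e->name) == 0;
}
```

With a `SimpleArray<Entry*>` holding entries named "alpha" and "beta", calling `find(key, by_name)` with `const char* key = "beta"` deduces `T` as `const char` and returns the index of the second entry. Making the token type a template parameter avoids passing the token through an untyped pointer.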
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/15418/files
  - new: https://git.openjdk.org/jdk/pull/15418/files/9973f5d7..74c4076b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=08-09

  Stats: 165 lines in 13 files changed: 99 ins; 33 del; 33 mod
  Patch: https://git.openjdk.org/jdk/pull/15418.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418

PR: https://git.openjdk.org/jdk/pull/15418

From jbachorik at openjdk.org Thu Nov 23 10:58:46 2023
From: jbachorik at openjdk.org (Jaroslav Bachorik)
Date: Thu, 23 Nov 2023 10:58:46 GMT
Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v4]
In-Reply-To:
References:
Message-ID:

> Please review this fix for a corner case in the handling of `jmethodID` values.
>
> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version.
> The exception is when the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version.
>
> If we happen to capture a stack trace via the `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack, we will have the corresponding frame identified by a `jmethodID` pointing to that version.
> However, sooner or later the 'previous' class version becomes eligible for cleanup, at which time all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory.
> This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded.
>
> This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions.
>
> Therefore we need to perform a `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.
>
> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_

Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision:

  Move assert

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16662/files
  - new: https://git.openjdk.org/jdk/pull/16662/files/967813b6..9f61d8e0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=02-03

  Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16662.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662

PR: https://git.openjdk.org/jdk/pull/16662

From azafari at openjdk.org Thu Nov 23 11:03:13 2023
From: azafari at openjdk.org (Afshin Zafari)
Date: Thu, 23 Nov 2023 11:03:13 GMT
Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v10]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Nov 2023 10:52:35 GMT,
Afshin Zafari wrote:

>> The `find` method now is
>> ```C++
>> template<typename T>
>> int find(T* token, bool f(T*, E)) const {
>> ...
>> ```
>>
>> Any other functions which use this are also changed.
>> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed.
>
> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision:
>
> Some cleanups, documentaions and tests.

> Thanks for making this change.
>
> I'd like to suggest the following cleanups, some documentation, and a few tests: [20d4502](https://github.com/openjdk/jdk/commit/20d4502471ba396ae395512cfa3dab3f87555421)
>
> I think it might be easier to review by looking at the final diff: [master...stefank:jdk:pr_15418](https://github.com/openjdk/jdk/compare/master...stefank:jdk:pr_15418)

One question: `private static bool by_name(const char* name, PerfData* pd);` is added to the `PerfDataList` class but is never used. Is something missing?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1824226437

From eosterlund at openjdk.org Thu Nov 23 11:21:15 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 23 Nov 2023 11:21:15 GMT
Subject: RFR: 8310644: Make panama memory segment close use async handshakes
Message-ID:

The current logic for closing memory in panama is susceptible to livelock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. Each side can impede the other. By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out.

The idea is that we perform a synchronous handshake first to find threads that are in scoped methods.
They might, however, be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate whether we still need to do that, or whether we have unwound out of the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed.

Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well.

-------------

Commit messages:
 - 8310644: Make panama memory segment close use async handshakes

Changes: https://git.openjdk.org/jdk/pull/16792/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8310644
  Stats: 131 lines in 7 files changed: 59 ins; 25 del; 47 mod
  Patch: https://git.openjdk.org/jdk/pull/16792.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16792/head:pull/16792

PR: https://git.openjdk.org/jdk/pull/16792

From stefank at openjdk.org Thu Nov 23 11:25:07 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 23 Nov 2023 11:25:07 GMT
Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object
In-Reply-To:
References:
Message-ID:

On Thu, 23 Nov 2023 02:07:59 GMT, David Holmes wrote:

>> In the rewrites made for:
>> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls`
>>
>> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object.
>> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 59: > >> 57: // GetOwnedMonitorInfo testing. >> 58: Object obj = new Object() { public String toString() {return "";} }; >> 59: jniMonitorEnter(obj); > > I would add a check for `Thread.holdsLock(obj);` after this just to be sure it worked. Done > test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 61: > >> 59: jniMonitorEnter(obj); >> 60: obj = null; >> 61: System.gc(); > > Again one gc() is generally not sufficient. 
>
> How can this test tell that the object in the monitor was actually cleared? I think `monitorinflation` logging may be the only way to tell.

Yes, probably. I've been looking at the `monitorinflation` logging to verify that it gets cleared. I think it would be messy to try to get this test to also start to parse logs.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403245976
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403244666

From stefank at openjdk.org Thu Nov 23 11:44:57 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 23 Nov 2023 11:44:57 GMT
Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v2]
In-Reply-To:
References:
Message-ID:

> In the rewrites made for:
> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls`
>
> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object.
>
> The provided tests provoke this assert from:
> * the JNI thread detach code
> * thread dumping with locked monitors, and
> * the JVMTI GetOwnedMonitorInfo API.
>
> While investigating this we've found that the thread detach code became more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak.
> > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: - Tweaked comment in test - Rewrite tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16783/files - new: https://git.openjdk.org/jdk/pull/16783/files/b1dd4cf8..4b0976a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=00-01 Stats: 605 lines in 9 files changed: 388 ins; 214 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From stefank at openjdk.org Thu Nov 23 11:45:00 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 11:45:00 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v2] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 02:10:41 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with two additional commits since the last revision: >> >> - Tweaked comment in test >> - 
Rewrite tests > > test/hotspot/jtreg/runtime/Monitor/libIterateMonitorWithDeadObjectTest.c line 43: > >> 41: static jobject create_object(JNIEnv* env) { >> 42: jclass clazz = (*env)->FindClass(env, "java/lang/Object"); >> 43: if (clazz == 0) die("No class"); > > The `die` method is for errors with system calls. It won't show useful information for JNI calls that leave exceptions pending. I've added better checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403263549 From stefank at openjdk.org Thu Nov 23 11:52:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 11:52:38 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v3] In-Reply-To: References: Message-ID: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert form: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. 
> > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Tweak test comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16783/files - new: https://git.openjdk.org/jdk/pull/16783/files/4b0976a8..3239b822 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From stefank at openjdk.org Thu Nov 23 11:52:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 11:52:38 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: <56AIMDRiXJZpaiHjEYPlnT1t7jlf4gKJk7UxmtWmpo8=.a9a08009-2048-4354-80c2-9cb0c8746369@github.com> On Thu, 23 Nov 2023 00:48:19 GMT, David Holmes wrote: > > Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never 
exposed these monitors to the JNI detach code > > I had not realized that. It explains some confusion in a separate issue I had been looking into! It is important that these monitors are exposed and unlocked at detach time, otherwise it also messes up the `held_monitor_count`. > I think I managed to reproduce that while reworking the tests and temporarily reintroducing the bug. > > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure > > I think we may need to make that code tolerate the absence of an object. IDK. That's another thing that would be good to discuss in a separate RFE. > > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. > > I think we probably should expose this to be accurate, but I think this needs investigation on the JVMTI side to ensure that the null entry is tolerated okay. So a separate RFE to handle this would be fine. This will lead to NPE in the management code, but even if we fixed that, it could potentially cause NPEs in user code. So, yes, I wouldn't mind if someone wanted to investigate this as a separate RFE. > > Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824286615 From stefank at openjdk.org Thu Nov 23 11:56:19 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 11:56:19 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v10] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:00:32 GMT, Afshin Zafari wrote: > > Thanks for making this change.
> > I'd like to suggest the following cleanups, some documentation, and a few tests: [20d4502](https://github.com/openjdk/jdk/commit/20d4502471ba396ae395512cfa3dab3f87555421) > > I think it might be easier to review by looking at the final diff: [master...stefank:jdk:pr_15418](https://github.com/openjdk/jdk/compare/master...stefank:jdk:pr_15418) > > One question: the `private static bool by_name(const char* name, PerfData* pd);` is added to `PerfDataList` class but is never used. Is something missing? That should have been removed when I added name_equals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1824302304 From dholmes at openjdk.org Thu Nov 23 12:09:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 12:09:27 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v7] In-Reply-To: References: Message-ID: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. 
I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. > > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks. David Holmes has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge - Fix typo - Merge with master and update Zero code accordingly - Merge branch 'master' into 8318776-supports_cx8 - Remove unnecessary includes of vm_version.hpp. Fix copyright years. - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) - Remove suports_cx8() checks from gtest - Remove test for VMSupportsCX8 - 8318776: Require supports_cx8 to always be true ------------- Changes: https://git.openjdk.org/jdk/pull/16625/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16625&range=06 Stats: 460 lines in 39 files changed: 16 ins; 429 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/16625.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16625/head:pull/16625 PR: https://git.openjdk.org/jdk/pull/16625 From aboldtch at openjdk.org Thu Nov 23 12:44:18 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 12:44:18 GMT Subject: RFR: 8319700: [AArch64] C2 compilation fails with "Field too big for insn" In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:44:12 GMT, Axel Boldt-Christmas wrote: > Not all ZGC C2 BarrierStubs used on aarch64 participate in the laying out of trampoline stubs. (Used to enable as many `tbX` instructions as possible.) This leads to incorrect calculations which may cause the target offset for the `tbX` branch to become too large. > > This fix changes all the BarrierStubs to stubs which participate in the trampoline logic.
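As a rough illustration of why the layout calculation matters: AArch64 `tbz`/`tbnz` encode a signed 14-bit offset counted in 4-byte instructions, so the target has to lie within about 32 KiB of the branch. The sketch below is not HotSpot code; the helper names and the `unaccounted_stub_bytes` parameter are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of HotSpot. tbz/tbnz carry a signed 14-bit
// immediate counted in 4-byte instructions, which gives a byte range of
// [-32768, 32764] relative to the branch.
constexpr int64_t kTbxMinOffset = -(int64_t{1} << 15);     // -32768
constexpr int64_t kTbxMaxOffset = (int64_t{1} << 15) - 4;  //  32764

bool tbx_target_reachable(int64_t branch_pc, int64_t target_pc) {
  const int64_t offset = target_pc - branch_pc;
  return (offset % 4 == 0) && offset >= kTbxMinOffset && offset <= kTbxMaxOffset;
}

// If some stubs are left out of the size estimate, the real distance is the
// estimate plus their size, so a branch judged reachable may not be.
bool estimate_is_safe(int64_t estimated_distance, int64_t unaccounted_stub_bytes) {
  return tbx_target_reachable(0, estimated_distance + unaccounted_stub_bytes);
}
```

With an estimate of 32000 bytes, for instance, the branch still fits, but 1024 bytes of stubs that were not counted would push the target out of range.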
> > Until more platforms require specialised barrier stub layouts it is not worth adding better support for this pattern. Without a redesign it does make it harder to ensure that this is used correctly. For now the shared code asserts when building for aarch64 that the general shared stubs are not used directly. But care would still have to be taken if any new barrier stubs are introduced. > > The behaviour was more easily reproducible with large inlining heuristics. This flag combination was used to get somewhat reliable reproducibility `-esa -ea -XX:MaxInlineLevel=300 -XX:MaxInlineSize=1100 -XX:MaxTrivialSize=1000 -XX:LiveNodeCountInliningCutoff=1000000 -XX:MaxNodeLimit=3000000 -XX:NodeLimitFudgeFactor=600000 -XX:+UnlockExperimentalVMOptions -XX:+UseVectorStubs` > > There was also an observation inside the JBS comments that there were no `tbX` instructions branching to the emitted trampolines. However I was unable to reproduce this. I ran all tests with the following guarantee; this could not observe it either.
> > > diff --git a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > index ebaf1829972..b6c40163a6b 100644 > --- a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > @@ -36,6 +36,7 @@ > #include "runtime/icache.hpp" > #include "runtime/jniHandles.hpp" > #include "runtime/sharedRuntime.hpp" > +#include "utilities/debug.hpp" > #include "utilities/macros.hpp" > #ifdef COMPILER1 > #include "c1/c1_LIRAssembler.hpp" > @@ -1358,6 +1359,7 @@ void ZLoadBarrierStubC2Aarch64::emit_code(MacroAssembler& masm) { > // Current assumption is that the barrier stubs are the first stubs emitted after the actual code > assert(stubs_start_offset() <= output->buffer_sizing_data()->_code, "stubs are assumed to be emitted directly after code and code_size is a hard limit on where it can start"); > > + guarantee(!_test_and_branch_reachable_entry.is_unused(), "Should be used"); > __ bind(_test_and_branch_reachable_entry); > > // Next branch's offset is unknown, but is > branch_offset > > > - T... It would be nice if the branch shortening for these stubs could be done as some fixpoint iteration. Or take part when laying out the blocks, as maybe even more branches could be shortened if all the BarrierStubs were not at the very end. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16780#issuecomment-1824365630 From aboldtch at openjdk.org Thu Nov 23 12:44:19 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 23 Nov 2023 12:44:19 GMT Subject: Integrated: 8319700: [AArch64] C2 compilation fails with "Field too big for insn" In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 10:44:12 GMT, Axel Boldt-Christmas wrote: > Not all ZGC C2 BarrierStubs used on aarch64 participate in the laying out of trampoline stubs. (Used to enable as many `tbX` instructions as possible.)
This leads to incorrect calculations which may cause the target offset for the `tbX` branch to become too large. > > This fix changes all the BarrierStubs to stubs which participate in the trampoline logic. > > Until more platforms require specialised barrier stub layouts it is not worth adding better support for this pattern. Without a redesign it does make it harder to ensure that this is used correctly. For now the shared code asserts when building for aarch64 that the general shared stubs are not used directly. But care would still have to be taken if any new barrier stubs are introduced. > > The behaviour was more easily reproducible with large inlining heuristics. This flag combination was used to get somewhat reliable reproducibility `-esa -ea -XX:MaxInlineLevel=300 -XX:MaxInlineSize=1100 -XX:MaxTrivialSize=1000 -XX:LiveNodeCountInliningCutoff=1000000 -XX:MaxNodeLimit=3000000 -XX:NodeLimitFudgeFactor=600000 -XX:+UnlockExperimentalVMOptions -XX:+UseVectorStubs` > > There was also an observation inside the JBS comments that there were no `tbX` instructions branching to the emitted trampolines. However I was unable to reproduce this. I ran all tests with the following guarantee; this could not observe it either.
> > > diff --git a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > index ebaf1829972..b6c40163a6b 100644 > --- a/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/gc/z/zBarrierSetAssembler_aarch64.cpp > @@ -36,6 +36,7 @@ > #include "runtime/icache.hpp" > #include "runtime/jniHandles.hpp" > #include "runtime/sharedRuntime.hpp" > +#include "utilities/debug.hpp" > #include "utilities/macros.hpp" > #ifdef COMPILER1 > #include "c1/c1_LIRAssembler.hpp" > @@ -1358,6 +1359,7 @@ void ZLoadBarrierStubC2Aarch64::emit_code(MacroAssembler& masm) { > // Current assumption is that the barrier stubs are the first stubs emitted after the actual code > assert(stubs_start_offset() <= output->buffer_sizing_data()->_code, "stubs are assumed to be emitted directly after code and code_size is a hard limit on where it can start"); > > + guarantee(!_test_and_branch_reachable_entry.is_unused(), "Should be used"); > __ bind(_test_and_branch_reachable_entry); > > // Next branch's offset is unknown, but is > branch_offset > > > - T... This pull request has now been integrated. 
Changeset: 3787ff8d Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/3787ff8d1d8dbcaaebb9616c5bc543e2fe21a90c Stats: 17 lines in 5 files changed: 15 ins; 1 del; 1 mod 8319700: [AArch64] C2 compilation fails with "Field too big for insn" Reviewed-by: aph, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/16780 From jvernee at openjdk.org Thu Nov 23 12:46:12 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 23 Nov 2023 12:46:12 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:14:29 GMT, Erik Österlund wrote: > The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other. > > By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. > > Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well.
src/hotspot/share/prims/scopedMemoryAccess.cpp line 151: > 149: ResourceMark rm; > 150: if (_session != nullptr && last_frame.is_compiled_frame() && last_frame.can_be_deoptimized()) { > 151: CloseScopedMemoryFindOopClosure cl(_session); Pre-existing, but this value (and class) is unused since we do an unconditional deopt. If you feel like it, you could remove the `CloseScopedMemoryFindOopClosure`. We can get it back from the git history later when that bug is fixed (https://bugs.openjdk.org/browse/JDK-8290892) src/java.base/share/classes/jdk/internal/foreign/SharedSession.java line 86: > 84: throw alreadyAcquired(prevState); > 85: } > 86: SCOPED_MEMORY_ACCESS.closeScope(this); ? src/java.base/share/classes/jdk/internal/misc/X-ScopedMemoryAccess.java.template line 87: > 85: > 86: public void closeScope(MemorySessionImpl session) { > 87: closeScope0(session, MemorySessionImpl.ALREADY_CLOSED); I suggest passing in the `ALREADY_CLOSED` instance as an argument to this method instead. Then we can avoid making the field in `MemorySessionImpl` public. 
test/jdk/java/foreign/TestHandshake.java line 107: > 105: if (!failed.get()) { > 106: // ignore - this means segment was alive, but was closed while we were accessing it > 107: // next isAlive test should fail If we see the exception, we should be able to test that the scope is not alive here as well Suggestion: // next isAlive test should fail assertFalse(segment.scope().isAlive()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16792#discussion_r1403266514 PR Review Comment: https://git.openjdk.org/jdk/pull/16792#discussion_r1403255404 PR Review Comment: https://git.openjdk.org/jdk/pull/16792#discussion_r1403257610 PR Review Comment: https://git.openjdk.org/jdk/pull/16792#discussion_r1403330411 From jbachorik at openjdk.org Thu Nov 23 13:02:39 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 13:02:39 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v5] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. 
The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.
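The conditional cleanup described above can be sketched roughly as follows. This is a simplified model rather than the patch itself: `Method`, `MethodSlot`, `MethodId` and `clear_id_if_pointing_to` are stand-in names assumed for the example, with the handle modelled as a pointer to a slot that holds the current `Method*`.

```cpp
#include <cassert>

// Simplified stand-ins; the real HotSpot types are far richer.
struct Method { int version; };
using MethodSlot = Method*;    // storage that a jmethodID-like handle refers to
using MethodId = MethodSlot*;  // hypothetical handle: pointer to a slot

// Clear the slot only when it still refers to the instance being freed.
// Returns true if the slot was cleared.
bool clear_id_if_pointing_to(MethodId id, const Method* dying) {
  if (id == nullptr || *id != dying) {
    return false;  // no handle, or it was already redirected to another version
  }
  *id = nullptr;   // later lookups see "no method" instead of freed memory
  return true;
}
```

A handle that was redirected to the newest version is left alone, while a handle that still refers to the old version is nulled before that version is freed.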
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Add exhaustive check for method holder availability ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/9f61d8e0..00f644a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From azafari at openjdk.org Thu Nov 23 13:35:28 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 23 Nov 2023 13:35:28 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v11] In-Reply-To: References: Message-ID: > The `find` method now is > ```C++ > template <typename T> > int find(T* token, bool f(T*, E)) const { > ... > ``` > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: by_name method is removed from PerfDataList.
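For readers who want to see the shape of such a comparator-taking template `find`, here is a minimal sketch on a stand-in container. `SimpleArray`, `Counter` and this `name_equals` are assumptions made up for the example; they are not the HotSpot `GrowableArray`, `PerfData` or the comparator added in the PR.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Minimal stand-in for a growable array; not the HotSpot GrowableArray.
template <typename E>
class SimpleArray {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }

  // Comparator-taking find: T is deduced from the token and the comparator,
  // so no casting through void* is needed at the call site.
  template <typename T>
  int find(T* token, bool f(T*, E)) const {
    for (int i = 0; i < static_cast<int>(_data.size()); i++) {
      if (f(token, _data[i])) return i;
    }
    return -1;  // not found
  }
};

struct Counter { const char* name; };

// Hypothetical comparator in the spirit of a by-name lookup.
bool name_equals(const char* name, Counter* c) {
  return std::strcmp(name, c->name) == 0;
}
```

A call such as `arr.find(key, name_equals)` with `key` of type `const char*` deduces `T` as `const char`, and a comparator whose first parameter does not match the token type is rejected at compile time.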
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15418/files - new: https://git.openjdk.org/jdk/pull/15418/files/74c4076b..8d5f54ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=09-10 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From azafari at openjdk.org Thu Nov 23 13:35:29 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 23 Nov 2023 13:35:29 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v10] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:53:28 GMT, Stefan Karlsson wrote: >>> Thanks for making this change. >>> >>> I'd like to suggest the following cleanups, some documentation, and a few tests: [20d4502](https://github.com/openjdk/jdk/commit/20d4502471ba396ae395512cfa3dab3f87555421) >>> >>> I think it might be easier to review by looking at the final diff: [master...stefank:jdk:pr_15418](https://github.com/openjdk/jdk/compare/master...stefank:jdk:pr_15418) >> >> One question: the `private static bool by_name(const char* name, PerfData* pd);` is added to `PerfDataList` class but is never used. Is something missing? > >> > Thanks for making this change. >> > I'd like to suggest the following cleanups, some documentation, and a few tests: [20d4502](https://github.com/openjdk/jdk/commit/20d4502471ba396ae395512cfa3dab3f87555421) >> > I think it might be easier to review by looking at the final diff: [master...stefank:jdk:pr_15418](https://github.com/openjdk/jdk/compare/master...stefank:jdk:pr_15418) >> >> One question: the `private static bool by_name(const char* name, PerfData* pd);` is added to `PerfDataList` class but is never used. Is something missing? 
> > That should have been removed when I added name_equals. @stefank, the changes are applied as suggested and ready for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1824446141 From jbachorik at openjdk.org Thu Nov 23 13:37:41 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 13:37:41 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished.
Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. > > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Add exhaustive check for method holder availability (1) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/00f644a0..eaf2720e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From ayang at openjdk.org Thu Nov 23 13:52:28 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 23 Nov 2023 13:52:28 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 23:08:36 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup and address comments src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2433: > 2431: } > 2432: WorkerThreads* worker_threads = workers(); > 2433: if (worker_threads != nullptr) { When will this 
be null? src/hotspot/share/runtime/cpuTimeCounters.cpp line 61: > 59: return true; > 60: case CPUTimeType::gc_service: > 61: return true; I think it would look cleaner if these are grouped into a single return. src/hotspot/share/runtime/cpuTimeCounters.cpp line 96: > 94: if (UsePerfData) { > 95: EXCEPTION_MARK; > 96: if (os::is_thread_cpu_time_supported()) { Why is this check inside the scope of `EXCEPTION_MARK`? (I'd expect sth like ` if (use-perf-data && is-cpu-time-supported)` at the top.) src/hotspot/share/runtime/cpuTimeCounters.cpp line 119: > 117: if (CPUTimeGroups::is_gc_counter(_name)) { > 118: instance->inc_gc_total_cpu_time(net_cpu_time); > 119: } I feel much of this is on the wrong abstraction level; `CPUTimeCounters::update_counter(_name, _total);` should be sufficient here. (The caller handles diff calculation and inc gc-counter if needed.) src/hotspot/share/runtime/cpuTimeCounters.cpp line 126: > 124: // pthread_getcpuclockid() and clock_gettime() must return 0. Thus caller > 125: // must ensure the thread exists and has not terminated. > 126: assert(os::is_thread_cpu_time_supported(), "os must support cpu time"); Could this assert be moved to the constructor? src/hotspot/share/runtime/cpuTimeCounters.hpp line 74: > 72: assert(_instance != nullptr, "no instance found"); > 73: return _instance; > 74: } Seems that this is needed solely for accessing the following instance methods. I wonder if it's possible to expose only static methods in the public API. src/hotspot/share/runtime/cpuTimeCounters.hpp line 83: > 81: // Prevent copy of singleton object. > 82: CPUTimeCounters(const CPUTimeCounters& copy) = delete; > 83: void operator=(const CPUTimeCounters& copy) = delete; I think `NONCOPYABLE` can be used here. 
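Two of the suggestions above (a single grouped return in the switch, and a static-only public API on a non-copyable holder) could look roughly like the sketch below. The enum members and the `CpuTimeTotals` class are assumptions for illustration and do not reproduce the PR's `CPUTimeType` or `CPUTimeCounters`; the copy operations are deleted by hand here, which is the kind of boilerplate a macro like `NONCOPYABLE` is meant to replace.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum; the actual members in the PR may differ.
enum class CpuTimeKind { gc_parallel_workers, gc_service, vm, conc_dedup };

// Grouping the labels that share a result leaves one return per outcome.
bool is_gc_counter(CpuTimeKind kind) {
  switch (kind) {
    case CpuTimeKind::gc_parallel_workers:
    case CpuTimeKind::gc_service:
      return true;
    default:
      return false;
  }
}

// Illustrative only: a counter holder whose public API is entirely static,
// so callers never touch the instance, and which cannot be copied.
class CpuTimeTotals {
  int64_t _gc_total = 0;

  CpuTimeTotals() = default;

  static CpuTimeTotals& instance() {
    static CpuTimeTotals the_instance;
    return the_instance;
  }

 public:
  CpuTimeTotals(const CpuTimeTotals&) = delete;
  CpuTimeTotals& operator=(const CpuTimeTotals&) = delete;

  // Adds delta to the GC total only for GC-related kinds.
  static void add(CpuTimeKind kind, int64_t delta) {
    if (is_gc_counter(kind)) instance()._gc_total += delta;
  }

  static int64_t gc_total() { return instance()._gc_total; }
};
```

Callers then write `CpuTimeTotals::add(kind, delta)` without fetching an instance first, which is the direction the comment on `get_instance()` points at.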
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403333939 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403220339 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403219626 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403411402 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403333335 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403413301 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1403225116 From ihse at openjdk.org Thu Nov 23 13:54:09 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 23 Nov 2023 13:54:09 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 08:40:57 GMT, Xiaohong Gong wrote: >> Thanks for the advice! I will take a consideration for it. > >> Thirdly, I do not like at all how you just come crashing in setting -march like that. The -march flag is handled by FLAGS_SETUP_ABI_PROFILE. > > `-march=armv8-a+sve` is just used in this new added module, which may not expect to be used for other libraries. Per my understanding, flags handled by `FLAGS_SETUP_ABI_PROFILE` is not used for a specified module? > >> Actually, now that I think of it, this is just completely wrong! You are checking on features on the build machine, to determine what target machine code to generate, with no way to override. > > Yes, that's be a risk, although the usage to the SVE functions are controlled by SVE feature as well in runtime. I need time to find a better solution. It does not matter if you set the -march on just part of the build. Actually, there is no point in doing so. Either the JDK is run on a machine with the matching architecture, or it isn't. 
I don't know the details of what the aarch64 SVE feature means, but unless this is a special instance, any attempt to execute the compiled code on a machine that does not support that architecture will fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403416310 From stefank at openjdk.org Thu Nov 23 13:55:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 13:55:13 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v10] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 13:32:23 GMT, Afshin Zafari wrote: >>> > Thanks for making this change. >>> > I'd like to suggest the following cleanups, some documentation, and a few tests: [20d4502](https://github.com/openjdk/jdk/commit/20d4502471ba396ae395512cfa3dab3f87555421) >>> > I think it might be easier to review by looking at the final diff: [master...stefank:jdk:pr_15418](https://github.com/openjdk/jdk/compare/master...stefank:jdk:pr_15418) >>> >>> One question: the `private static bool by_name(const char* name, PerfData* pd);` is added to `PerfDataList` class but is never used. Is something missing? >> >> That should have been removed when I added name_equals. > > @stefank, the changes are applied as suggested and ready for review. @afshin-zafari you took my changes and then you made a few small whitespace changes that messed up the patch. I merged our two branches to correct those issues.
Could you make sure to pull the changes from my branch (don't copy the changes): https://github.com/stefank/jdk/tree/pr_15418 ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1824474455 From ihse at openjdk.org Thu Nov 23 14:00:11 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 23 Nov 2023 14:00:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> Message-ID: On Thu, 23 Nov 2023 08:57:23 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. 
If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments in build system make/autoconf/spec.gmk.in line 894: > 892: ENABLE_LIBSLEEF:=@ENABLE_LIBSLEEF@ > 893: LIBVMATH_CFLAGS:=@LIBVMATH_CFLAGS@ > 894: LIBVMATH_LIBS:=@LIBVMATH_LIBS@ It's getting better, but you still need to handle the naming here. It should be SLEEF_LIBS and SLEEF_CFLAGS, named after the library you import -- not the library you build. What if some other lib wants to use libsleef? And you need to untangle the SVE flags (whatever we end doing with that) from LIBVMATH_CFLAGS. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403423892 From stefank at openjdk.org Thu Nov 23 14:09:07 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 14:09:07 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:52:38 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. 
>> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test comments GHA complains that the lib gets loaded into multiple class loaders. I need to figure out how to share code between the tests so that I don't have to duplicate the code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824494844 From ihse at openjdk.org Thu Nov 23 14:14:17 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 23 Nov 2023 14:14:17 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> Message-ID: On Thu, 23 Nov 2023 08:57:23 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. 
To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. 
>> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments in build system make/autoconf/lib-vmath.m4 line 58: > 56: test -e ${with_libsleef}/include/sleef.h; then > 57: LIBSLEEF_FOUND=yes > 58: LIBVMATH_LIBS="-L${with_libsleef}/lib" Suggestion: LIBVMATH_LIBS="-L${with_libsleef}/lib -lsleef" make/autoconf/lib-vmath.m4 line 70: > 68: if test "x$SYSROOT" = "x" && > 69: test "x${LIBSLEEF_FOUND}" = "xno"; then > 70: PKG_CHECK_MODULES([LIBSLEEF], [sleef], [LIBSLEEF_FOUND=yes], [LIBSLEEF_FOUND=no]) Suggestion: PKG_CHECK_MODULES([SLEEF], [sleef], [LIBSLEEF_FOUND=yes], [LIBSLEEF_FOUND=no]) Otherwise `PKG_CHECK_MODULES` will set the variables LIBSLEEF_CFLAGS and LIBSLEEF_LIBS. make/autoconf/lib-vmath.m4 line 74: > 72: if test "x${LIBSLEEF_FOUND}" = "xno"; then > 73: AC_CHECK_HEADERS([sleef.h], > 74: [LIBSLEEF_FOUND=yes], Suggestion: [ LIBSLEEF_FOUND=yes SLEEF_LIBS=-lsleef ], make/autoconf/lib-vmath.m4 line 89: > 87: if test "x${LIBSLEEF_FOUND}" = "xyes"; then > 88: ENABLE_LIBSLEEF=true > 89: LIBVMATH_LIBS="${LIBVMATH_LIBS} -lsleef" Remove this line. It would just add `-lsleef` twice if you go via `PKG_CHECK_MODULES`. You need to set -lsleef at the correct places. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403436252 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403433087 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403438730 PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403437474 From ihse at openjdk.org Thu Nov 23 14:14:18 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 23 Nov 2023 14:14:18 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: <_CHm262chkVi3EMvai4A5T-dal0pdCySL8aF0kXj_uU=.9d49baad-9de9-45e0-915b-9525feb8d610@github.com> References: <_CHm262chkVi3EMvai4A5T-dal0pdCySL8aF0kXj_uU=.9d49baad-9de9-45e0-915b-9525feb8d610@github.com> Message-ID: <-XS17AVgOkuO6_JUId8P-XZxRlnfWXF0wz60w5B58L8=.e51cb13b-ec84-4943-a6b7-b09b4e8943d4@github.com> On Thu, 23 Nov 2023 01:41:46 GMT, Xiaohong Gong wrote: >> As I said above, you should not mix the two together. Keep the library handling for libsleef. Move the march setting to where it belongs. And rename the files, functions and variables after this. > > OK, I see. It makes sense that the suffix name should be chosen mainly based on the real module name that is searched/checked in configure. This still needs fixing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403428592 From stuefe at openjdk.org Thu Nov 23 14:19:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 14:19:08 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2] In-Reply-To: <4m-e190SIpRd0-fkjZ4ja7mhVaQQsTwfqmWd8_82Els=.7b198822-3637-4109-b1e9-2451f882ce4b@github.com> References: <4m-e190SIpRd0-fkjZ4ja7mhVaQQsTwfqmWd8_82Els=.7b198822-3637-4109-b1e9-2451f882ce4b@github.com> Message-ID: <8-swJyF28KZeFLyqa2awML3umu2vog2Du9zVlJZfRDI=.5c435f8e-78af-47fa-a020-6bfce16b6c65@github.com> On Thu, 23 Nov 2023 08:44:08 GMT, Jaroslav Bachorik wrote: >> Jaroslav Bachorik has updated the pull request incrementally with three additional commits since the last revision: >> >> - Clean up imports >> - Simplify Method::clear_jmethod_id() >> - Use correct copyrights > > @dholmes-ora >> Can't we just check method->method_holder() for null rather than passing in a parameter like this? > > I have removed the argument and I am now performing the check for `method_holder() != nullptr` as recommended. The code is a bit simpler and the cost of resolving the method holder for each method is probably quite low so we should be ok here. @jbachorik You are aware that this fix only works for some uncommon corner cases, right? It only works if the Method is explicitly deallocated. The vast bulk of Method aren't. Method, as a Metaspace object, is released in bulk when the class is unloaded. The `::deallocate` path you fixed up - that eventually ends up in `MetaspaceArena::deallocate()` - is a rare case that only gets invoked if - a class cannot be loaded but parts of it have already been loaded into Metaspace. - a class gets transformed In case the class gets unloaded via conventional means, your fix won't get invoked (nor should it; releasing in bulk without having to care for individual allocations is the point of Metaspace). 
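To make the distinction above concrete, here is a toy C++ sketch of an arena with two release paths: a rare explicit per-object `deallocate` that runs a cleanup hook, and a bulk `release_all` that drops everything without per-object work. This is only an illustration under stated assumptions; `ToyMethod`, `ToyArena` and the hook are hypothetical names and do not correspond to the real Metaspace or `MetaspaceArena` code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a metadata object (hypothetical, not HotSpot's Method).
struct ToyMethod {
  int id;
};

// Toy arena (hypothetical) with an explicit and a bulk release path.
class ToyArena {
 public:
  // Hook that runs only on the explicit, per-object deallocation path.
  using Hook = std::function<void(const ToyMethod&)>;

  explicit ToyArena(Hook on_deallocate)
      : _on_deallocate(std::move(on_deallocate)) {}

  ToyMethod* allocate(int id) {
    _blocks.push_back(std::make_unique<ToyMethod>(ToyMethod{id}));
    return _blocks.back().get();
  }

  // Rare path: a single object is handed back, so the hook is invoked.
  void deallocate(ToyMethod* m) {
    for (auto it = _blocks.begin(); it != _blocks.end(); ++it) {
      if (it->get() == m) {
        _on_deallocate(*m);
        _blocks.erase(it);
        return;
      }
    }
  }

  // Common path: everything is dropped at once; the hook is never invoked.
  void release_all() { _blocks.clear(); }

  std::size_t live() const { return _blocks.size(); }

 private:
  Hook _on_deallocate;
  std::vector<std::unique_ptr<ToyMethod>> _blocks;
};
```

In this model any cleanup placed in the hook is skipped whenever the arena is released in bulk, which is the coverage gap the comment above points out for the unloading case.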
------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1824509686 From ihse at openjdk.org Thu Nov 23 14:20:11 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 23 Nov 2023 14:20:11 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:52:38 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. 
>> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test comments Build changes are trivially fine. ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1746557040 From rehn at openjdk.org Thu Nov 23 14:24:10 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 23 Nov 2023 14:24:10 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v9] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 09:35:30 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. 
>> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Fix test for riscv Thanks, looks good, tests passes and for cherry-picked cases I see expected ckp decoding. ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16743#pullrequestreview-1746564691 From stuefe at openjdk.org Thu Nov 23 14:25:13 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 14:25:13 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6] In-Reply-To: References: Message-ID: <4Sx6iriS9oGkl2xQRhOOzWMraDoP6DyvHtaCqkDn3IQ=.2a7e0fa3-81ad-4859-a57e-531179dddf3d@github.com> On Thu, 23 Nov 2023 13:37:41 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. 
>> >> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: > > Add exhaustive check for method holder availability (1) src/hotspot/share/oops/instanceKlass.cpp line 541: > 539: assert (!method->on_stack(), "shouldn't be called with methods on stack"); > 540: // Do the pointer maintenance before releasing the metadata > 541: method->clear_jmethod_id(); IIUC this is O(n^2) over number of methods. Seeing that this is a workaround for a special case (an app that does a lot of retransforms *and* uses async profiler), I'd opt for making it conditional. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403449172 From stuefe at openjdk.org Thu Nov 23 14:29:11 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 14:29:11 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v9] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:21:37 GMT, Robbin Ehn wrote: > Thanks, looks good, tests passes and for cherry-picked cases I see expected ckp decoding. Many thanks, @rhen! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1824526195 From mbaesken at openjdk.org Thu Nov 23 14:46:28 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 Nov 2023 14:46:28 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: > VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. > This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . > The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: use new method also in print_vm_info ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16730/files - new: https://git.openjdk.org/jdk/pull/16730/files/1df7a280..2f0291d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16730&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16730&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16730.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16730/head:pull/16730 PR: https://git.openjdk.org/jdk/pull/16730 From mbaesken at openjdk.org Thu Nov 23 14:46:29 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 23 Nov 2023 14:46:29 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v2] In-Reply-To: References: Message-ID: <-sKecG7keDL30SGq-GT_ZdEy3LJq8ZQQx5fWrfNOOM4=.d94c1d81-4439-4b54-bd30-ce1ef2bb967e@github.com> On Thu, 23 Nov 2023 09:51:31 GMT, Thomas Stuefe wrote: > You can now replace the code in VMError::get_vm_info with the generic function and remove the AIX specific include. Okay makes sense , I adjusted the coding . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1824550657 From stuefe at openjdk.org Thu Nov 23 15:16:13 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 15:16:13 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . >> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use new method also in print_vm_info Good! src/hotspot/share/utilities/vmError.cpp line 727: > 725: if (should_report_bug(_id)) { > 726: os::prepare_native_symbols(); > 727: } Ugh, misuse of "should report bug" as "is oom error" :-/ But no problem, I see you just repeat the existing pattern, that is fine. We can clean that in a separate patch. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16730#pullrequestreview-1746656316 PR Review Comment: https://git.openjdk.org/jdk/pull/16730#discussion_r1403508964 From ogillespie at openjdk.org Thu Nov 23 15:18:21 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 23 Nov 2023 15:18:21 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Fri, 10 Nov 2023 12:23:27 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. 
>> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation, rename test helper Thanks for the suggestions, I'm hoping to get back to this next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16398#issuecomment-1824598329 From jbachorik at openjdk.org Thu Nov 23 15:22:09 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 15:22:09 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6] In-Reply-To: References: Message-ID: <0ul1T9qxWhSKv9NTx2Ejvm0nUoAHTYTHZzkEQnkR5NI=.e08e133f-7ea6-4817-b0ea-8319658b0668@github.com> On Mon, 20 Nov 2023 22:08:49 GMT, Coleen Phillimore wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Add exhaustive check for method holder availability (1) > > src/hotspot/share/classfile/classFileParser.cpp line 5579: > >> 5577: >> 5578: if (_methods != nullptr) { >> 5579: // Free methods - those methods are not fully wired and miss the method holder > > How about saying: for methods whose InstanceKlass as method holder is not yet created? 
And is now back, but I applied your suggestion, @coleenp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403513054 From jbachorik at openjdk.org Thu Nov 23 15:22:12 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 15:22:12 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6] In-Reply-To: <4Sx6iriS9oGkl2xQRhOOzWMraDoP6DyvHtaCqkDn3IQ=.2a7e0fa3-81ad-4859-a57e-531179dddf3d@github.com> References: <4Sx6iriS9oGkl2xQRhOOzWMraDoP6DyvHtaCqkDn3IQ=.2a7e0fa3-81ad-4859-a57e-531179dddf3d@github.com> Message-ID: On Thu, 23 Nov 2023 14:20:48 GMT, Thomas Stuefe wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Add exhaustive check for method holder availability (1) > > src/hotspot/share/oops/instanceKlass.cpp line 541: > >> 539: assert (!method->on_stack(), "shouldn't be called with methods on stack"); >> 540: // Do the pointer maintenance before releasing the metadata >> 541: method->clear_jmethod_id(); > > IIUC this is O(n^2) over number of methods. Seeing that this is a workaround for a special case (an app that does a lot of retransforms *and* uses async profiler), I'd opt for making it conditional. Sadly, this is not async-profiler specific. The same issue can be observed by JVMTI only code grabbing a stacktrace. What do you mean exactly by 'conditional'? Introducing a new JVM flag or something else? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403515842 From jvernee at openjdk.org Thu Nov 23 15:31:28 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 23 Nov 2023 15:31:28 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v10] In-Reply-To: References: Message-ID: > The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close()` the resource. 
One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. > > There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. > > The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each > exception handler of a method in the `MethodData` for that method (which holds all the profiling > data). Then when looking up the exception handler after an exception is thrown, we mark the > exception handler as entered. When C2 parses the exception handler block, and it sees that it has > never been entered, we emit an uncommon trap instead. > > I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. 
Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. > > Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count... Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: - add interpreter profiling specific test cases - rename ex_handler -> exception_handler ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16416/files - new: https://git.openjdk.org/jdk/pull/16416/files/46c94342..dfd5da1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16416&range=08-09 Stats: 130 lines in 14 files changed: 57 ins; 0 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/16416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16416/head:pull/16416 PR: https://git.openjdk.org/jdk/pull/16416 From jvernee at openjdk.org Thu Nov 23 15:31:29 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 23 Nov 2023 15:31:29 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v9] In-Reply-To: References: <0A-M6LwxHmiYUfunlz_qgeFiPJoWcmzElMOD6RtxWmc=.da64f93c-9db9-4d19-aaa2-c204857f3595@github.com> Message-ID: On Thu, 23 Nov 2023 03:02:27 GMT, Vladimir Ivanov wrote: > On naming: `ex_handler` is used only once - `GraphKit::has_ex_handler()`. Everywhere else in the code base `exception_handler` is used. Please, align the naming. Feel free to adjust `GraphKit::has_ex_handler()`. Done. > The tests are very nice! Can you, please, point me to the test case which covers profiling in interpreter? 
I've added 2 more test cases that target interpreter profiling specifically. See: https://github.com/openjdk/jdk/pull/16416/commits/dfd5da1d0bee585e2cbec1b229b451bf4421a4ad ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1824615924 From aph at openjdk.org Thu Nov 23 15:43:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 23 Nov 2023 15:43:16 GMT Subject: RFR: 8319700: [AArch64] C2 compilation fails with "Field too big for insn" In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 12:41:04 GMT, Axel Boldt-Christmas wrote: > It would be nice if the branch shortening for these stubs could be done as some fixpoint iteration. Or take part when laying out the blocks, as maybe even more branches could be shortened if all the BarrierStubs were not at the very end. Probably not worth making the effort for AArch64, because even the short branches are fairly long, so it takes quite a lot of work to find cases where they're exceeded. The few cases where it might help are rare indeed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16780#issuecomment-1824633297 From aph at openjdk.org Thu Nov 23 15:46:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 23 Nov 2023 15:46:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> Message-ID: On Thu, 23 Nov 2023 08:57:23 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. 
To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments in build system make/autoconf/lib-vmath.m4 line 94: > 92: # Check the ARM SVE feature > 93: SVE_CFLAGS="-march=armv8-a+sve" > 94: What's this about? We're building a standard binary, and all SVE detection is to be deferred to runtime. 
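A minimal sketch of what deferring SVE detection to runtime could look like, for illustration only. The stub names (`sin_neon_stub`, `sin_sve_stub`) and the selector are hypothetical and are not taken from the patch; the probe assumes Linux on AArch64, where `getauxval(AT_HWCAP)` reports `HWCAP_SVE`, and simply returns false elsewhere.

```cpp
#include <cassert>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP
#include <asm/hwcap.h>  // HWCAP_SVE
#endif

// Hypothetical entry points standing in for the NEON and SVE math stubs.
// Both are always compiled in; neither depends on how the JDK was configured.
const char* sin_neon_stub() { return "neon"; }
const char* sin_sve_stub()  { return "sve"; }

using MathStub = const char* (*)();

// Probe the CPU the process is running on, not the build machine.
// Assumption: Linux/AArch64 hwcaps; every other platform reports "no SVE".
bool cpu_has_sve() {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_SVE)
  return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
  return false;
#endif
}

// Pick the entry point once, based on the runtime probe result.
MathStub select_sin_stub(bool has_sve) {
  return has_sve ? sin_sve_stub : sin_neon_stub;
}
```

A caller would evaluate `select_sin_stub(cpu_has_sve())` once at startup and keep the returned pointer, so the same binary uses SVE where it exists and NEON everywhere else.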
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403541835 From jbachorik at openjdk.org Thu Nov 23 15:49:09 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 15:49:09 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2] In-Reply-To: <8-swJyF28KZeFLyqa2awML3umu2vog2Du9zVlJZfRDI=.5c435f8e-78af-47fa-a020-6bfce16b6c65@github.com> References: <4m-e190SIpRd0-fkjZ4ja7mhVaQQsTwfqmWd8_82Els=.7b198822-3637-4109-b1e9-2451f882ce4b@github.com> <8-swJyF28KZeFLyqa2awML3umu2vog2Du9zVlJZfRDI=.5c435f8e-78af-47fa-a020-6bfce16b6c65@github.com> Message-ID: On Thu, 23 Nov 2023 14:15:54 GMT, Thomas Stuefe wrote: >> @dholmes-ora >>> Can't we just check method->method_holder() for null rather than passing in a parameter like this? >> >> I have removed the argument and I am now performing the check for `method_holder() != nullptr` as recommended. The code is a bit simpler and the cost of resolving the method holder for each method is probably quite low so we should be ok here. > > @jbachorik You are aware that this fix only works for some uncommon corner cases, right? > > It only works if the Method is explicitly deallocated. The vast bulk of Method aren't. Method, as a Metaspace object, is released in bulk when the class is unloaded. The `::deallocate` path you fixed up - that eventually ends up in `MetaspaceArena::deallocate()` - is a rare case that only gets invoked if > - a class cannot be loaded but parts of it have already been loaded into Metaspace. > - a class gets transformed > > In case the class gets unloaded via conventional means, your fix won't get invoked (nor should it; releasing in bulk without having to care for individual allocations is the point of Metaspace). @tstuefe > In case the class gets unloaded via conventional means, your fix won't get invoked (nor should it; releasing in bulk without having to care for individual allocations is the point of Metaspace). Yes. 
This is intended. The deallocation path via Metaspace is fine. It is just the code that is purging previous versions of a redefined class that has this bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1824641628 From jbachorik at openjdk.org Thu Nov 23 15:49:12 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 15:49:12 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6] In-Reply-To: References: <4Sx6iriS9oGkl2xQRhOOzWMraDoP6DyvHtaCqkDn3IQ=.2a7e0fa3-81ad-4859-a57e-531179dddf3d@github.com> Message-ID: On Thu, 23 Nov 2023 15:19:43 GMT, Jaroslav Bachorik wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 541: >> >>> 539: assert (!method->on_stack(), "shouldn't be called with methods on stack"); >>> 540: // Do the pointer maintenance before releasing the metadata >>> 541: method->clear_jmethod_id(); >> >> IIUC this is O(n^2) over number of methods. Seeing that this is a workaround for a special case (an app that does a lot of retransforms *and* uses async profiler), I'd opt for making it conditional. > > Sadly, this is not async-profiler specific. The same issue can be observed by JVMTI only code grabbing a stacktrace. > What do you mean exactly by 'conditional'? Introducing a new JVM flag or something else? Ok, I see now - I could do the jmethodID maintenance only from `purge_previous_version_list()` call, leaving the proper metaspace deallocation untouched, therefore not adding unnecessary overhead there. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403544462 From aph at openjdk.org Thu Nov 23 15:50:13 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 23 Nov 2023 15:50:13 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> Message-ID: <7iScaIdG-XOySGAiFk7hOKkzdf8fDUosIP2NIfWy04g=.c5194b4f-0018-431f-8559-89ef72f104f2@github.com> On Thu, 23 Nov 2023 08:57:23 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. 
If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments in build system src/jdk.incubator.vector/linux/native/libvmath/vect_math.c line 63: > 61: DEFINE_VECTOR_MATH_UNARY(expm1d2_u10, float64x2_t, advsimd) > 62: > 63: #ifdef __ARM_FEATURE_SVE No, we're building a standard binary that will use SVE if it's present on the machine it's running on. Such compile-time feature tests are inappropriate. You need to provide SVE entry points for the target machine, no matter how HotSpot is configured and built. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1403545717 From mdoerr at openjdk.org Thu Nov 23 15:58:07 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 23 Nov 2023 15:58:07 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . 
>> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use new method also in print_vm_info LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16730#pullrequestreview-1746735681 From stuefe at openjdk.org Thu Nov 23 16:05:10 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 23 Nov 2023 16:05:10 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2] In-Reply-To: <8-swJyF28KZeFLyqa2awML3umu2vog2Du9zVlJZfRDI=.5c435f8e-78af-47fa-a020-6bfce16b6c65@github.com> References: <4m-e190SIpRd0-fkjZ4ja7mhVaQQsTwfqmWd8_82Els=.7b198822-3637-4109-b1e9-2451f882ce4b@github.com> <8-swJyF28KZeFLyqa2awML3umu2vog2Du9zVlJZfRDI=.5c435f8e-78af-47fa-a020-6bfce16b6c65@github.com> Message-ID: On Thu, 23 Nov 2023 14:15:54 GMT, Thomas Stuefe wrote: >> @dholmes-ora >>> Can't we just check method->method_holder() for null rather than passing in a parameter like this? >> >> I have removed the argument and I am now performing the check for `method_holder() != nullptr` as recommended. The code is a bit simpler and the cost of resolving the method holder for each method is probably quite low so we should be ok here. > > @jbachorik You are aware that this fix only works for some uncommon corner cases, right? > > It only works if the Method is explicitly deallocated. The vast bulk of Method aren't. Method, as a Metaspace object, is released in bulk when the class is unloaded. The `::deallocate` path you fixed up - that eventually ends up in `MetaspaceArena::deallocate()` - is a rare case that only gets invoked if > - a class cannot be loaded but parts of it have already been loaded into Metaspace. 
> - a class gets transformed > > In case the class gets unloaded via conventional means, your fix won't get invoked (nor should it; releasing in bulk without having to care for individual allocations is the point of Metaspace). > @tstuefe > > > In case the class gets unloaded via conventional means, your fix won't get invoked (nor should it; releasing in bulk without having to care for individual allocations is the point of Metaspace). > > Yes. This is intended. The deallocation path via Metaspace is fine. It is just the code that is purging previous versions of a redefined class that has this bug. I see it now. On class unloading, we reset the jmethodID "master table" linked list nodes, set all slots to null, but don't delete them to keep outside users from crashing. And we release the IK itself and before we do that free the methodID cache in IK. So we are covered. I thought this was about class unloading, sorry for not reading the description carefully. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1824662450 From duke at openjdk.org Thu Nov 23 16:09:19 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 23 Nov 2023 16:09:19 GMT Subject: Integrated: 8318159: RISC-V: Improve itable_stub In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 15:01:51 GMT, Yuri Gaevsky wrote: > Please review the change for RISC-V similar to #13792(AARCH64) and #13460(X86). > > From #13792: > The change replaces two separate iterations over the itable with new algorithm > consisting of two loops. First, we look for a match with resolved_klass, > checking for a match with holder_klass along the way. Then we continue iterating > (not starting over) the itable using the second loop, checking only for a match > with holder_klass. > > ### Correctness checks > > Testing: tier1 tests successfully passed on HiFive Unmatched board. 
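As a rough illustration of the two-loop lookup described in the quoted text, here is a plain C++ model. It is not the generated RISC-V stub: the `ItableEntry` layout, the integer interface ids, and the `-1` failure value are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified itable entry: an interface identifier plus the offset of that
// interface's method table. Offsets are assumed to be non-negative.
struct ItableEntry {
  int interface_id;
  int method_table_offset;
};

// Returns the holder interface's offset, or -1 if the receiver does not
// implement the resolved interface or the holder interface is missing.
int find_holder_offset(const std::vector<ItableEntry>& itable,
                       int resolved_id, int holder_id) {
  int holder_offset = -1;
  bool resolved_found = false;
  std::size_t i = 0;

  // Loop 1: search for the resolved interface, remembering the holder
  // interface if it is passed along the way.
  for (; i < itable.size(); ++i) {
    if (itable[i].interface_id == holder_id) {
      holder_offset = itable[i].method_table_offset;
    }
    if (itable[i].interface_id == resolved_id) {
      resolved_found = true;
      break;
    }
  }
  if (!resolved_found) return -1;
  if (holder_offset >= 0) return holder_offset;

  // Loop 2: continue from the current position (no restart), now checking
  // only for the holder interface.
  for (++i; i < itable.size(); ++i) {
    if (itable[i].interface_id == holder_id) {
      return itable[i].method_table_offset;
    }
  }
  return -1;
}
```

The table is walked at most once in total, which is the saving over two independent scans.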
>
> #### Performance results on RISC-V StarFive JH7110 board:
>
> InterfaceCalls:                before fix       after fix
> -------------------------------------------------------------------
> Benchmark         Mode  Cnt   Score   Error |  Score   Error  Units
> -------------------------------------------------------------------
> test1stInt2Types  avgt  100  14.380 ± 0.017 | 14.370 ± 0.014  ns/op
> test1stInt3Types  avgt  100  72.724 ± 0.552 | 66.290 ± 0.080  ns/op
> test1stInt5Types  avgt  100  73.948 ± 0.524 | 68.781 ± 0.377  ns/op
> test2ndInt2Types  avgt  100  15.705 ± 0.016 | 15.707 ± 0.018  ns/op
> test2ndInt3Types  avgt  100  82.370 ± 0.453 | 75.363 ± 0.156  ns/op
> test2ndInt5Types  avgt  100  85.266 ± 0.466 | 80.969 ± 0.752  ns/op
> testIfaceCall     avgt  100  75.684 ± 0.648 | 72.603 ± 0.460  ns/op
> testIfaceExtCall  avgt  100  86.293 ± 0.567 | 77.939 ± 0.340  ns/op
> testMonomorphic   avgt  100  11.357 ± 0.007 | 11.359 ± 0.009  ns/op
> -------------------------------------------------------------------
>
> #### Performance results on RISC-V HiFive Unmatched board:
>
> InterfaceCalls:                before fix         after fix
> ---------------------------------------------------------------------
> Benchmark         Mode  Cnt    Score   Error |   Score   Error  Units
> ---------------------------------------------------------------------
> test1stInt2Types  avgt  100   24.432 ± 1.811 |  23.205 ± 1.512  ns/op
> test1stInt3Types  avgt  100  135.800 ± 3.991 | 127.112 ± 2.299  ns/op
> test1stInt5Types  avgt  100  141.746 ± 4.272 | 136.069 ± 4.919  ns/op
> test2ndInt2Types  avgt  100   31.474 ± 2.468 |  26.978 ± 1.951  ns/op
> test2ndInt3Types  avgt  100  146.410 ± 3.575 | 139.443 ± 3.677  ns/op
> test2ndInt5Types  avgt  100  156.083 ± 3.617 | 150.583 ± 2.909  ns/op
> testIfaceCall     avgt  100  136.392 ± 2.546 | 129.632 ± 1.662  ns/op
> testIfaceExtCall  avgt  100  155.602 ± 3.836 | 138.058 ± 2.147  ns/op
> testMonomorphic   avgt  100   24.018 ± 1.888 |  21.522 ± 1.662  ns/op
> ---------...

This pull request has now been integrated.
Changeset: 6d79e0aa Author: Yuri Gaevsky Committer: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/6d79e0aa3c32f687d5120811de955d5ae19e0fb2 Stats: 131 lines in 3 files changed: 112 ins; 15 del; 4 mod 8318159: RISC-V: Improve itable_stub Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/16657 From azafari at openjdk.org Thu Nov 23 16:11:32 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 23 Nov 2023 16:11:32 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v12] In-Reply-To: References: Message-ID: > The `find` method now is > ```C++ > template<typename T> > int find(T* token, bool f(T*, E)) const { > ... > > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'upstream/pr/15418' into pr_15418 - Suggested cleanups and tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15418/files - new: https://git.openjdk.org/jdk/pull/15418/files/8d5f54ab..cb88988e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15418&range=10-11 Stats: 14 lines in 7 files changed: 5 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/15418.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15418/head:pull/15418 PR: https://git.openjdk.org/jdk/pull/15418 From stefank at openjdk.org Thu Nov 23 16:18:43 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 16:18:43 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v4] In-Reply-To: References: Message-ID: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved
ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert from: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code became more correct once this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitors with dead objects, so I'm filtering those in the closure that collects ObjectMonitors. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far.
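The per-caller filtering described above can be modelled in a few lines. This is a simplified sketch and not the HotSpot closure: `MonitorModel`, `collect_owned`, and the `skip_dead_objects` flag are hypothetical names, and a null `object` pointer stands in for a monitor whose object has died.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an ObjectMonitor: who owns it, and the object it
// guards. A null object models the "dead object" case discussed above.
struct MonitorModel {
  int owner_thread_id;
  const void* object;
};

// Collect the monitors owned by thread_id from the in-use list.
// Callers that cannot tolerate dead objects (thread dump, owned-monitor
// queries) pass skip_dead_objects = true; the detach path passes false so
// that every owned monitor is still seen and can be unlocked.
std::vector<const MonitorModel*> collect_owned(
    const std::vector<MonitorModel>& in_use_list,
    int thread_id,
    bool skip_dead_objects) {
  std::vector<const MonitorModel*> result;
  for (const MonitorModel& m : in_use_list) {
    if (m.owner_thread_id != thread_id) continue;
    if (skip_dead_objects && m.object == nullptr) continue;
    result.push_back(&m);
  }
  return result;
}
```

Keeping the filter in the collector rather than in the iterator is what lets the detach path see every owned monitor while the reporting paths stay free of null entries.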
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Rewrite tests to prevent problem with native libs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16783/files - new: https://git.openjdk.org/jdk/pull/16783/files/3239b822..9521b26d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=02-03 Stats: 299 lines in 7 files changed: 99 ins; 194 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From stefank at openjdk.org Thu Nov 23 16:18:44 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 16:18:44 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:17:18 GMT, Magnus Ihse Bursie wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Tweak test comments > > Build changes are trivially fine. Thanks @magicus. I'm removing the build label now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824675404 From stefank at openjdk.org Thu Nov 23 16:18:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 16:18:45 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:52:38 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. 
The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweak test comments I reworked the MonitorWithDeadObject tests so that they all run in one class loader. 
Jtreg loads the tests in different class loaders, which causes problems because it's only allowed to load a library from *one* class loader. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1824679121 From stefank at openjdk.org Thu Nov 23 16:20:14 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 23 Nov 2023 16:20:14 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v12] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 16:11:32 GMT, Afshin Zafari wrote: >> The `find` method now is >> ```C++ >> template<typename T> >> int find(T* token, bool f(T*, E)) const { >> ... >> >> Any other functions which use this are also changed. >> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. > > Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'upstream/pr/15418' into pr_15418 > - Suggested cleanups and tests Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15418#pullrequestreview-1746766388 From eosterlund at openjdk.org Thu Nov 23 16:23:20 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 23 Nov 2023 16:23:20 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v2] In-Reply-To: References: Message-ID: <-sZTbRUmxI9N4jDw8UsOwqwXk429wUxBvgVr_NVtd7g=.af4df37c-4dfd-4942-8d5b-150b8c91455f@github.com> > The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other.
> > By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. > > Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well. Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/java/foreign/TestHandshake.java Co-authored-by: Jorn Vernee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16792/files - new: https://git.openjdk.org/jdk/pull/16792/files/aac442ae..d12fa908 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16792/head:pull/16792 PR: https://git.openjdk.org/jdk/pull/16792 From jbachorik at openjdk.org Thu Nov 23 16:25:29 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 16:25:29 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v7] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values.
> > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture a stack trace via the `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack, we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup, at which time all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Do expensive cleanup only in `purge_previous_version_list()` ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/eaf2720e..4c6a57e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=05-06 Stats: 25 lines in 3 files changed: 17 ins; 6 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From jbachorik at openjdk.org Thu Nov 23 16:25:30 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 16:25:30 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v6] In-Reply-To: References: <4Sx6iriS9oGkl2xQRhOOzWMraDoP6DyvHtaCqkDn3IQ=.2a7e0fa3-81ad-4859-a57e-531179dddf3d@github.com> Message-ID: On Thu, 23 Nov 2023 15:46:10 GMT, Jaroslav Bachorik wrote: >> Sadly, this is not async-profiler specific. The same issue can be observed by JVMTI only code grabbing a stacktrace. >> What do you mean exactly by 'conditional'? Introducing a new JVM flag or something else? > > Ok, I see now - I could do the jmethodID maintenance only from `purge_previous_version_list()` call, leaving the proper metaspace deallocation untouched, therefore not adding unnecessary overhead there. I have modified the code to do jmethodID cleanup only when in `purge_previous_version_list()` - this should help with the added overhead. 
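The "clear the ID only if it still points at the method being freed" rule from the description can be sketched as follows. This is a toy model under stated assumptions, not the patch: `MethodModel`, `MethodSlot`, `resolve_id`, and `clear_id_if_points_to` are hypothetical names, and an ID is modelled as a pointer to a slot that holds the current `Method` pointer.

```cpp
#include <cassert>

// Hypothetical stand-in for a Method; `version` only tells instances apart.
struct MethodModel {
  int version;
};

// In this model an ID is a pointer to a slot, and the slot holds the Method
// it currently refers to (or nullptr once it has been cleared).
using MethodSlot = MethodModel*;

// Resolve an ID; a cleared slot yields nullptr instead of a dangling pointer.
MethodModel* resolve_id(const MethodSlot* id) {
  return id == nullptr ? nullptr : *id;
}

// Called just before `dying` is deallocated while purging an old class
// version. The slot is cleared only if it still refers to `dying`; a slot
// that was redirected to a newer version is left untouched.
bool clear_id_if_points_to(MethodSlot* id, const MethodModel* dying) {
  if (id != nullptr && *id == dying) {
    *id = nullptr;
    return true;
  }
  return false;
}
```

The identity check is what makes per-method cleanup safe when, as described above, one cache holds IDs for both the old and the current method versions.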
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403578138 From jbachorik at openjdk.org Thu Nov 23 16:28:06 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 16:28:06 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v2] In-Reply-To: References: <4m-e190SIpRd0-fkjZ4ja7mhVaQQsTwfqmWd8_82Els=.7b198822-3637-4109-b1e9-2451f882ce4b@github.com> <8-swJyF28KZeFLyqa2awML3umu2vog2Du9zVlJZfRDI=.5c435f8e-78af-47fa-a020-6bfce16b6c65@github.com> Message-ID: On Thu, 23 Nov 2023 16:02:18 GMT, Thomas Stuefe wrote: > sorry for not reading the description carefully. No worries. This is a rather convoluted issue. I don't mind being challenged - I am quite new to this part of the code and I don't want to accidentally break something ... ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1824691458 From eosterlund at openjdk.org Thu Nov 23 16:37:19 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 23 Nov 2023 16:37:19 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v3] In-Reply-To: References: Message-ID: > The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other. > > By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like there, where an exception can't be delivered. 
We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed.
>
> Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well.

Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:

  Comments from Jorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16792/files
  - new: https://git.openjdk.org/jdk/pull/16792/files/d12fa908..5d34ddba

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=01-02

  Stats: 47 lines in 4 files changed: 0 ins; 39 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/16792.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16792/head:pull/16792

PR: https://git.openjdk.org/jdk/pull/16792

From jvernee at openjdk.org Thu Nov 23 16:37:19 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Thu, 23 Nov 2023 16:37:19 GMT
Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Nov 2023 16:34:23 GMT, Erik Österlund wrote:

>> The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other.
>>
>> By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out.
The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed.
>>
>> Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well.
>
> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision:
>
>   Comments from Jorn

LGTM

-------------

Marked as reviewed by jvernee (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16792#pullrequestreview-1746784992

From eosterlund at openjdk.org Thu Nov 23 16:37:21 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Thu, 23 Nov 2023 16:37:21 GMT
Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v2]
In-Reply-To: <-sZTbRUmxI9N4jDw8UsOwqwXk429wUxBvgVr_NVtd7g=.af4df37c-4dfd-4942-8d5b-150b8c91455f@github.com>
References: <-sZTbRUmxI9N4jDw8UsOwqwXk429wUxBvgVr_NVtd7g=.af4df37c-4dfd-4942-8d5b-150b8c91455f@github.com>
Message-ID:

On Thu, 23 Nov 2023 16:23:20 GMT, Erik Österlund wrote:

>> The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other.
>> >> By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like there, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. >> >> Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well. > > Erik ?sterlund has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/java/foreign/TestHandshake.java > > Co-authored-by: Jorn Vernee Thanks for the review @JornVernee! I applied the changes you wanted I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16792#issuecomment-1824698160 From jbachorik at openjdk.org Thu Nov 23 16:42:35 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Thu, 23 Nov 2023 16:42:35 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. 
> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. 
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Reinstate mistakenly deleted comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/4c6a57e8..46eff8d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=06-07 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From duke at openjdk.org Thu Nov 23 17:09:09 2023 From: duke at openjdk.org (suchismith1993) Date: Thu, 23 Nov 2023 17:09:09 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Wed, 22 Nov 2023 16:24:24 GMT, suchismith1993 wrote: >> J2SE agent does not start and throws error when it tries to find the shared library ibm_16_am. >> After searching for ibm_16_am.so ,the jvm agent throws and error as dll_load fails.It fails to identify the shared library ibm_16_am.a shared archive file on AIX. >> Hence we are providing a function which will additionally search for .a file on AIX ,when the search for .so file fails. > > suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: > > change macro position > I'm not a big fan of this approach. 
We accumulate more and more "#ifdef AIX" in shared code because of many recent AIX additions. No other platform has such a large ifdef footprint in shared code.
>
> I argue that all of this should be handled inside os_aix.cpp and not leak out into the external space:
>
> If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases?
>
> _If_ this is needed in shared code, the string replacement function should be a generic utility function for all platforms, and it should be tested with a small gtest. A gtest would have likely uncovered the buffer overflow too.

I tried to work out how to move the code into the os_aix file. I see a few problems:

1. If I implement the logic in the dll_load function, I would have to repeat a lot of the code that follows dlopen, i.e. I would have to call dlopen again after the attempt for the .so file and copy the surrounding logic for that second call.
2. Calling the function before dll_load, in the shared code, currently makes this a bit easier.

I have an alternative suggestion. Shall we declare the utility function as part of os and implement it per platform? That way the actual implementation lives in the AIX code, and on Windows and Linux it stays empty. Shared code can then simply call the utility function; it would not affect other platforms and would handle the use case for AIX.

If that is not acceptable, is there a better way to avoid repeating the dlopen call in the os_aix file?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1824737158

From mli at openjdk.org Thu Nov 23 17:18:28 2023
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 23 Nov 2023 17:18:28 GMT
Subject: RFR: 8318227: RISC-V: C2 ConvHF2F
Message-ID:

Hi,
Can you review the patch to add ConvHF2F intrinsic to JDK for riscv?
Thanks!
(By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)`
https://lore.kernel.org/lkml/20231114141256.126749-11-cleger@rivosinc.com/)

## Test

### Functionality

#### hotspot tests
test/hotspot/jtreg/compiler/intrinsics/
test/hotspot/jtreg/compiler/c2/irTests

#### jdk tests
test/jdk/java/lang/Float/Binary16Conversion*.java

### Performance
tested on licheepi.

#### with UseZfh enabled (i.e. enable the intrinsic)

Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10   4659.796 ± 13.262  ns/op
Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     22.957 ±  0.098  ns/op

#### with UseZfh disabled (i.e. disable the intrinsic)

Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10  22930.591 ± 72.595  ns/op
Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     25.970 ±  0.063  ns/op

-------------

Commit messages:
 - update RISCV_HWPROBE_EXT_ZFH value
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/16802/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16802&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8318227
  Stats: 67 lines in 13 files changed: 67 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/16802.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16802/head:pull/16802

PR: https://git.openjdk.org/jdk/pull/16802

From stuefe at openjdk.org Thu Nov 23 18:06:05 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 23 Nov 2023 18:06:05 GMT
Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2]
In-Reply-To:
References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com>
Message-ID:

On Thu, 23 Nov 2023 17:05:29 GMT, suchismith1993 wrote:

> > I'm not a big fan of this approach. We accumulate more and more "#ifdef AIX" in shared code because of many recent AIX additions.
No other platform has such a large ifdef footprint in shared code. > > I argue that all of this should be handled inside os_aix.cpp and not leak out into the external space: > > If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases? > > _If_ this is needed in shared code, the string replacement function should be a generic utility function for all platforms, and it should be tested with a small gtest. A gtest would have likely uncovered the buffer overflow too. > > So i tried to check how to move the code to os_aix file. A few problems is see : > > 1. When i have to implemented the logic at dll_load function, i would have to repeat a lot of code after dlopen, i.e i have to call dlopen again for .so files and hence i have to copy the logic again for it. > 2. Currently using function before dll_load,in the shared code makes this a bit easier. > I have an alternate suggestion . > Shall we declare the utlity function as part of os ? and implement it platform wise. Not without any need. If this is an AIX specific issue, it should be handed in os::dll_load on AIX. > In that way we just keep the actual implentation and aix and in windows and linux we keep it empty. > So that way we can just call the utility function in shared code and it wouldnt affect other platform and will run the usecase for AIX. > If that is not acceptable, then is there a better way to avoid repeating the dlopen again in os_aix file ? I don't understand the problem. What is preventing you from using a file local scope utility function inside os::dll_load? 
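
As an illustration of the kind of file-local helper being suggested, here is a minimal sketch of a bounds-checked suffix replacement. It is not taken from the patch under review: the name `replace_suffix` and its signature are hypothetical, and it only shows how a ".so" name could be turned into a ".a" name without writing past the caller's buffer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical file-local helper (not from the patch under review).
// Copies `path` into `out`, replacing a trailing `old_suffix` with
// `new_suffix`. Returns false, leaving `out` untouched, if `path` does not
// end in `old_suffix` or if the result (including the terminating NUL)
// would not fit in `out_len` bytes.
static bool replace_suffix(const char* path, const char* old_suffix,
                           const char* new_suffix, char* out,
                           std::size_t out_len) {
  assert(path != nullptr && old_suffix != nullptr);
  assert(new_suffix != nullptr && out != nullptr);
  const std::size_t path_len = std::strlen(path);
  const std::size_t old_len = std::strlen(old_suffix);
  const std::size_t new_len = std::strlen(new_suffix);
  if (path_len < old_len ||
      std::strcmp(path + path_len - old_len, old_suffix) != 0) {
    return false;  // path does not end in old_suffix
  }
  const std::size_t stem_len = path_len - old_len;
  if (stem_len + new_len + 1 > out_len) {
    return false;  // would overflow the caller's buffer
  }
  std::memcpy(out, path, stem_len);
  std::memcpy(out + stem_len, new_suffix, new_len + 1);  // copies the NUL too
  return true;
}
```

A helper of this shape could be called from the AIX implementation of `os::dll_load` only after the first load attempt has failed, which would keep the retry out of shared code. Whether that matches the final form of the patch is not something this thread settles.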
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1824787258 From duke at openjdk.org Thu Nov 23 18:30:05 2023 From: duke at openjdk.org (suchismith1993) Date: Thu, 23 Nov 2023 18:30:05 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Thu, 23 Nov 2023 18:03:33 GMT, Thomas Stuefe wrote: > > > I'm not a big fan of this approach. We accumulate more and more "#ifdef AIX" in shared code because of many recent AIX additions. No other platform has such a large ifdef footprint in shared code. > > > I argue that all of this should be handled inside os_aix.cpp and not leak out into the external space: > > > If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases? > > > _If_ this is needed in shared code, the string replacement function should be a generic utility function for all platforms, and it should be tested with a small gtest. A gtest would have likely uncovered the buffer overflow too. > > > > > > So i tried to check how to move the code to os_aix file. A few problems is see : > > > > 1. When i have to implemented the logic at dll_load function, i would have to repeat a lot of code after dlopen, i.e i have to call dlopen again for .so files and hence i have to copy the logic again for it. > > 2. Currently using function before dll_load,in the shared code makes this a bit easier. > > I have an alternate suggestion . > > Shall we declare the utlity function as part of os ? and implement it platform wise. > > Not without any need. If this is an AIX specific issue, it should be handed in os::dll_load on AIX. 
>
> > In that way we just keep the actual implementation in aix, and in windows and linux we keep it empty.
> > So that way we can just call the utility function in shared code and it wouldn't affect other platforms and will run the use case for AIX.
> > If that is not acceptable, then is there a better way to avoid repeating the dlopen again in os_aix file ?
>
> I don't understand the problem. What is preventing you from using a file local scope utility function inside os::dll_load?

I would have to repeat lines 1132 and 1139 of os_aix.cpp if the condition fails for .so files, because I have to attempt the load again and check whether the .a exists. In the shared code I had to repeat fewer lines, I believe.
Do you suggest moving lines 1132 to 1139 into another function, then?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1824804477

From duke at openjdk.org Thu Nov 23 19:12:21 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Thu, 23 Nov 2023 19:12:21 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v5]
In-Reply-To:
References:
Message-ID: <93M-a4Fckf8STLcvAP1cV4msQHqoQ4vUgWo02_YiJxo=.1c79764d-603e-497c-bab1-04ac2d30fa72@github.com>

> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with gains of up to 30%.
>
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  test break fix

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16716/files
  - new: https://git.openjdk.org/jdk/pull/16716/files/707bea50..1739bda8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=03-04

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/16716.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716

PR: https://git.openjdk.org/jdk/pull/16716

From jbhateja at openjdk.org Thu Nov 23 19:12:23 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 23 Nov 2023 19:12:23 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 22 Nov 2023 17:27:16 GMT, Volodymyr Paprotski wrote:

>> test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java line 112:
>>
>>> 110: if (fout[i] != 1.0) throw new RuntimeException("Expected positive numbers in second half of array: " + java.util.Arrays.toString(fout));
>>> 111: }
>>> 112: }
>>
>> It's OK to add a correctness check here, but the test is only intended to perform IR validation checks; there are detailed functional tests in the following files:
>> test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java
>> test/hotspot/jtreg/compiler/c2/cr6340864/TestFloatVect.java
>> test/hotspot/jtreg/compiler/c2/cr6340864/TestDoubleVect.java
>
> I am OK to remove this change to the test. I didn't know where the other tests were, and by the time I did find those, I had already added this. (Figured "more test === good", but it's just a duplicate.)

I am fine with it. Thanks!
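
As background on the "boolean operations" mentioned in the description of this PR, a variable blend that selects on the sign bit of a mask element can be written with AND, AND-NOT and OR. The scalar model below is a sketch for a single 32-bit lane, not the instruction sequence the patch emits; `blend_lane` is a hypothetical name.

```cpp
#include <cstdint>

// Scalar model of one 32-bit lane of a sign-bit-controlled blend:
// returns b when the top bit of mask is set, otherwise a.
inline std::uint32_t blend_lane(std::uint32_t a, std::uint32_t b,
                                std::uint32_t mask) {
  // Expand the sign bit to a full-lane mask: 0x00000000 or 0xFFFFFFFF.
  const std::uint32_t m = 0u - (mask >> 31);
  // Keep a where m is clear, take b where m is set.
  return (a & ~m) | (b & m);
}
```

When the mask is already known to be all ones or all zeros per lane, as the result of a vector compare is, the expansion step is unnecessary and only the three boolean operations remain.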
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1403637285 From azafari at openjdk.org Thu Nov 23 22:20:21 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 23 Nov 2023 22:20:21 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Sun, 29 Oct 2023 08:07:55 GMT, Kim Barrett wrote: >> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). > >> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). > > Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take > the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. Dear @kimbarrett , @dholmes-ora , @stefank, @rose00 , @jdksjolen, @sspitsyn, @merykitty, Thank you all. The final code is now cleaner and nicer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1824962932 From azafari at openjdk.org Thu Nov 23 22:20:21 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 23 Nov 2023 22:20:21 GMT Subject: Integrated: 8314502: Change the comparator taking version of GrowableArray::find to be a template method In-Reply-To: References: Message-ID: On Thu, 24 Aug 2023 14:09:46 GMT, Afshin Zafari wrote: > The `find` method now is > ```C++ > template > int find(T* token, bool f(T*, E)) const { > ... 
> > Any other functions which use this are also changed. > Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed. This pull request has now been integrated. Changeset: 14557e72 Author: Afshin Zafari URL: https://git.openjdk.org/jdk/commit/14557e72ef55c6161a3fa0c1960f7be618a34bf1 Stats: 155 lines in 13 files changed: 98 ins; 32 del; 25 mod 8314502: Change the comparator taking version of GrowableArray::find to be a template method Reviewed-by: jsjolen, sspitsyn, stefank ------------- PR: https://git.openjdk.org/jdk/pull/15418 From dholmes at openjdk.org Thu Nov 23 22:26:22 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 22:26:22 GMT Subject: Integrated: 8318776: Require supports_cx8 to always be true In-Reply-To: References: Message-ID: On Mon, 13 Nov 2023 04:38:35 GMT, David Holmes wrote: > As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: > - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined > - Assertions for `supports_cx8()` are removed > - Compiler predicates requiring `supports_cx8()` are removed > - Access backend is greatly simplified without the need for lock-based alternative > - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative > > I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. 
> > Testing: > - All Oracle tiers 1-5 builds (which includes an ARMv7 build) > - GHA builds/tests > - Oracle tiers 1-3 sanity testing > > Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. > > Thanks. This pull request has now been integrated. Changeset: c75c3887 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/c75c38871ee7b5c9f7f0c195d649c16967f786bb Stats: 460 lines in 39 files changed: 16 ins; 429 del; 15 mod 8318776: Require supports_cx8 to always be true Reviewed-by: eosterlund, shade, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/16625 From dholmes at openjdk.org Thu Nov 23 22:26:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 23 Nov 2023 22:26:20 GMT Subject: RFR: 8318776: Require supports_cx8 to always be true [v7] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 12:09:27 GMT, David Holmes wrote: >> As discussed in JBS all platforms (some tweaks to Zero are in progress) actually do support `cx8` i.e. 64-bit compare-and-exchange, so we can strip out the locked-based alternatives to using it and just add a guarantee that it is true at runtime. And all platforms except some ARM variants set `SUPPORTS_NATIVE_CX8`, so we can greatly simplify things. Summary of changes: >> - `_supports_cx8` field is only needed when `SUPPORTS_NATIVE_CX8` is not defined >> - Assertions for `supports_cx8()` are removed >> - Compiler predicates requiring `supports_cx8()` are removed >> - Access backend is greatly simplified without the need for lock-based alternative >> - `java.util.concurrent.AtomicLongFieldUpdater` is simplified without the need for a lock-based alternative >> >> I did consider moving all the ARM `kuser_helper` related code to be only defined when `SUPPORTS_NATIVE_CX8` is not defined, but there was a theoretical risk this could change the behaviour if ARMv7 binaries were run on other ARM CPU's. 
I added a note to that effect in vm_version_linux_arm32.cpp so the ARM port maintainers could clean this up further if desired. >> >> Testing: >> - All Oracle tiers 1-5 builds (which includes an ARMv7 build) >> - GHA builds/tests >> - Oracle tiers 1-3 sanity testing >> >> Zero changes coming in via [JDK-8319777](https://bugs.openjdk.org/browse/JDK-8319777) will be merged when they arrive. >> >> Thanks. > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge > - Fix typo > - Merge with master and update Zero code accordingly > - Merge branch 'master' into 8318776-supports_cx8 > - Remove unnecessary includes of vm_version.hpp. > Fix copyright years. > - Remove cx8 comment as no longer relevant (the spinlock is used regardless of cx8) > - Remove suports_cx8() checks from gtest > - Remove test for VMSupportsCX8 > - 8318776: Require supports_cx8 to always be true Thanks for all the reviews. Integrating now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16625#issuecomment-1824966904 From dholmes at openjdk.org Fri Nov 24 05:52:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Nov 2023 05:52:16 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 16:42:35 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. 
>> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. 
>> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: > > Reinstate mistakenly deleted comment This has gotten a lot more complicated. All I was suggesting was if this: if (with_method_holders) { method->clear_jmethod_id(); } could be changed to if (method->method_holder() == nullptr) { method->clear_jmethod_id(); } Now I'm not at all sure what you are doing. src/hotspot/share/oops/instanceKlass.hpp line 1084: > 1082: inline void release_set_methods_jmethod_ids(jmethodID* jmeths); > 1083: // Used to explicitly clear jmethodIDs for the contained methods > 1084: // This is required for JDK-8313816 but should not be used otherwise! This comment is not helpful - what does it actually mean? Referring to a bugid doesn't help. ------------- PR Review: https://git.openjdk.org/jdk/pull/16662#pullrequestreview-1747340307 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403962272 From dholmes at openjdk.org Fri Nov 24 06:27:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Nov 2023 06:27:28 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 22:15:35 GMT, Afshin Zafari wrote: >>> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). >> >> Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take >> the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. 
> > Dear @kimbarrett , @dholmes-ora , @stefank, @rose00 , @jdksjolen, @sspitsyn, @merykitty, > Thank you all. > The final code is now cleaner and nicer. @afshin-zafari I'm happy to hear the code is now cleaner and nicer, but AFAICS this new version of the code has only had a single review. It should really have been re-reviewed by at least one of the earlier reviewers. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1825200669 From dholmes at openjdk.org Fri Nov 24 06:46:08 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Nov 2023 06:46:08 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v4] In-Reply-To: References: Message-ID: <-YLzEt2tPsupH0CLU6278f4yX3si2I60OvHfDitr-tM=.e5d3dad2-b846-4da5-812f-ea2600cd2780@github.com> On Thu, 23 Nov 2023 08:40:55 GMT, Stefan Karlsson wrote: >> src/hotspot/share/runtime/vmOperations.cpp line 354: >> >>> 352: // alive. Filter out monitors with dead objects. >>> 353: return; >>> 354: } >> >> I don't think we need to do this, but even without this filtering I ran a number of tests and was unable to demonstrate any problem. The JNI locked monitor seems to be "invisible" to the frame that locked it and so the thread dump never encounters it. Were you able to provoke a failure here or is this defensive programming? > > I provoked test failures for all paths I filtered. If I remove this check and run: > > make -C ../build/fastdebug test TEST=test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java JTREG="JAVA_OPTIONS=-XX:+UseZGC" > > > I hit this assert: > > # Internal Error (/home/stefank/git/jdk/open/src/hotspot/share/services/management.cpp:1274), pid=1546709, tid=1546754 > # assert(object != nullptr) failed: must be a Java object > ... 
> V [libjvm.so+0x1330ce8] jmm_DumpThreads+0x1a48 (management.cpp:1274) > j sun.management.ThreadImpl.dumpThreads0([JZZI)[Ljava/lang/management/ThreadInfo;+0 java.management at 22-internal > j sun.management.ThreadImpl.dumpAllThreads(ZZI)[Ljava/lang/management/ThreadInfo;+28 java.management at 22-internal > j sun.management.ThreadImpl.dumpAllThreads(ZZ)[Ljava/lang/management/ThreadInfo;+5 java.management at 22-internal > j IterateMonitorWithDeadObjectTest.dumpThreadsWithLockedMonitors()V+7 > j IterateMonitorWithDeadObjectTest.main([Ljava/lang/String;)V+11 > > > If I remove that assert I hit an NPE in the Java layer: > > java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "lock" is null > at java.management/java.lang.management.ThreadInfo.(ThreadInfo.java:172) > at java.management/sun.management.ThreadImpl.dumpThreads0(Native Method) > at java.management/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:518) > at java.management/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:506) > at IterateMonitorWithDeadObjectTest.dumpThreadsWithLockedMonitors(IterateMonitorWithDeadObjectTest.java:44) > at IterateMonitorWithDeadObjectTest.main(IterateMonitorWithDeadObjectTest.java:66) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1570) Thanks for that. Looks like JMM thread dump is different to VM Thread dump. Okay we definitely need RFEs to look into how to handle this. 
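For readers following the filtering discussed above, a small stand-alone model may help. This is only a sketch under stated assumptions: `MonitorModel`, its fields, and `collect_owned_monitors` are hypothetical names, not HotSpot types or functions. A null `object` stands in for a monitor whose object has died, which the thread says can happen after a JNI MonitorEnter once all references to the locked object are dropped.

```cpp
#include <vector>

// Hypothetical stand-in for an ObjectMonitor (not the HotSpot class):
// 'object' is null once the locked object is dead, 'owner' identifies
// the owning thread.
struct MonitorModel {
  const void* object;
  int owner;
};

// Collect the monitors owned by 'thread', skipping those with a dead
// object so that a consumer of the result never sees a null lock object.
std::vector<MonitorModel> collect_owned_monitors(
    const std::vector<MonitorModel>& in_use, int thread) {
  std::vector<MonitorModel> result;
  for (const MonitorModel& m : in_use) {
    if (m.owner != thread) continue;    // owned by some other thread
    if (m.object == nullptr) continue;  // filter: monitor with dead object
    result.push_back(m);
  }
  return result;
}
```

In this model, dropping the null check is what would hand a null lock object to the caller, which corresponds to the assert and the NPE shown in the quoted output.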
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1403992144 From stuefe at openjdk.org Fri Nov 24 06:53:13 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Nov 2023 06:53:13 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 16:42:35 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture a stack trace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup, at which point all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished.
Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: > > Reinstate mistakenly deleted comment src/hotspot/share/oops/instanceKlass.hpp line 1087: > 1085: // - We can not use the jmethodID cache associated with klass directly because the 'previous' versions > 1086: // do not have the jmethodID cache filled in. Instead, we need to lookup jmethodID for each method and this > 1087: // is expensive - O(n) for one jmethodID lookup. For all contained methods it is O(n^2). The comment is helpful, but it's an implementation comment, and I would move this part to the implementation. Here, what matters to know is that this function nulls out jmethodIDs for all methods in this IK. The comment also refers to class transformation specifically; I'd mention that so that readers get some context. Do I understand correctly that this comment tries to explain why, instead of just iterating IK->_methods_jmethod_ids directly, we iterate all methods of this IK and then look up the associated jmethodID in IK->_methods_jmethod_ids? I wondered about that but IIUC the pointers inside IK->_methods_jmethod_ids may refer to jmethodID slots that have been reused for different methods?
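The "clear the pointer only if it still refers to the method being deallocated" rule from the quoted description can be sketched in isolation. This is a minimal model, not the patch: `MethodModel`, `JMethodIdModel`, and `clear_jmethod_id_for` are hypothetical names, a `jmethodID` is modeled as a pointer to a slot holding a method pointer, and the cache is a plain vector. It does not settle the slot-reuse question raised above.

```cpp
#include <vector>

// Hypothetical stand-in for Method (not the HotSpot class).
struct MethodModel { int idnum; };

// A jmethodID modeled as a pointer to a slot that holds a method pointer.
using JMethodIdModel = MethodModel**;

// Look up the slot that refers to 'dying' and null it out. Slots that
// refer to any other method version are left untouched. Returns true if
// a slot was cleared. The scan is linear, matching the O(n) per-lookup
// cost mentioned in the quoted source comment.
bool clear_jmethod_id_for(std::vector<JMethodIdModel>& cache,
                          const MethodModel* dying) {
  for (JMethodIdModel id : cache) {
    if (id != nullptr && *id == dying) {
      *id = nullptr;  // the ID stays valid but no longer dangles
      return true;
    }
  }
  return false;
}
```

Calling this once per method of a class version gives the O(n^2) total that the source comment warns about.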
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1403995913 From jbhateja at openjdk.org Fri Nov 24 06:57:12 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 24 Nov 2023 06:57:12 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v5] In-Reply-To: <93M-a4Fckf8STLcvAP1cV4msQHqoQ4vUgWo02_YiJxo=.1c79764d-603e-497c-bab1-04ac2d30fa72@github.com> References: <93M-a4Fckf8STLcvAP1cV4msQHqoQ4vUgWo02_YiJxo=.1c79764d-603e-497c-bab1-04ac2d30fa72@github.com> Message-ID: On Thu, 23 Nov 2023 19:12:21 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ± 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ± 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ± 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ± 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ± 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ± 0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ± 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ± 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ± 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ± 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ± 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ± 0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ± 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ± 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ± 0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ± 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ±
0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ± 0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ± 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ± 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ± 0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ± 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ± 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ± 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ± 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ± 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ± 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ± 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ± 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > test break fix src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3601: > 3599: if (compute_mask) { > 3600: vpxor(scratch, scratch, scratch, vector_len); > 3601: vpcmpgtq(scratch, scratch, mask, vector_len); I see assertion failures in following tests with JAVA_OPTIONS= -XX:UseAVX=1 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -Xbatch compiler/c2/cr6340864/TestDoubleVect.java compiler/loopopts/superword/ReductionPerf.java compiler/vectorization/TestSignumVector.java compiler/vectorization/runner/BasicDoubleOpTest.java AVX1 does not support integral vectors above 16 bytes, please use floating point compare instruction.
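As background for "splitting vblendvp[sd] into boolean operations", here is a scalar sketch of one 32-bit lane. It is an illustration only, not the macro assembler code: both function names are hypothetical, and the model assumes the usual variable-blend rule that the sign bit of the mask lane selects the second source. The compare in the quoted snippet plays the role of the "expand the sign bit" step here.

```cpp
#include <cstdint>

// Reference semantics of one 32-bit lane of a variable blend: pick 'b'
// when the sign bit of 'mask' is set, otherwise pick 'a'.
uint32_t blend_lane_reference(uint32_t a, uint32_t b, uint32_t mask) {
  return (mask & 0x80000000u) ? b : a;
}

// Emulation with boolean operations: first expand the sign bit into an
// all-ones or all-zeros lane, then combine the sources with AND, ANDN
// and OR.
uint32_t blend_lane_emulated(uint32_t a, uint32_t b, uint32_t mask) {
  uint32_t full = (mask & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return (full & b) | (~full & a);
}
```

The review comment above concerns only how the expanded mask is produced for vectors wider than 16 bytes when running with `-XX:UseAVX=1`; the AND/ANDN/OR combination itself is unaffected.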
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1403998019 From mbaesken at openjdk.org Fri Nov 24 07:59:18 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 24 Nov 2023 07:59:18 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 15:13:14 GMT, Thomas Stuefe wrote: >> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: >> >> use new method also in print_vm_info > > src/hotspot/share/utilities/vmError.cpp line 727: > >> 725: if (should_report_bug(_id)) { >> 726: os::prepare_native_symbols(); >> 727: } > > Ugh, misuse of "should report bug" as "is oom error" :-/ But no problem, I see you just repeat the existing pattern, that is fine. We can clean that in a separate patch. Hi Thomas, any suggestion for a better name? Maybe there was a reason to keep it generic ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16730#discussion_r1404040760 From mbaesken at openjdk.org Fri Nov 24 07:59:19 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 24 Nov 2023 07:59:19 GMT Subject: Integrated: JDK-8320383: refresh libraries cache on AIX in VMError::report In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 09:14:29 GMT, Matthias Baesken wrote: > VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. > This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . > The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. This pull request has now been integrated. 
Changeset: 26c33904 Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/26c3390421f4888eb59017cadb2bf21a15e25b5e Stats: 25 lines in 6 files changed: 20 ins; 4 del; 1 mod 8320383: refresh libraries cache on AIX in VMError::report Reviewed-by: stuefe, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/16730 From jvernee at openjdk.org Fri Nov 24 08:01:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Fri, 24 Nov 2023 08:01:34 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect Message-ID: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. 
Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` ------------- Depends on: https://git.openjdk.org/jdk/pull/16416 Commit messages: - fix has_monitors tracking. Re-enable assert Changes: https://git.openjdk.org/jdk/pull/16799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16799&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320310 Stats: 48 lines in 5 files changed: 9 ins; 17 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/16799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16799/head:pull/16799 PR: https://git.openjdk.org/jdk/pull/16799 From azafari at openjdk.org Fri Nov 24 08:02:21 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 24 Nov 2023 08:02:21 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: References: Message-ID: <-zu_qHuhBvXKpJp4e0Hn1tE1a2Gsk4GpRBi-kapasFo=.362a1fd6-03b9-4b9f-a22f-c5df4ab243f0@github.com> On Thu, 23 Nov 2023 22:15:35 GMT, Afshin Zafari wrote: >>> I still approve of this patch as it's better than what we had before. There are a lot of suggested improvements that can be done either in this PR or in a future RFE. `git blame` shows that this hasn't been touched since 2008, so I don't think applying all suggestions now is in any sense critical :-). >> >> Not touched since 2008 suggests to me there might not be a rush to make the change as proposed, and instead take >> the (I think small) additional time to do the better thing, e.g. the unary-predicate suggestion made by several folks. > > Dear @kimbarrett , @dholmes-ora , @stefank, @rose00 , @jdksjolen, @sspitsyn, @merykitty, > Thank you all. > The final code is now cleaner and nicer. > @afshin-zafari I'm happy to hear the code is now cleaner and nicer, but AFAICS this new version of the code has only had a single review. It should really have been re-reviewed by at least one of the earlier reviewers. Thanks. Oh, really sorry. 
That's obviously my mistake. You are right, I should have waited for others. What to do now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1825277911 From stefank at openjdk.org Fri Nov 24 08:04:06 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 24 Nov 2023 08:04:06 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v4] In-Reply-To: <-YLzEt2tPsupH0CLU6278f4yX3si2I60OvHfDitr-tM=.e5d3dad2-b846-4da5-812f-ea2600cd2780@github.com> References: <-YLzEt2tPsupH0CLU6278f4yX3si2I60OvHfDitr-tM=.e5d3dad2-b846-4da5-812f-ea2600cd2780@github.com> Message-ID: On Fri, 24 Nov 2023 06:43:35 GMT, David Holmes wrote: >> I provoked test failures for all paths I filtered. If I remove this check and run: >> >> make -C ../build/fastdebug test TEST=test/hotspot/jtreg/runtime/Monitor/IterateMonitorWithDeadObjectTest.java JTREG="JAVA_OPTIONS=-XX:+UseZGC" >> >> >> I hit this assert: >> >> # Internal Error (/home/stefank/git/jdk/open/src/hotspot/share/services/management.cpp:1274), pid=1546709, tid=1546754 >> # assert(object != nullptr) failed: must be a Java object >> ... 
>> V [libjvm.so+0x1330ce8] jmm_DumpThreads+0x1a48 (management.cpp:1274) >> j sun.management.ThreadImpl.dumpThreads0([JZZI)[Ljava/lang/management/ThreadInfo;+0 java.management at 22-internal >> j sun.management.ThreadImpl.dumpAllThreads(ZZI)[Ljava/lang/management/ThreadInfo;+28 java.management at 22-internal >> j sun.management.ThreadImpl.dumpAllThreads(ZZ)[Ljava/lang/management/ThreadInfo;+5 java.management at 22-internal >> j IterateMonitorWithDeadObjectTest.dumpThreadsWithLockedMonitors()V+7 >> j IterateMonitorWithDeadObjectTest.main([Ljava/lang/String;)V+11 >> >> >> If I remove that assert I hit an NPE in the Java layer: >> >> java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "lock" is null >> at java.management/java.lang.management.ThreadInfo.(ThreadInfo.java:172) >> at java.management/sun.management.ThreadImpl.dumpThreads0(Native Method) >> at java.management/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:518) >> at java.management/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:506) >> at IterateMonitorWithDeadObjectTest.dumpThreadsWithLockedMonitors(IterateMonitorWithDeadObjectTest.java:44) >> at IterateMonitorWithDeadObjectTest.main(IterateMonitorWithDeadObjectTest.java:66) >> at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) >> at java.base/java.lang.reflect.Method.invoke(Method.java:580) >> at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) >> at java.base/java.lang.Thread.run(Thread.java:1570) > > Thanks for that. Looks like JMM thread dump is different to VM Thread dump. Okay we definitely need RFEs to look into how to handle this. Will you create the RFE? I'm not as convinced that this is something that needs to be fixed, so it would be better if you create the RFE with the proper motivation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1404046328 From stuefe at openjdk.org Fri Nov 24 08:29:16 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Nov 2023 08:29:16 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 07:54:27 GMT, Matthias Baesken wrote: >> src/hotspot/share/utilities/vmError.cpp line 727: >> >>> 725: if (should_report_bug(_id)) { >>> 726: os::prepare_native_symbols(); >>> 727: } >> >> Ugh, misuse of "should report bug" as "is oom error" :-/ But no problem, I see you just repeat the existing pattern, that is fine. We can clean that in a separate patch. > > Hi Thomas, any suggestion for a better name? Maybe there was a reason to keep it generic ? If you want to change this, I would do this: - static bool should_report_bug(unsigned int id) { - return (id != OOM_MALLOC_ERROR) && (id != OOM_MMAP_ERROR); - } + static bool is_oom_error(unsigned int id) { + return (id == OOM_MALLOC_ERROR) || (id == OOM_MMAP_ERROR); + } + static bool should_report_bug(unsigned int id) { + // since oom errors depend on external test conditions, they don't count as real bugs + return !is_oom_error(id); + } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16730#discussion_r1404066346 From stefank at openjdk.org Fri Nov 24 08:38:26 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 24 Nov 2023 08:38:26 GMT Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v3] In-Reply-To: <-zu_qHuhBvXKpJp4e0Hn1tE1a2Gsk4GpRBi-kapasFo=.362a1fd6-03b9-4b9f-a22f-c5df4ab243f0@github.com> References: <-zu_qHuhBvXKpJp4e0Hn1tE1a2Gsk4GpRBi-kapasFo=.362a1fd6-03b9-4b9f-a22f-c5df4ab243f0@github.com> Message-ID: <-4W-9hUd7pls0UYyItZ9kF0jnWHIjv2Lk9u4noy0GMY=.334e45a7-75c7-4bf1-93ff-ffec226545ee@github.com> On Fri, 24 Nov 2023 07:59:51 GMT, Afshin Zafari wrote: > > 
@afshin-zafari I'm happy to hear the code is now cleaner and nicer, but AFAICS this new version of the code has only had a single review. It should really have been re-reviewed by at least one of the earlier reviewers. Thanks. > > Oh, really sorry. That's obviously my mistake. You are right, I should have waited for others. What to do now? I think the other reviewers can give their review feedback and then you create a new RFE to fix whatever they'd like to get fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1825313436 From tschatzl at openjdk.org Fri Nov 24 09:18:25 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 24 Nov 2023 09:18:25 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v3] In-Reply-To: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: > Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) > > Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). > > The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. > > Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call.
To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). > > Upcoming changes will > * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. > * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) > * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism > * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) > * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. > > These are the reason for the class hierarchy for `ClassUnloadingContext`: the goal is to ultimately have about this phasing (for G1): > 1. collect all dead CLDs, using the `register_unloading_class_loader_data` method *only* > 2. parallelize the stuff in `ClassLoaderData::unload()` in one way or another, adding them to the `complete_cleaning` (parallel) phase. > 3. `purge_nmethods`, `free_code_blobs` and the `remove_unlinked_nmethods_from_code_root_set` (from JDK-8317007) will be concurrent. > > Particularly the split of `SystemDictionary::do_unloading` into "only" traversing the CLDs to find the dead ones and then in parallel process them in 2. above warrants a separate `ClassUnloadingCo... Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - Merge branch 'master' into mergeme - iwalulya review, naming - 8317809 Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) Introduce a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). Upcoming changes will * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism * G1: move some signifcant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. 
- Only run test case on debug VMs, sufficient - 8320331 g1 full gc "during" verification accesses half-unloaded metadata ------------- Changes: https://git.openjdk.org/jdk/pull/16759/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16759&range=02 Stats: 495 lines in 28 files changed: 368 ins; 83 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/16759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16759/head:pull/16759 PR: https://git.openjdk.org/jdk/pull/16759 From tschatzl at openjdk.org Fri Nov 24 09:18:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 24 Nov 2023 09:18:28 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v2] In-Reply-To: References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: On Thu, 23 Nov 2023 05:06:53 GMT, Amit Kumar wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review, naming > > src/hotspot/share/classfile/classLoaderData.cpp line 602: > >> 600: >> 601: // Clean up class dependencies and tell serviceability tools >> 602: // these classes are unloading. This must be called > > Suggestion: > > // these classes are unloading. This must be called Hotspot code style allows comments with both two spaces and one space between sentences in a paragraph. I did not want to change the style there. 
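Regarding the O(n) versus O(n^2) claim in the 8317809 description quoted in this thread: the effect of inserting in sorted order into an address-ordered list that remembers its last insertion point (the "finger") can be shown with a small model. This is a sketch under assumptions, not the code heap implementation: `FreeListModel`, `search_steps`, and the step counter are hypothetical and exist only to make the search cost visible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an address-ordered, singly linked free list with
// a "finger" that remembers the most recently inserted node.
struct FreeListModel {
  struct Node { std::size_t addr; Node* next; };
  Node* head = nullptr;
  Node* finger = nullptr;    // last inserted node
  std::size_t steps = 0;     // links followed while searching
  std::vector<Node*> owned;  // every allocated node, for cleanup

  ~FreeListModel() { for (Node* n : owned) delete n; }

  void insert(std::size_t addr) {
    Node* n = new Node{addr, nullptr};
    owned.push_back(n);
    // Start at the finger if the new block lies behind it, else at the head.
    Node* prev = (finger != nullptr && finger->addr < addr) ? finger : nullptr;
    Node* cur = (prev != nullptr) ? prev->next : head;
    while (cur != nullptr && cur->addr < addr) {
      prev = cur;
      cur = cur->next;
      ++steps;
    }
    n->next = cur;
    if (prev == nullptr) head = n; else prev->next = n;
    finger = n;
  }
};

// Total number of search steps needed to insert 'addrs' in the given order.
std::size_t search_steps(const std::vector<std::size_t>& addrs) {
  FreeListModel list;
  for (std::size_t a : addrs) list.insert(a);
  return list.steps;
}
```

In this model, ascending input never follows a link because every insertion continues right at the finger, while an out-of-order element such as the `4` in `{1, 2, 3, 5, 4}` falls back to a scan from the head.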
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1404116626 From stefank at openjdk.org Fri Nov 24 09:18:46 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 24 Nov 2023 09:18:46 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert from: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitors with dead objects, so I'm filtering those in the closure that collects ObjectMonitors. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors.
If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls.
>
> Test: the written tests with and without the fix. Tier1-Tier3, so far.

Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:

  Split test and use othervm

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16783/files
  - new: https://git.openjdk.org/jdk/pull/16783/files/9521b26d..bad51926

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=03-04

  Stats: 29 lines in 1 file changed: 23 ins; 1 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/16783.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783

PR: https://git.openjdk.org/jdk/pull/16783

From jsjolen at openjdk.org  Fri Nov 24 10:03:26 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Fri, 24 Nov 2023 10:03:26 GMT
Subject: RFR: 8314502: Change the comparator taking version of GrowableArray::find to be a template method [v12]
In-Reply-To:
References:
Message-ID:

On Thu, 23 Nov 2023 16:11:32 GMT, Afshin Zafari wrote:

>> The `find` method now is
>> ```C++
>> template <typename T>
>> int find(T* token, bool f(T*, E)) const {
>> ...
>> ```
>>
>> Any other functions which use this are also changed.
>> Local linux-x64-debug hotspot:tier1 passed. Mach5 tier1 build on linux and Windows passed.
>
> Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Merge remote-tracking branch 'upstream/pr/15418' into pr_15418
>  - Suggested cleanups and tests

Still LGTM.
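
For readers following the thread, here is a minimal sketch of how a comparator-taking `find` with a templated token type can be used. `SimpleArray` and `name_equals` are hypothetical stand-ins invented for this illustration; this is not the HotSpot `GrowableArray` code, only the same signature shape as the one quoted above.

```cpp
#include <cstring>
#include <vector>

// Hypothetical stand-in for GrowableArray<E>; illustration only.
template <typename E>
class SimpleArray {
  std::vector<E> _data;

 public:
  void append(const E& e) { _data.push_back(e); }

  // Returns the index of the first element for which f(token, element)
  // is true, or -1 if no element matches. T is deduced from the token,
  // so the comparator no longer has to take a void*.
  template <typename T>
  int find(T* token, bool f(T*, E)) const {
    for (int i = 0; i < static_cast<int>(_data.size()); i++) {
      if (f(token, _data[i])) {
        return i;
      }
    }
    return -1;
  }
};

// Hypothetical comparator: matches an element by string equality.
static bool name_equals(const char* wanted, const char* candidate) {
  return std::strcmp(wanted, candidate) == 0;
}
```

With a `SimpleArray<const char*>`, a call such as `names.find(key, name_equals)` deduces `T` as `const char`, so a comparator with mismatched parameter types would be rejected at compile time.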
-------------

PR Comment: https://git.openjdk.org/jdk/pull/15418#issuecomment-1825422568

From omikhaltcova at openjdk.org  Fri Nov 24 10:06:18 2023
From: omikhaltcova at openjdk.org (Olga Mikhaltsova)
Date: Fri, 24 Nov 2023 10:06:18 GMT
Subject: RFR: 8318158: RISC-V: implement roundD/roundF intrinsics [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 15 Nov 2023 15:44:47 GMT, Olga Mikhaltsova wrote:

>> Please, review this Implementation of the roundD/roundF intrinsics for RISC-V platform.
>>
>> In the table below it is shown that NaN argument should be processed as a special case.
>>
>>                                           RISC-V                     Java
>>                                           (FCVT.W.S)   (FCVT.L.D)    (long round(double a))   (int round(float a))
>> Minimum valid input (after rounding)      -2^31        -2^63         Long.MIN_VALUE           Integer.MIN_VALUE
>> Maximum valid input (after rounding)      2^31 - 1     2^63 - 1      Long.MAX_VALUE           Integer.MAX_VALUE
>> Output for out-of-range negative input    -2^31        -2^63         Long.MIN_VALUE           Integer.MIN_VALUE
>> Output for -∞                             -2^31        -2^63         Long.MIN_VALUE           Integer.MIN_VALUE
>> Output for out-of-range positive input    2^31 - 1     2^63 - 1      Long.MAX_VALUE           Integer.MAX_VALUE
>> Output for +∞                             2^31 - 1     2^63 - 1      Long.MAX_VALUE           Integer.MAX_VALUE
>> Output for NaN                            2^31 - 1     2^63 - 1      0                        0
>>
>> The benchmark running with the 2nd fixed implementation on the T-Head RVB-ICE board shows the following performance improvement:
>>
>> **Before**
>>
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score  Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15   59.555  0.179  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15   49.760  0.103  ops/ms
>>
>> **After**
>>
>> Benchmark                              (TESTSIZE)   Mode  Cnt    Score  Error   Units
>> FpRoundingBenchmark.test_round_double        2048  thrpt   15  110.956  0.186  ops/ms
>> FpRoundingBenchmark.test_round_float         2048  thrpt   15  115.947  0.122  ops/ms
>
> Olga Mikhaltsova has updated the pull request incrementally with one additional commit since the last revision:
>
>   Replaced tmp with t0

gentle ping, please take a look at this pr!
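
To make the NaN row of the table concrete, here is a small sketch of the Java-side semantics the intrinsic has to reproduce for `int round(float a)`: NaN maps to 0, out-of-range inputs saturate, and ties round toward positive infinity. The function name `java_round_float` is made up for this illustration, and the code is a plain C++ model of the expected results, not the RISC-V assembly from the PR.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Illustrative model of Java's Math.round(float); not the intrinsic itself.
int32_t java_round_float(float a) {
  // The special case from the table: a bare FCVT.W.S would yield
  // 2^31 - 1 for NaN, while Java requires 0.
  if (std::isnan(a)) {
    return 0;
  }
  // Round half up: floor(a + 0.5), computed in double so that the
  // addition is exact for every float input that is still in int range.
  const double r = std::floor(static_cast<double>(a) + 0.5);
  // Saturate, which also covers -infinity and +infinity.
  if (r <= static_cast<double>(std::numeric_limits<int32_t>::min())) {
    return std::numeric_limits<int32_t>::min();
  }
  if (r >= static_cast<double>(std::numeric_limits<int32_t>::max())) {
    return std::numeric_limits<int32_t>::max();
  }
  return static_cast<int32_t>(r);
}
```

Everything except the NaN check matches what the table lists for the conversion instruction, which is why only NaN needs separate handling.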
------------- PR Comment: https://git.openjdk.org/jdk/pull/16382#issuecomment-1825426100 From duke at openjdk.org Fri Nov 24 10:18:31 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Fri, 24 Nov 2023 10:18:31 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v8] In-Reply-To: References: Message-ID: > JDK-8234502 : Merge GenCollectedHeap and SerialHeap Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'openjdk:master' into serialgc - replace a necessary include statement - clean up - add line-breaks - fix include statements - add some headers - Completely removed 'GenCollectedHeap' - Fix 'young_gen' function in 'genCollectedHeap.cpp' - include 'serialVMOperations.hpp' - fix trialing whitespace - ... and 2 more: https://git.openjdk.org/jdk/compare/8f060df7...12c680a3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16623/files - new: https://git.openjdk.org/jdk/pull/16623/files/0563686a..12c680a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16623&range=06-07 Stats: 627411 lines in 1531 files changed: 93131 ins; 472348 del; 61932 mod Patch: https://git.openjdk.org/jdk/pull/16623.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16623/head:pull/16623 PR: https://git.openjdk.org/jdk/pull/16623 From qamai at openjdk.org Fri Nov 24 10:35:07 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 24 Nov 2023 10:35:07 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 17:13:47 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to add ConvHF2F intrinsic to JDK for riscv? > Thanks! 
>
> (By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)`
> https://lore.kernel.org/lkml/20231114141256.126749-11-cleger@rivosinc.com/)
>
> ## Test
> ### Functionality
> #### hotspot tests
> test/hotspot/jtreg/compiler/intrinsics/
> test/hotspot/jtreg/compiler/c2/irTests
>
> #### jdk tests
> test/jdk/java/lang/Float/Binary16Conversion*.java
>
> ### Performance
> tested on licheepi.
>
> #### with UseZfh enabled
> (i.e. enable the intrinsic)
>
> Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
> Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10   4659.796 ± 13.262  ns/op
> Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     22.957 ±  0.098  ns/op
>
> #### with UseZfh disabled
> (i.e. disable the intrinsic)
>
> Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
> Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10  22930.591 ± 72.595  ns/op
> Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     25.970 ±  0.063  ns/op

src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1691:

> 1689:   fmv_h_x(dst, src);
> 1690:   fcvt_s_h(dst, dst);
> 1691:   j(done);

Since `NaN`s are exceptional inputs, it would be beneficial to move the handling below to an out-of-line stub.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1404198128

From jbachorik at openjdk.org  Fri Nov 24 11:27:08 2023
From: jbachorik at openjdk.org (Jaroslav Bachorik)
Date: Fri, 24 Nov 2023 11:27:08 GMT
Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8]
In-Reply-To:
References:
Message-ID: <039WrJCh_1K3zhoGN8MlABuPMgPJ4Kom7XYCl5glqxY=.f8369b03-0f87-40d8-9763-1df4f0043815@github.com>

On Fri, 24 Nov 2023 05:47:14 GMT, David Holmes wrote:

>> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Reinstate mistakenly deleted comment
>
> This has gotten a lot more complicated.
All I was suggesting was if this: > > if (with_method_holders) { > method->clear_jmethod_id(); > } > > could be changed to > > if (method->method_holder() == nullptr) { > method->clear_jmethod_id(); > } > > Now I'm not at all sure what you are doing. @dholmes-ora Unfortunately, I can not just do `method->method_holder() == nullptr` as `method_holder()` is expanding to `Method::constants()->pool_holder()` and `Method::constants()` is expanding to `Method::constMethod()->constants()`. This can cause SIGSEGV if either `Method::_constMethod` or `ConstMethod::_constants` is NULL. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1825532431 From jbachorik at openjdk.org Fri Nov 24 11:42:09 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Fri, 24 Nov 2023 11:42:09 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8] In-Reply-To: References: Message-ID: <3Sf09SHxmBuy_R7aIUi3zZ7hdvr_HWtXmNlTY05XSqc=.bd601b5c-be27-48ca-ac89-0b4ab41c74a7@github.com> On Fri, 24 Nov 2023 06:49:14 GMT, Thomas Stuefe wrote: >I wondered about that but IIUC the pointers inside IK->_methods_jmethod_ids may refer to jmethodID slots that have been reused for different methods? Yes. The reason is that if a class has previous versions, these versions do not contain their own jmethodID cache but rather delegate to the 'main' version. So we have nothing to iterate over when a previous class version gets unloaded - also, to make things more interesting, the previous method versions are pointing to the main class version as their `method_holder`. Making it impossible to iterate over the full jmethodID cache in the main class version and nulling out a jmethodID only if the method it points to has `method_holder` pointing to the class being unloaded (it will always point to the main class version ...). If there is an easy way out of this without incurring O(n^2) complexity, please speak up. 
I was not able to figure out an alternative without eg. keeping an explicit method->jmethodID link but that would increase the Method instance footprint and it felt like a more invasive change for what I want to achieve. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1404260866 From duke at openjdk.org Fri Nov 24 11:48:05 2023 From: duke at openjdk.org (suchismith1993) Date: Fri, 24 Nov 2023 11:48:05 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Thu, 23 Nov 2023 18:26:56 GMT, suchismith1993 wrote: > > > > I'm not a big fan of this approach. We accumulate more and more "#ifdef AIX" in shared code because of many recent AIX additions. No other platform has such a large ifdef footprint in shared code. > > > > I argue that all of this should be handled inside os_aix.cpp and not leak out into the external space: > > > > If .a is a valid shared object format on AIX, this should be handled in `os::dll_load()`, and be done for all shared objects. If not, why do we try to load a static archive via dlload in this case but not in other cases? > > > > _If_ this is needed in shared code, the string replacement function should be a generic utility function for all platforms, and it should be tested with a small gtest. A gtest would have likely uncovered the buffer overflow too. > > > > > > > > > So i tried to check how to move the code to os_aix file. A few problems is see : > > > > > > 1. When i have to implemented the logic at dll_load function, i would have to repeat a lot of code after dlopen, i.e i have to call dlopen again for .so files and hence i have to copy the logic again for it. > > > 2. 
Currently using function before dll_load,in the shared code makes this a bit easier. > > > I have an alternate suggestion . > > > Shall we declare the utlity function as part of os ? and implement it platform wise. > > > > > > Not without any need. If this is an AIX specific issue, it should be handed in os::dll_load on AIX. > > > In that way we just keep the actual implentation and aix and in windows and linux we keep it empty. > > > So that way we can just call the utility function in shared code and it wouldnt affect other platform and will run the usecase for AIX. > > > If that is not acceptable, then is there a better way to avoid repeating the dlopen again in os_aix file ? > > > > > > I don't understand the problem. What is preventing you from using a file local scope utility function inside os::dll_load? > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? @tstuefe Any suggestion on this ? 
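
As a rough illustration of the review feedback in this thread (a string utility that builds a new path instead of overwriting the caller's buffer), here is a minimal sketch. It uses `std::string` as a stand-in for HotSpot's own allocation, which is an assumption made only to keep the example self-contained; the treatment of names without an extension is likewise a choice made for this sketch, not something the thread specifies.

```cpp
#include <string>

// Sketch only: returns a copy of `filename` in which the extension (the
// text from the last '.' of the last path component) is replaced by
// `new_extension`. The input is never modified. If the last path
// component has no extension, `new_extension` is appended.
std::string replace_extension_in_filename(const std::string& filename,
                                          const std::string& new_extension) {
  const std::string::size_type slash = filename.find_last_of('/');
  const std::string::size_type dot = filename.find_last_of('.');
  // A dot only counts as an extension separator if it comes after the
  // last directory separator.
  const bool has_extension =
      dot != std::string::npos &&
      (slash == std::string::npos || dot > slash);
  const std::string stem = has_extension ? filename.substr(0, dot) : filename;
  return stem + new_extension;
}
```

Because the result is a fresh string, the function also works on string literals and cannot overflow a caller-provided buffer, which were the two concerns raised about the in-place variant.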
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1825557400 From duke at openjdk.org Fri Nov 24 11:48:09 2023 From: duke at openjdk.org (suchismith1993) Date: Fri, 24 Nov 2023 11:48:09 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Thu, 23 Nov 2023 06:03:15 GMT, Thomas Stuefe wrote: >> suchismith1993 has updated the pull request incrementally with one additional commit since the last revision: >> >> change macro position > > src/hotspot/os/aix/os_aix.cpp line 3065: > >> 3063: //Replaces provided path with alternate path for the given file,if it doesnt exist. >> 3064: //For AIX,this replaces .so with .a. >> 3065: void os::Aix::mapAlternateName(char* buffer, const char *extension) { > > The documentation is wrong: > > // Replaces the specified path with an alternative path for the > given file if the original path doesn't exist > > It does no such thing; it replaces the extension unconditionally. The comment sounds like it does a file system check. > > The whole function is not that well named - "map alternate name" does not really tell me anything, I need to look at the implementation and the caller to understand what it is doing. There is no mapping here, this is just a string utility function. > > The function should not modify the original buffer but instead assemble a copy. That is the conventional way to do these things. You can work with immutable strings as input, e.g. literals, and don't risk buffer overflows. > > All of this should be handled inside os_aix.cpp; see my other comment. This should not live in the external os::aix interface, since it has nothing to do with AIX. But I think all of this should be confined to os_aix.cpp. 
> > Proposal for a clearer name, comment, and pseudocode > > // Given a filename with an extension, return a new string containing the filename with the new extension. > // New string is allocated in resource area. > static char* replace_extension_in_filename(const char* filename, const char* new_extension) { > - allocate buffer in RA > - assemble new path by contacting old path - old extension + new extension > - return new path > } thank you for the explanation. I am working on it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16604#discussion_r1404265530 From jbachorik at openjdk.org Fri Nov 24 12:01:27 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Fri, 24 Nov 2023 12:01:27 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v9] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. 
> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. > > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated. > > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Comment adjustments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/46eff8d3..554b3ae0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=07-08 Stats: 12 lines in 2 files changed: 7 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From dholmes at openjdk.org Fri Nov 24 12:53:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 24 Nov 2023 12:53:06 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: 
<-YLzEt2tPsupH0CLU6278f4yX3si2I60OvHfDitr-tM=.e5d3dad2-b846-4da5-812f-ea2600cd2780@github.com> Message-ID: <29uxp2DVbW7p6RLM3yZUXnhS8Bc47Il0CVLJAKY21jA=.8f7b43cc-28ab-454d-9d69-ddaf75296943@github.com> On Fri, 24 Nov 2023 08:01:15 GMT, Stefan Karlsson wrote: >> Thanks for that. Looks like JMM thread dump is different to VM Thread dump. Okay we definitely need RFEs to look into how to handle this. > > Will you create the RFE? I'm not as convinced that this is something that needs to be fixed, so it would be better if you create the RFE with the proper motivation. Yes I will create it. Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1404321515 From dchuyko at openjdk.org Fri Nov 24 14:06:22 2023 From: dchuyko at openjdk.org (Dmitry Chuyko) Date: Fri, 24 Nov 2023 14:06:22 GMT Subject: RFR: 8309271: A way to align already compiled methods with compiler directives [v11] In-Reply-To: References: Message-ID: > Compiler Control (https://openjdk.org/jeps/165) provides method-context dependent control of the JVM compilers (C1 and C2). The active directive stack is built from the directive files passed with the `-XX:CompilerDirectivesFile` diagnostic command-line option and the Compiler.add_directives diagnostic command. It is also possible to clear all directives or remove the top from the stack. > > A matching directive will be applied at method compilation time when such compilation is started. If directives are added or changed, but compilation does not start, then the state of compiled methods doesn't correspond to the rules. This is not an error, and it happens in long running applications when directives are added or removed after compilation of methods that could be matched. For example, the user decides that C2 compilation needs to be disabled for some method due to a compiler bug, issues such a directive but this does not affect the application behavior. 
In such case, the target application needs to be restarted, and such an operation can have high costs and risks. Another goal is testing/debugging compilers. > > It would be convenient to optionally reconcile at least existing matching nmethods to the current stack of compiler directives (so bypass inlined methods). > > Natural way to eliminate the discrepancy between the result of compilation and the broken rule is to discard the compilation result, i.e. deoptimization. Prior to that we can try to re-compile the method letting compile broker to perform it taking new directives stack into account. Re-compilation helps to prevent hot methods from execution in the interpreter. > > A new flag `-r` has beed introduced for some directives related to compile commands: `Compiler.add_directives`, `Compiler.remove_directives`, `Compiler.clear_directives`. The default behavior has not changed (no flag). If the new flag is present, the command scans already compiled methods and puts methods that have any active non-default matching compiler directives to re-compilation if possible, otherwise marks them for deoptimization. There is currently no distinction which directives are found. In particular, this means that if there are rules for inlining into some method, it will be refreshed. On the other hand, if there are rules for a method and it was inlined, top-level methods won't be refreshed, but this can be achieved by having rules for them. > > In addition, a new diagnostic command `Compiler.replace_directives`, has been added for ... Dmitry Chuyko has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 29 commits: - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - Merge branch 'openjdk:master' into compiler-directives-force-update - jcheck - Unnecessary import - force_update->refresh - Merge branch 'openjdk:master' into compiler-directives-force-update - ... and 19 more: https://git.openjdk.org/jdk/compare/0c9a61c1...c0d887af ------------- Changes: https://git.openjdk.org/jdk/pull/14111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14111&range=10 Stats: 372 lines in 15 files changed: 339 ins; 3 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/14111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14111/head:pull/14111 PR: https://git.openjdk.org/jdk/pull/14111 From stuefe at openjdk.org Fri Nov 24 14:07:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Nov 2023 14:07:08 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Fri, 24 Nov 2023 11:45:25 GMT, suchismith1993 wrote: > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > @tstuefe Any suggestion on this ? 
--- a/src/hotspot/os/aix/os_aix.cpp
+++ b/src/hotspot/os/aix/os_aix.cpp
@@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf,
   return true;
 }
 
-void *os::dll_load(const char *filename, char *ebuf, int ebuflen) {
+static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) {
 
   log_info(os)("attempting shared library load of %s", filename);
 
@@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) {
   return nullptr;
 }
 
+void* os::dll_load(const char *filename, char *ebuf, int ebuflen) {
+
+  void* result = nullptr;
+
+  // First try using *.so suffix; failing that, retry with *.a suffix.
+  const size_t len = strlen(filename);
+  constexpr size_t safety = 3 + 1;
+  const size_t bufsize = len + safety;
+  char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal);
+  strcpy(buf, filename);
+  char* const dot = strrchr(buf, '.');
+
+  assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename);
+  assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0,
+         "Attempting to load a shared object that is neither *.so nor *.a: %s", filename);
+
+  sprintf(dot, ".so");
+  result = dll_load_inner(buf, ebuf, ebuflen);
+
+  if (result == nullptr) {
+    sprintf(dot, ".a");
+    result = dll_load_inner(buf, ebuf, ebuflen);
+  }
+
+  FREE_C_HEAP_ARRAY(char, buf);
+
+  return result;
+}
+

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1825721906

From stuefe at openjdk.org  Fri Nov 24 14:17:36 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 24 Nov 2023 14:17:36 GMT
Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v10]
In-Reply-To:
References:
Message-ID:

> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme.
> > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. > > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. 
Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: switch off AIX tests since AIX does not support CDS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16743/files - new: https://git.openjdk.org/jdk/pull/16743/files/edc19d65..a381ae86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From stuefe at openjdk.org Fri Nov 24 14:17:39 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 24 Nov 2023 14:17:39 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v9] In-Reply-To: References: Message-ID: <6F-uOgQTq0WMg-1yegbv9dPxIOqZvkkesINn78ZNo70=.1e5c87c6-92e3-4128-9899-a5412131ae06@github.com> On Thu, 23 Nov 2023 09:35:30 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. 
>> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > Fix test for riscv SAP tested on AIX and ppcle. Test error on AIX since we don't support CDS there - I disabled the test for AIX. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1825735361

From eosterlund at openjdk.org  Fri Nov 24 14:19:21 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Fri, 24 Nov 2023 14:19:21 GMT
Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v4]
In-Reply-To:
References:
Message-ID:

> The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other.
>
> By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed.
>
> Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well.
Erik Österlund has updated the pull request incrementally with five additional commits since the last revision:

 - Merge pull request #2 from JornVernee/PR_async_close+ASYNC_EXCEPTION_CHECK
 - Add swappy copy cast to TestHandShake.java
 - Polish: use special UNSAFE_ENTRY_SCOPED
 - work around NoSafepointVerifier check in CopySwapMemory
 - use more fine grained exception check

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16792/files
  - new: https://git.openjdk.org/jdk/pull/16792/files/5d34ddba..0b91ac9a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=02-03

  Stats: 84 lines in 3 files changed: 44 ins; 18 del; 22 mod
  Patch: https://git.openjdk.org/jdk/pull/16792.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16792/head:pull/16792

PR: https://git.openjdk.org/jdk/pull/16792

From stuefe at openjdk.org  Fri Nov 24 15:11:10 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 24 Nov 2023 15:11:10 GMT
Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 24 Nov 2023 14:17:36 GMT, Thomas Stuefe wrote:

>> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme.
>>
>> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations.
>>
>> There are common patterns:
>> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile.
>> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > switch off AIX tests since AIX does not support CDS Tested on s390x. Works. 
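
For readers following the riscv part of the description quoted above, the "good base" condition can be written down as a small predicate. This is only a sketch of the stated rule (all set bits of the base fall inside exactly one of the windows [12..32), [32, 44) or [44, 64)); the function names `bits_only_in` and `is_good_riscv_base` are made up for illustration and are not the names used in the patch.

```cpp
#include <cassert>
#include <cstdint>

// True if every set bit of `v` lies in the half-open bit range [lo, hi).
// Assumes 0 <= lo < hi <= 64.
constexpr bool bits_only_in(uint64_t v, unsigned lo, unsigned hi) {
  const uint64_t below_hi = (hi >= 64) ? ~uint64_t{0} : ((uint64_t{1} << hi) - 1);
  const uint64_t below_lo = (uint64_t{1} << lo) - 1;
  const uint64_t window   = below_hi & ~below_lo;
  return (v & ~window) == 0;
}

// Hypothetical helper: a base is "good" in the sense of the description
// if all of its set bits fall into a single one of the three windows.
constexpr bool is_good_riscv_base(uint64_t base) {
  return bits_only_in(base, 12, 32) ||
         bits_only_in(base, 32, 44) ||
         bits_only_in(base, 44, 64);
}
```

For example, `is_good_riscv_base(0x80000000)` holds (only bit 31 is set), while a base with bits in two different windows, such as `0x100001000`, is rejected. Why exactly these windows map to cheap immediates is a property of the riscv instruction encodings and is not modelled here.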
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1825800050

From duke at openjdk.org  Fri Nov 24 17:26:08 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Fri, 24 Nov 2023 17:26:08 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v5]
In-Reply-To:
References: <93M-a4Fckf8STLcvAP1cV4msQHqoQ4vUgWo02_YiJxo=.1c79764d-603e-497c-bab1-04ac2d30fa72@github.com>
Message-ID: <4-9392BGIYaep1e8XElzkLLTryL4-ULfjVTcak7a_4k=.92cd7529-5ba9-45bc-963b-00cc9e528760@github.com>

On Fri, 24 Nov 2023 06:52:33 GMT, Jatin Bhateja wrote:

>> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   test break fix
>
> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3601:
>
>> 3599:   if (compute_mask) {
>> 3600:     vpxor(scratch, scratch, scratch, vector_len);
>> 3601:     vpcmpgtq(scratch, scratch, mask, vector_len);
>
> I see assertion failures in following tests with JAVA_OPTIONS= -XX:UseAVX=1 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -Xbatch
>
> compiler/c2/cr6340864/TestDoubleVect.java
> compiler/loopopts/superword/ReductionPerf.java
> compiler/vectorization/TestSignumVector.java
> compiler/vectorization/runner/BasicDoubleOpTest.java
>
> AVX1 does not support integral vectors above 16 bytes, please use floating point compare instruction.

Hmm. Good catch!

Thinking about the AVX1 case some more... Platforms where this `vblendvp*` emulation is needed have AVX2 at least; otherwise vblendvp is faster. I think it's better to disable this optimization entirely if AVX1 is required to be used.

I would go even further and disable `EnableX86ECoreOpts` if `UseAVX==1`. Preference?
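
To make the idea under discussion concrete, here is a scalar, single-lane sketch of what "emulating a variable blend with boolean operations" amounts to: first widen the sign bit of the selector lane to a full-width mask (the role the `vpxor` + `vpcmpgtq` pair plays in the quoted snippet, since "0 > mask" is true exactly when the sign bit is set), then combine the two inputs with and/andnot/or. The helper names are hypothetical and this is not the HotSpot code; it only illustrates the bit logic for one 64-bit lane.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Expands the sign bit of a 64-bit lane into all-ones or all-zeros.
inline uint64_t sign_to_lane_mask(uint64_t lane_bits) {
  return uint64_t{0} - (lane_bits >> 63);
}

// Returns `b` if the sign bit of `selector` is set, otherwise `a`,
// using only and / and-not / or on the raw bits.
inline double blend_lane(double a, double b, double selector) {
  uint64_t a_bits, b_bits, s_bits;
  std::memcpy(&a_bits, &a, sizeof a_bits);
  std::memcpy(&b_bits, &b, sizeof b_bits);
  std::memcpy(&s_bits, &selector, sizeof s_bits);

  const uint64_t m = sign_to_lane_mask(s_bits);
  const uint64_t r = (a_bits & ~m) | (b_bits & m);

  double out;
  std::memcpy(&out, &r, sizeof out);
  return out;
}
```

Note that the selector is interpreted purely by its sign bit, so `-0.0` selects `b` just like any negative value. The AVX1 concern raised above is about which vector instructions may legally compute the mask at 32-byte width, which a scalar sketch like this cannot show.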
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1404552951 From mli at openjdk.org Fri Nov 24 17:45:04 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 Nov 2023 17:45:04 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 10:32:04 GMT, Quan Anh Mai wrote: >> Hi, >> Can you review the patch to add ConvHF2F intrinsic to JDK for riscv? >> Thanks! >> >> (By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)` >> https://lore.kernel.org/lkml/20231114141256.126749-11-cleger at rivosinc.com/) >> >> ## Test >> ### Functionality >> #### hotspot tests >> test/hotspot/jtreg/compiler/intrinsics/ >> test/hotspot/jtreg/compiler/c2/irTests >> >> #### jdk tests >> test/jdk/java/lang/Float/Binary16Conversion*.java >> >> ### Performance >> tested on licheepi. >> >> #### with UseZfh enabled >> (i.e. enable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 4659.796 ? 13.262 ns/op >> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 22.957 ? 0.098 ns/op >> >> >> #### with UseZfh disabled >> (i.e. disable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 22930.591 ? 72.595 ns/op >> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 25.970 ? 0.063 ns/op > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1691: > >> 1689: fmv_h_x(dst, src); >> 1690: fcvt_s_h(dst, dst); >> 1691: j(done); > > Since `Nan`s are exceptional inputs, it would be beneficial to move the handling below to an out-of-line stub. Thanks for the suggestion, it make sense. Just I'm not sure if there is a straight way to implement it currently. Following is what I've done, but blocked, please help to share your opinion. 
To jump to stub code to process NaN case, the stub code need to use specific register for `src` and `dst`, as `dst` is a FloatRegister, so need a way to specify a float register when matching `ConvHF2F` in ad file. I tried to add a `fRegF_F10` to enable specify a float register, but it does not work well. Some error occurs for fastdebug version: ----------System.out:(47/2030)---------- Start ... o558 SubF === _ o515 o559 [[ o557 ]] --N: o558 SubF === _ o515 o559 [[ o557 ]] --N: o515 MoveI2F === _ o681 [[ o470 o484 o620 o620 o618 o579 o558 o530 o434 4 10 10 ]] FREGF 0 FREGF FREGF_F10 0 FREGF_F10 --N: o559 ConvHF2F === _ o560 [[ o558 ]] #float FREGF_F10 100 convHF2F_reg_reg --N: o560 RShiftI === _ o561 o209 [[ o559 o434 4 ]] IREGI 0 IREGI IREGINOSP 0 IREGINOSP IREGI_R10 0 IREGI_R10 IREGI_R11 0 IREGI_R11 IREGI_R12 0 IREGI_R12 IREGI_R13 0 IREGI_R13 IREGI_R14 0 IREGI_R14 IREGIORL2I 0 IREGI IREGIORL 0 IREGI IREGILNP 0 IREGI IREGILNPNOSP 0 IREGINOSP # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/hamlin/workspace/repos/github/jdk/src/hotspot/share/opto/matcher.cpp:1727), pid=2212108, tid=2212156 # assert(false) failed: bad AD file ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1404562196 From mli at openjdk.org Fri Nov 24 17:53:05 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 Nov 2023 17:53:05 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 17:42:22 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1691: >> >>> 1689: fmv_h_x(dst, src); >>> 1690: fcvt_s_h(dst, dst); >>> 1691: j(done); >> >> Since `Nan`s are exceptional inputs, it would be beneficial to move the handling below to an out-of-line stub. > > Thanks for the suggestion, it make sense. > > Just I'm not sure if there is a straight way to implement it currently. Following is what I've done, but blocked, please help to share your opinion. 
> > To jump to stub code to process NaN case, the stub code need to use specific register for `src` and `dst`, as `dst` is a FloatRegister, so need a way to specify a float register when matching `ConvHF2F` in ad file. I tried to add a `fRegF_F10` to enable specify a float register, but it does not work well. Some error occurs for fastdebug version: > > > ----------System.out:(47/2030)---------- > Start ... > o558 SubF === _ o515 o559 [[ o557 ]] > > --N: o558 SubF === _ o515 o559 [[ o557 ]] > > --N: o515 MoveI2F === _ o681 [[ o470 o484 o620 o620 o618 o579 o558 o530 o434 4 10 10 ]] > FREGF 0 FREGF > FREGF_F10 0 FREGF_F10 > > --N: o559 ConvHF2F === _ o560 [[ o558 ]] #float > FREGF_F10 100 convHF2F_reg_reg > > --N: o560 RShiftI === _ o561 o209 [[ o559 o434 4 ]] > IREGI 0 IREGI > IREGINOSP 0 IREGINOSP > IREGI_R10 0 IREGI_R10 > IREGI_R11 0 IREGI_R11 > IREGI_R12 0 IREGI_R12 > IREGI_R13 0 IREGI_R13 > IREGI_R14 0 IREGI_R14 > IREGIORL2I 0 IREGI > IREGIORL 0 IREGI > IREGILNP 0 IREGI > IREGILNPNOSP 0 IREGINOSP > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/hamlin/workspace/repos/github/jdk/src/hotspot/share/opto/matcher.cpp:1727), pid=2212108, tid=2212156 > # assert(false) failed: bad AD file The workaround I can use is to mv the result back to `dst` when comes back from stub code, like below: bind(nan_case); // input: x11 // output: x10 RuntimeAddress stub = RuntimeAddress(StubRoutines::riscv::float16_to_float_process_nan()); assert(stub.target() != nullptr, "float16_to_float_process_nan stub has not been generated"); address call = trampoline_call(stub); fmv_w_x(dst, x10); But, this seems to me a bit more complicated. 
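
As background for the fast-path versus NaN-path split being discussed, here is a plain software sketch of a binary16 to binary32 conversion where the NaN case is factored into its own function, the way an out-of-line stub would be. It is not the code from the patch: the function names are invented, and the NaN convention used (keep the sign and shift the 10 significand bits left by 13) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

inline float bits_to_float(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

inline uint32_t float_to_bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Rare path: NaN input. Assumed convention: preserve sign and payload bits.
float half_nan_to_float(uint16_t h) {
  const uint32_t sign    = (h & 0x8000u) << 16;
  const uint32_t payload = (h & 0x03FFu) << 13;
  return bits_to_float(sign | 0x7F800000u | payload);
}

// Common path: zero, subnormal, normal and infinity.
float half_to_float(uint16_t h) {
  const uint32_t sign = (h & 0x8000u) << 16;
  const uint32_t exp  = (h >> 10) & 0x1Fu;
  const uint32_t frac = h & 0x03FFu;

  if (exp == 0x1Fu) {
    if (frac != 0) return half_nan_to_float(h);  // exceptional input
    return bits_to_float(sign | 0x7F800000u);     // +/- infinity
  }
  if (exp == 0) {
    // Zero or subnormal: magnitude is frac * 2^-24.
    const float mag = std::ldexp(static_cast<float>(frac), -24);
    return (sign != 0) ? -mag : mag;
  }
  // Normal number: rebias the exponent (127 - 15 = 112), widen the fraction.
  return bits_to_float(sign | ((exp + 112u) << 23) | (frac << 13));
}
```

In this shape the NaN handling is a handful of bit operations, which matches the observation in the thread that the NaN code is about the same size as the call sequence needed to reach a separate stub.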
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1404565645 From mli at openjdk.org Fri Nov 24 17:56:04 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 Nov 2023 17:56:04 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 17:50:31 GMT, Hamlin Li wrote: >> Thanks for the suggestion, it make sense. >> >> Just I'm not sure if there is a straight way to implement it currently. Following is what I've done, but blocked, please help to share your opinion. >> >> To jump to stub code to process NaN case, the stub code need to use specific register for `src` and `dst`, as `dst` is a FloatRegister, so need a way to specify a float register when matching `ConvHF2F` in ad file. I tried to add a `fRegF_F10` to enable specify a float register, but it does not work well. Some error occurs for fastdebug version: >> >> >> ----------System.out:(47/2030)---------- >> Start ... >> o558 SubF === _ o515 o559 [[ o557 ]] >> >> --N: o558 SubF === _ o515 o559 [[ o557 ]] >> >> --N: o515 MoveI2F === _ o681 [[ o470 o484 o620 o620 o618 o579 o558 o530 o434 4 10 10 ]] >> FREGF 0 FREGF >> FREGF_F10 0 FREGF_F10 >> >> --N: o559 ConvHF2F === _ o560 [[ o558 ]] #float >> FREGF_F10 100 convHF2F_reg_reg >> >> --N: o560 RShiftI === _ o561 o209 [[ o559 o434 4 ]] >> IREGI 0 IREGI >> IREGINOSP 0 IREGINOSP >> IREGI_R10 0 IREGI_R10 >> IREGI_R11 0 IREGI_R11 >> IREGI_R12 0 IREGI_R12 >> IREGI_R13 0 IREGI_R13 >> IREGI_R14 0 IREGI_R14 >> IREGIORL2I 0 IREGI >> IREGIORL 0 IREGI >> IREGILNP 0 IREGI >> IREGILNPNOSP 0 IREGINOSP >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/hamlin/workspace/repos/github/jdk/src/hotspot/share/opto/matcher.cpp:1727), pid=2212108, tid=2212156 >> # assert(false) failed: bad AD file > > The workaround I can use is to mv the result back to `dst` when comes back from stub code, like below: > > bind(nan_case); > // input: x11 > // output: 
x10 > RuntimeAddress stub = RuntimeAddress(StubRoutines::riscv::float16_to_float_process_nan()); > assert(stub.target() != nullptr, "float16_to_float_process_nan stub has not been generated"); > address call = trampoline_call(stub); > fmv_w_x(dst, x10); > > > But, this seems to me a bit more complicated. And at the other side, the nan processing code is almost the same size as trampoline calling code, is it worth to do it in this case? Although I can image in another patch for floatToFloat16, it's worth to give it a try by using a out-of-line stub, as in that case, the nan processing a bit longer than float16ToFloat (i.e. this patch). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1404566769 From eosterlund at openjdk.org Fri Nov 24 18:30:17 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 24 Nov 2023 18:30:17 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v5] In-Reply-To: References: Message-ID: > The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other. > > By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like there, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. 
If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. > > Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well. Erik ?sterlund has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #3 from JornVernee/PR_async_close+NoToNativeTrans - don't transition to native state on Unsafe_CopySwapMemory0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16792/files - new: https://git.openjdk.org/jdk/pull/16792/files/0b91ac9a..83cef378 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16792&range=03-04 Stats: 35 lines in 1 file changed: 9 ins; 18 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16792.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16792/head:pull/16792 PR: https://git.openjdk.org/jdk/pull/16792 From mli at openjdk.org Fri Nov 24 18:44:09 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 24 Nov 2023 18:44:09 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Thu, 23 Nov 2023 08:19:04 GMT, Fei Yang wrote: >> Done in [this commit](https://github.com/openjdk/jdk/pull/16629/commits/af940acd365677ec3c29a8f066b68b753ad362e4). I've tried the usage of iRegP/iRegI but that caused of the related failure (JVM even didn't start). > > I guess it might be a performance consideration (maybe saving some register-register moves?). I see the x86_64 counterpart also specifies certain regsiters [1]. You might want give it try on x86_64 to find out how it may make a difference on the JIT code. 
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L11225

Hey, I saw you already changed the code in this patch to not use the specific registers; do you still face the JVM startup issue?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1404588307

From mli at openjdk.org  Fri Nov 24 18:59:05 2023
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 24 Nov 2023 18:59:05 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2]
In-Reply-To:
References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com>
Message-ID:

On Fri, 24 Nov 2023 18:41:03 GMT, Hamlin Li wrote:

>> I guess it might be a performance consideration (maybe saving some register-register moves?). I see the x86_64 counterpart also specifies certain registers [1]. You might want to give it a try on x86_64 to find out how it may make a difference on the JIT code.
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L11225
>
> Hey, I saw you already changed the code in this patch to not use the specific registers; do you still face the JVM startup issue?

I think the reason might be: with specific registers, you can add an effect such as `USE_KILL ary, USE_KILL cnt`, but without specific registers, you currently have no way to do so. But the current patch does modify ary and cnt in the intrinsic, so I wonder if the current (latest) patch is safe enough in all situations.
It may be helpful to add 2 new registers when matching the intrinsic in the ad file, and I guess the register allocator will merge the different uses of temp registers together?
But I still think it's not necessary to specify the register when matching arrays_hashcode in the ad file.
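
Since the thread also asks what this intrinsic is for, here is a small sketch of the computation as I understand it: the usual 31-based polynomial hash behind `Arrays.hashCode`, plus a 4-way unrolled form that folds four elements per step using precomputed powers of 31. The function names are invented for illustration, unsigned 32-bit arithmetic stands in for Java's wrapping `int`, and the unroll factor is an arbitrary choice, not necessarily what the patch uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference form: h = 31 * h + a[i] for each element, wrapping at 32 bits.
uint32_t hash_simple(const int32_t* a, size_t n, uint32_t h) {
  for (size_t i = 0; i < n; ++i) {
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}

// Same result, processing four elements per iteration:
// h' = 31^4*h + 31^3*a0 + 31^2*a1 + 31*a2 + a3.
uint32_t hash_unrolled4(const int32_t* a, size_t n, uint32_t h) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = 923521u * h
      + 29791u * static_cast<uint32_t>(a[i])
      + 961u   * static_cast<uint32_t>(a[i + 1])
      + 31u    * static_cast<uint32_t>(a[i + 2])
      +          static_cast<uint32_t>(a[i + 3]);
  }
  for (; i < n; ++i) {  // leftover elements
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return h;
}
```

With an initial value of 1, `hash_simple` over `{1, 2, 3}` gives 30817, which is what `Arrays.hashCode(new int[]{1, 2, 3})` returns in Java. Both loops advance through the array and count down the remaining length, which is the kind of in-place modification of `ary` and `cnt` that the register-effect question above is about.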
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1404593130

From mli at openjdk.org  Fri Nov 24 18:59:07 2023
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 24 Nov 2023 18:59:07 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2]
In-Reply-To:
References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com>
Message-ID:

On Fri, 17 Nov 2023 20:19:33 GMT, Yuri Gaevsky wrote:

>>> What specific tests were run for this intrinsic implementation to verify the correctness?
>> - jtreg tier1
>> - hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java with -Xcomp
>
>> BTW, can you add some comments about what java method or bytecode this intrinsic is for?
> Done.

Hmm, addition of TEMP_DEF result makes the benchmark results even worse than without intrinsic (I haven't looked at the generated assembler though). This seems a bit confusing to me.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1404594033 From jbhateja at openjdk.org Sat Nov 25 00:40:09 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 25 Nov 2023 00:40:09 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v5] In-Reply-To: <4-9392BGIYaep1e8XElzkLLTryL4-ULfjVTcak7a_4k=.92cd7529-5ba9-45bc-963b-00cc9e528760@github.com> References: <93M-a4Fckf8STLcvAP1cV4msQHqoQ4vUgWo02_YiJxo=.1c79764d-603e-497c-bab1-04ac2d30fa72@github.com> <4-9392BGIYaep1e8XElzkLLTryL4-ULfjVTcak7a_4k=.92cd7529-5ba9-45bc-963b-00cc9e528760@github.com> Message-ID: On Fri, 24 Nov 2023 17:23:28 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 3601: >> >>> 3599: if (compute_mask) { >>> 3600: vpxor(scratch, scratch, scratch, vector_len); >>> 3601: vpcmpgtq(scratch, scratch, mask, vector_len); >> >> I see assertion failures in following tests with JAVA_OPTIONS= -XX:UseAVX=1 -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -Xbatch >> >> compiler/c2/cr6340864/TestDoubleVect.java >> compiler/loopopts/superword/ReductionPerf.java >> compiler/vectorization/TestSignumVector.java >> compiler/vectorization/runner/BasicDoubleOpTest.java >> >> AVX1 does not support integral vectors above 16 bytes, please use floating point compare instruction. > > Hmm. Good catch! > > Thinking about AVX1 case some more.. Platforms where this `vpblendvp*` emulation is needed have AVX2 at least, otherwise vpblendvp is faster. I think its better to disable this optimization entirely if AVX1 is required to be used. > > I would go even further and disable `EnableX86ECoreOpts` if `UseAVX==1`. Preference? vpblendps/pd are supported for AVX1 targets, Since the patch is about emulating floating point variable blends using alternate sequence I think we should remove any impediment which prohibit its usage over E-core at AVX1 level. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1404694438 From dholmes at openjdk.org Sat Nov 25 02:23:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 25 Nov 2023 02:23:13 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 05:47:14 GMT, David Holmes wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Reinstate mistakenly deleted comment > > This has gotten a lot more complicated. All I was suggesting was if this: > > if (with_method_holders) { > method->clear_jmethod_id(); > } > > could be changed to > > if (method->method_holder() == nullptr) { > method->clear_jmethod_id(); > } > > Now I'm not at all sure what you are doing. > @dholmes-ora Unfortunately, I can not just do `method->method_holder() == nullptr` as `method_holder()` is expanding to `Method::constants()->pool_holder()` and `Method::constants()` is expanding to `Method::constMethod()->constants()`. This can cause SIGSEGV if either `Method::_constMethod` or `ConstMethod::_constants` is NULL. I'm getting a strange sense of deja-vu here. This API is flawed if you cannot even ask for the method holder without some intervening value causing a SEGV. I've lost sight of the big picture here in terms of the lifecycle of the Method we are querying, the methodID we may be clearing and the existence or not of a method_holder(). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1826190641 From qamai at openjdk.org Sat Nov 25 02:25:04 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 25 Nov 2023 02:25:04 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 17:53:11 GMT, Hamlin Li wrote: >> The workaround I can use is to mv the result back to `dst` when comes back from stub code, like below: >> >> bind(nan_case); >> // input: x11 >> // output: x10 >> RuntimeAddress stub = RuntimeAddress(StubRoutines::riscv::float16_to_float_process_nan()); >> assert(stub.target() != nullptr, "float16_to_float_process_nan stub has not been generated"); >> address call = trampoline_call(stub); >> fmv_w_x(dst, x10); >> >> >> But, this seems to me a bit more complicated. > > And at the other side, the nan processing code is almost the same size as trampoline calling code, is it worth to do it in this case? > > Although I can image in another patch for floatToFloat16, it's worth to give it a try by using a out-of-line stub, as in that case, the nan processing a bit longer than float16ToFloat (i.e. this patch). You can take a look at x86 implementation of ConvF2I node which takes advantages of a general stub mechanism in C2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1404704835 From qamai at openjdk.org Sat Nov 25 02:25:05 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 25 Nov 2023 02:25:05 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F In-Reply-To: References: Message-ID: On Sat, 25 Nov 2023 02:20:12 GMT, Quan Anh Mai wrote: >> And at the other side, the nan processing code is almost the same size as trampoline calling code, is it worth to do it in this case? >> >> Although I can image in another patch for floatToFloat16, it's worth to give it a try by using a out-of-line stub, as in that case, the nan processing a bit longer than float16ToFloat (i.e. this patch). 
> > You can take a look at x86 implementation of ConvF2I node which takes advantages of a general stub mechanism in C2. https://github.com/openjdk/jdk/blob/6aa197667ad05bd93adf3afc7b06adbfb2b18a22/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L4307 Note that the stub will still reside in the code section of the current method, is a trampoline needed in that case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1404704954 From aph at openjdk.org Sun Nov 26 15:00:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 26 Nov 2023 15:00:11 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v10] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 14:17:36 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
>> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > switch off AIX tests since AIX does not support CDS AArch64 looks good. One question, though: did you ever find the root cause of JDK-8318119? Without that, how do we know we've fixed it? ------------- PR Review: https://git.openjdk.org/jdk/pull/16743#pullrequestreview-1749358976 From duke at openjdk.org Sun Nov 26 18:20:22 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Sun, 26 Nov 2023 18:20:22 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v6] In-Reply-To: References: Message-ID: > Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain > > > =============== BEFORE =============== > Benchmark (SIZE) Mode Cnt Score Error Units > VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op > VectorSignum.floatSignum 512 avgt 3 154.889 ? 
0.242 ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt    Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3   61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3  117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3  231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3  458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3  121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3  241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3  482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ±
0.012 us/op Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: partly disable for UseAVX=1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16716/files - new: https://git.openjdk.org/jdk/pull/16716/files/1739bda8..023fdaf6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=04-05 Stats: 12 lines in 2 files changed: 5 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716 PR: https://git.openjdk.org/jdk/pull/16716 From duke at openjdk.org Sun Nov 26 18:20:23 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Sun, 26 Nov 2023 18:20:23 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v5] In-Reply-To: References: <93M-a4Fckf8STLcvAP1cV4msQHqoQ4vUgWo02_YiJxo=.1c79764d-603e-497c-bab1-04ac2d30fa72@github.com> <4-9392BGIYaep1e8XElzkLLTryL4-ULfjVTcak7a_4k=.92cd7529-5ba9-45bc-963b-00cc9e528760@github.com> Message-ID: On Sat, 25 Nov 2023 00:36:59 GMT, Jatin Bhateja wrote: >> Hmm. Good catch! >> >> Thinking about AVX1 case some more.. Platforms where this `vpblendvp*` emulation is needed have AVX2 at least, otherwise vpblendvp is faster. I think its better to disable this optimization entirely if AVX1 is required to be used. >> >> I would go even further and disable `EnableX86ECoreOpts` if `UseAVX==1`. Preference? > > vpblendps/pd are supported for AVX1 targets, Since the patch is about emulating floating point variable blends using alternate sequence I think we should remove any impediment which prohibit its usage over E-core at AVX1 level. Decided to go with the smaller solution and disable just for the cases where the code generated is illegal. I still think the bigger solution of forcing `EnableX86ECoreOpts` to be off if `UseAVX==1`] is cleaner, but I suppose we can come back to that. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1405454571

From duke at openjdk.org  Sun Nov 26 18:25:24 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Sun, 26 Nov 2023 18:25:24 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7]
In-Reply-To: 
References: 
Message-ID: 

> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain.
> 
> 
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                          Mode  Cnt     Score   Error  Units
> MaxMinOptimizeTest.dAdd            avgt    3    77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax            avgt    3   137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin            avgt    3   137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul            avgt    3    77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd            avgt    3    77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax            avgt    3   107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin            avgt    3   107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul            avgt    3    77.164 ± 0.012  us/op
> 
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                          Mode  Cnt     Score   Error  Units
> MaxMinOptimizeTest.dAdd            avgt    3    77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax            avgt    3   125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin            avgt    3   124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul            avgt    3    77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd            avgt    3    77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax            avgt    3    97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin            avgt    3    98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul            avgt    3    77.174 ± 0.012  us/op

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  whitespace again

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16716/files
  - new: https://git.openjdk.org/jdk/pull/16716/files/023fdaf6..5ed257de

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=05-06

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/16716.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716

PR: https://git.openjdk.org/jdk/pull/16716

From dholmes at openjdk.org  Mon Nov 27 00:49:05 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 27 Nov 2023 00:49:05 GMT
Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5]
In-Reply-To: <29uxp2DVbW7p6RLM3yZUXnhS8Bc47Il0CVLJAKY21jA=.8f7b43cc-28ab-454d-9d69-ddaf75296943@github.com>
References: <-YLzEt2tPsupH0CLU6278f4yX3si2I60OvHfDitr-tM=.e5d3dad2-b846-4da5-812f-ea2600cd2780@github.com> <29uxp2DVbW7p6RLM3yZUXnhS8Bc47Il0CVLJAKY21jA=.8f7b43cc-28ab-454d-9d69-ddaf75296943@github.com>
Message-ID: 

On Fri, 24 Nov 2023 12:50:13 GMT, David Holmes wrote:

>> Will you create the RFE? I'm not as convinced that this is something that needs to be fixed, so it would be better if you create the RFE with the proper motivation.
> > Yes I will create it. Thanks Filed: [JDK-8320720](https://bugs.openjdk.org/browse/JDK-8320720) JNI Locked Monitors can be associated with a dead (null) object ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405524057 From xgong at openjdk.org Mon Nov 27 00:59:10 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Nov 2023 00:59:10 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> Message-ID: On Thu, 23 Nov 2023 14:10:02 GMT, Magnus Ihse Bursie wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments in build system > > make/autoconf/lib-vmath.m4 line 89: > >> 87: if test "x${LIBSLEEF_FOUND}" = "xyes"; then >> 88: ENABLE_LIBSLEEF=true >> 89: LIBVMATH_LIBS="${LIBVMATH_LIBS} -lsleef" > > Remove this line. It would just add `-lsleef` twice if you go via `PKG_CHECK_MODULES`. You need to set -lsleef at the correct places. Correct. Thanks! I will adjust all the relative names/cflags once the sve cflags is removed in the m4 file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1405526805 From xgong at openjdk.org Mon Nov 27 00:59:12 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 27 Nov 2023 00:59:12 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: <-XS17AVgOkuO6_JUId8P-XZxRlnfWXF0wz60w5B58L8=.e51cb13b-ec84-4943-a6b7-b09b4e8943d4@github.com> References: <_CHm262chkVi3EMvai4A5T-dal0pdCySL8aF0kXj_uU=.9d49baad-9de9-45e0-915b-9525feb8d610@github.com> <-XS17AVgOkuO6_JUId8P-XZxRlnfWXF0wz60w5B58L8=.e51cb13b-ec84-4943-a6b7-b09b4e8943d4@github.com> Message-ID: On Thu, 23 Nov 2023 14:01:48 GMT, Magnus Ihse Bursie wrote: >> OK, I see. 
It makes sense that the suffix name should be chosen mainly based on the real module name that is searched/checked in configure.
> 
> This still needs fixing.

Yes, I will fix this together with removing the SVE cflags, which I need more time to handle in a better way. Thanks!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1405526236

From xgong at openjdk.org  Mon Nov 27 01:09:11 2023
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 27 Nov 2023 01:09:11 GMT
Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4]
In-Reply-To: 
References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com>
Message-ID: <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com>

On Thu, 23 Nov 2023 15:43:34 GMT, Andrew Haley wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Address review comments in build system
> 
> make/autoconf/lib-vmath.m4 line 94:
> 
>> 92:   # Check the ARM SVE feature
>> 93:   SVE_CFLAGS="-march=armv8-a+sve"
>> 94: 
> 
> What's this about? We're building a standard binary, and all SVE detection is to be deferred to runtime.

We have to use this C compiler option to build the SVE ABIs (e.g. `svfloat32_t sinfx_u10sve(svfloat32_t input)`) into `libvmath.so`. Without this option, all the SVE-related features such as `arm_sve.h` / `__ARM_FEATURE_SVE` are missing at build time, and so are the SVE symbols in `libvmath.so` that we need at runtime. That means that at runtime HotSpot cannot find the SVE symbols, and the vector operations fall back to the default Java implementation.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1405529521 From dholmes at openjdk.org Mon Nov 27 02:25:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 Nov 2023 02:25:18 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 09:18:46 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. 
>> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Split test and use othervm Sorry I still have quite a few issues with the tests. The new test in particular seems quite difficult to follow. IIUC you basically want to test the "dead object" thread dump scenario in two cases: before the native thread has detached and unlocked the object; after the native thread has detached and unlocked the object. To do the former you need to dump from the native thread; for the latter it could be done in the native thread after the detach, or the main thread after the join. test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 26: > 24: > 25: /* > 26: * @summary This test checks that ObjectMonitors with dead objects don't Please add `@bug` line test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 75: > 73: private static void testDetachThread() { > 74: // Create an ObjectMonitor with a dead object from an > 75: // attached thread. Unclear what the "Detach" in the method name has to do with anything. ?? test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 86: > 84: } > 85: > 86: private static void testDumpThreadsAfterDetachBeforeJoin() { The `AfterDetach` in the name is not accurate. If you don't join the new native thread then you are racing with its execution and you don't know when it will detach. 
test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 95: > 93: dumpThreadsWithLockedMonitors(); > 94: > 95: joinTestThread(); I'm not seeing the relevance of the join/no-join aspect of these tests. If you join the target thread then you know it has created the "bad" monitor, detached (and so unlocked) and exited; otherwise you are racing with it. test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 100: > 98: private static void testDumpThreadsAfterDetachAfterJoin() { > 99: createMonitorWithDeadObjectNoJoin(); > 100: joinTestThread(); How is this different to just calling `createMonitorWithDeadObject` ? test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 47: > 45: > 46: #define check(env, what, msg) \ > 47: check_exception((env), (msg)); \ I'm not understanding why you have `check` and `check_exception` here nor why you choose to use one versus the other. ?? test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 130: > 128: // test provokes that situation and that asserts. > 129: if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread"); > 130: pthread_exit(NULL); You don't need to call `pthread_exit` - the thread's entry function can simply return. test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 161: > 159: > 160: if (pthread_attr_init(&attr) != 0) die("pthread_attr_init"); > 161: if (pthread_create(&attacher, &attr, create_monitor_with_dead_object_in_thread, NULL) != 0) die("pthread_create"); You are not actually using the attr object to change anything. On AIX you may need to explicitly set the stack size. test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 1: > 1: /* Please update the `@bug` line and update the summary. 
test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 53: > 51: private static native boolean hasEventPosted(); > 52: > 53: private static void jniMonitorEnterAndLetObjectDie() { I can see it is convenient to just inject this test case in an existing test, but I'm not sure it is necessarily the right thing to do. Serviceability folk may have a stronger opinion. test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 58: > 56: // Inject this situation into this test that performs other > 57: // GetOwnedMonitorInfo testing. > 58: Object obj = new Object() { public String toString() {return "";} }; Nit: the `toString` definition is not needed. This could just be `new Object();`, or `new Object() {};` if you want to introduce a nested class. test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 78: > 76: final GetOwnedMonitorInfoTest lock = new GetOwnedMonitorInfoTest(); > 77: > 78: Thread t1 = threadFactory.newThread(() -> { Pre-existing nit: by default virtual threads have no name, so the output in the virtual thread case looks a little odd. Can you add: Thread.currentThread().setName("Worker-Thread"); please. test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/libGetOwnedMonitorInfoTest.c line 270: > 268: Java_GetOwnedMonitorInfoTest_jniMonitorEnter(JNIEnv* env, jclass cls, jobject obj) { > 269: if ((*env)->MonitorEnter(env, obj) != 0) { > 270: fprintf(stderr, "MonitorEnter failed"); Should this be a fatal error? ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1749493623 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405540547 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405541136 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405547364 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405548116 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405548359 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405543595 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405551073 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405544843 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405548997 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405550273 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405549538 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405552370 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405551461 From dholmes at openjdk.org Mon Nov 27 02:25:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 Nov 2023 02:25:19 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 01:45:09 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 75: > >> 73: private static void testDetachThread() { >> 74: // Create an ObjectMonitor with a dead object from an >> 75: // attached thread. > > Unclear what the "Detach" in the method name has to do with anything. ?? And why add these wrapper methods that simply call one other method? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405546829 From iklam at openjdk.org Mon Nov 27 04:51:05 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 27 Nov 2023 04:51:05 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Thu, 23 Nov 2023 05:46:27 GMT, David Holmes wrote: > Change seems fine but what was the effect of not restoring the flag? Does this cause failures or just unnecessary re-resolution, or? > > Thanks The code works somehow, but in an unsafe manner. We are reading from `resolved_references_index()` even when the `has_resolved_ref_shift` bit has been (improperly) cleared. Adding the following assert makes it impossible to start jtreg: diff --git a/src/hotspot/share/oops/cpCache.cpp b/src/hotspot/share/oops/cpCache.cpp index daa094baa7e..760c5268c88 100644 --- a/src/hotspot/share/oops/cpCache.cpp +++ b/src/hotspot/share/oops/cpCache.cpp @@ -310,6 +310,9 @@ ResolvedMethodEntry* ConstantPoolCache::set_method_handle(int method_index, cons // Store appendix, if any. if (has_appendix) { + assert(method_entry->has_resolved_ref_index(), "huh"); const int appendix_index = method_entry->resolved_references_index(); objArrayOop resolved_references = constant_pool()->resolved_references(); I think we should take this as a chance to tighten up the code in resolvedMethodEntry.hpp: - `resolved_references_index()` should assert that `has_resolved_ref_index()`. - The following three functions should assert for mutual exclusivity. I.e., you can't call set_klass() and then call set_resolved_references_index(). Probably the easiest is to add two more bits: `_has_klass_shift` and `_has_table_index_shift`. At entry of these three setters, we should assert that all three klass/table_index/resolved_references_index bits are cleared. 

    void set_klass(InstanceKlass* klass) {
      _entry_specific._interface_klass = klass;
    }
    void set_resolved_references_index(u2 ref_index) {
      set_flags(1 << has_resolved_ref_shift);
      _entry_specific._resolved_references_index = ref_index;
    }
    void set_table_index(u2 table_index) {
      _entry_specific._table_index = table_index;
    }

Also, `has_resolved_ref_index` should be renamed to `has_resolved_reference_index`; otherwise it's difficult to search for all code related to `resolved_reference_index`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16769#issuecomment-1827119981

From dholmes at openjdk.org  Mon Nov 27 05:39:16 2023
From: dholmes at openjdk.org (David Holmes)
Date: Mon, 27 Nov 2023 05:39:16 GMT
Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote:

>> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing.
>> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) .
>> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic.
> 
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   use new method also in print_vm_info

This one slipped past me and I don't like it. This is an AIX-specific change masquerading as a general operation, but it is not needed on any platform but AIX.

> Otherwise, I'd be in favor of finding a reasonable OS abstraction for this.

@stuefe this is the opposite of what you suggested for the AIX-specific changes for static library loading. It was proposed there to implement an os abstraction and you rightly said no because it was an AIX-only issue. I don't see that this is any different.

And I have to ask why are we suddenly seeing so many of these issues with AIX?
Has it recently undergone some major changes that now make it incompatible with a lot of our shared code? Or have "we" just not cared about these issues in the past?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1827152960

From jbhateja at openjdk.org  Mon Nov 27 05:51:12 2023
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 27 Nov 2023 05:51:12 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7]
In-Reply-To: 
References: 
Message-ID: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com>

On Sun, 26 Nov 2023 18:25:24 GMT, Volodymyr Paprotski wrote:

>> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain.
>> 
>> 
>> =============== BEFORE ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
>> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
>> Benchmark                          Mode  Cnt     Score   Error  Units
>> MaxMinOptimizeTest.dAdd            avgt    3    77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax            avgt    3   137.334 ± 0.128  us/op
>> MaxMinOptimizeTest.dMin            avgt    3   137.160 ± 0.465  us/op
>> MaxMinOptimizeTest.dMul            avgt    3    77.231 ± 0.051  us/op
>> MaxMinOptimizeTest.fAdd            avgt    3    77.165 ± 0.003  us/op
>> MaxMinOptimizeTest.fMax            avgt    3   107.428 ± 1.501  us/op
>> MaxMinOptimizeTest.fMin            avgt    3   107.186 ± 0.022  us/op
>> MaxMinOptimizeTest.fMul            avgt    3    77.164 ± 0.012  us/op
>> 
>> =============== AFTER ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
>> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
>> Benchmark                          Mode  Cnt     Score   Error  Units
>> MaxMinOptimizeTest.dAdd            avgt    3    77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax            avgt    3   125.701 ± 0.082  us/op
>> MaxMinOptimizeTest.dMin            avgt    3   124.704 ± 0.119  us/op
>> MaxMinOptimizeTest.dMul            avgt    3    77.232 ± 0.028  us/op
>> MaxMinOptimizeTest.fAdd            avgt    3    77.169 ± 0.103  us/op
>> MaxMinOptimizeTest.fMax            avgt    3    97.939 ± 0.477  us/op
>> MaxMinOptimizeTest.fMin            avgt    3    98.012 ± 0.154  us/op
>> MaxMinO...
> 
> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:
> 
>   whitespace again

There are a few other usages of variable blends in the following methods; please call the new macro assembler routines in their place.

    C2_MacroAssembler::vector_cast_double_to_int_special_cases_avx
    C2_MacroAssembler::vector_count_leading_zeros_int_avx
    C2_MacroAssembler::vector_cast_float_to_int_special_cases_avx

src/hotspot/cpu/x86/x86.ad line 7821:

> 7819: 
> 7820: instruct vblendvpFD(legVec dst, legVec src1, legVec src2, legVec mask) %{
> 7821:   predicate(UseAVX > 0 && !EnableX86ECoreOpts &&

Why not call the newly added macro assembler routine to emulate vblendvps from the instruction encoding? It already contains the EnableX86ECoreOpts checks.
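
For readers following the thread, the idea of replacing a variable blend with boolean operations can be sketched per lane in scalar C++. This is only an illustration of the general technique (broadcast the mask's sign bit, then and / and-not / or), not the instruction sequence the patch emits; `blend_lane`, `bits_of` and `float_of` are names made up for this sketch.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Copy a float's bit pattern into an integer (and back) without aliasing UB.
static inline std::uint32_t bits_of(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
static inline float float_of(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// One lane of a variable blend: the result is src2 when the sign bit of
// mask is set, and src1 otherwise.
float blend_lane(float src1, float src2, float mask) {
    // Broadcast the sign bit across the lane: 0xFFFFFFFF if set, else 0.
    // A vector version would use an arithmetic right shift by 31 here.
    std::uint32_t m = 0u - (bits_of(mask) >> 31);
    // (src1 AND NOT m) OR (src2 AND m): three boolean operations in place
    // of the single blend instruction.
    return float_of((bits_of(src1) & ~m) | (bits_of(src2) & m));
}
```

Only the sign bit of the mask matters, so `-0.0f` selects `src2` just as any negative value does.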
------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1827160407 PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1405642111 From stuefe at openjdk.org Mon Nov 27 06:28:15 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Nov 2023 06:28:15 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . >> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use new method also in print_vm_info > This one slipped past me and I don't like it. This is an AIX specific change masquerading as a general operation but it is not needed on any platform but AIX. > > > Otherwise, I'd be in favor of finding a reasonable OS abstraction for this. > > @Stuefe this is the opposite to what you suggested for the AIX specific changes for static library loading. It was proposed there to implement an os abstraction and you rightly said no because it was an AIX only issue. I don't see this is any different. > @dholmes-ora The reason I proposed this is that on Windows, we have code paths that are executed on demand on symbol decoding. Its exactly the same thing as on AIX: we need to refresh the loaded pdb list. This would fit well into this abstraction. The ElfDecoder opens the dwarf file for sourceinfo on the first query. Again, would fit here. 
Every decoding mechanism that needs setup could use this dedicated setup routine instead of doing it on demand when the first query happens. > And I have to ask why are we suddenly seeing so many of these issues with AIX? Has it recently undergone some major changes that now make it incompatible with a lot of our shared code? Or have "we" just not cared about these issues in the past. Both IBM and SAP stepped up their AIX efforts because the AIX port is still needed. More eyes find more issues. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1827192938 From stuefe at openjdk.org Mon Nov 27 06:42:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Nov 2023 06:42:08 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v10] In-Reply-To: References: Message-ID: On Sun, 26 Nov 2023 14:57:22 GMT, Andrew Haley wrote: > AArch64 looks good. One question, though: did you ever find the root cause of JDK-8318119? Without that, how do we know we've fixed it? Didn't find it, no. I'll ping the originator again before pushing. The over-allocate-for-alignment allocation is just a stab, albeit an informed one. I had looked into kernel sources for OrangePi and examined the delta to the stock kernel. I did not find anything of note; the largest delta was added hardware descriptions for those SBCs. I wanted to find if the kernel limited mmap to specific addresses somehow, or if the address space was very small. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1827207959 From dholmes at openjdk.org Mon Nov 27 07:36:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 27 Nov 2023 07:36:18 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 06:25:17 GMT, Thomas Stuefe wrote: > The reason I proposed this is that on Windows, we have code paths that are executed on demand on symbol decoding. 
Its exactly the same thing as on AIX: we need to refresh the loaded pdb list. This would fit well into this abstraction. But the Windows implementation of this abstraction is also empty! Is there some follow up to actually put this new abstraction into actual use? My recollection is that the Windows refresh worked fine in the Windows code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1827270341 From duke at openjdk.org Mon Nov 27 08:59:14 2023 From: duke at openjdk.org (Liming Liu) Date: Mon, 27 Nov 2023 08:59:14 GMT Subject: RFR: 8315923: pretouch_memory by atomic-add-0 fragments huge pages unexpectedly [v12] In-Reply-To: References: Message-ID: On Mon, 30 Oct 2023 06:20:07 GMT, Liming Liu wrote: >> As described at [JDK-8315923](https://bugs.openjdk.org/browse/JDK-8315923), this patch uses madvise with MADV_POPULATE_WRITE to pretouch memory when supported (since kernel 5.14). >> >> Ran the newly added jtreg test on 64c Neoverse-N1 machines with kernel 4.18, 5.13 and 6.1, and observed that transparent huge pages formed right after pretouch on kernel 6.1. Recorded the time spent on the test in *seconds* with `VERBOSE=time` as the table below, and got that the patch takes improvements when the system call is supported, while does not hurt if not supported: >> >> >> >> >> >> >> >> >> >> >> >>
>> Kernel   -XX:-TransparentHugePages   -XX:+TransparentHugePages
>>          Unpatched    Patched        Unpatched    Patched
>> 4.18     11.30        11.30          0.25         0.25
>> 5.13     0.22         0.22           3.42         3.42
>> 6.1      0.27         0.33           3.54         0.33
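
The approach described in the quoted PR text (ask the kernel to populate the range, and keep the per-page touch for kernels that lack the call) can be sketched as follows. This is a minimal Linux-only sketch and not the HotSpot patch: the names `pretouch` and `pretouch_by_atomic_add` are invented here, the `MADV_POPULATE_WRITE` value is supplied manually in case the build headers predate it, and `__atomic_fetch_add` assumes GCC or Clang.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23  // Linux uapi value; the advice exists since kernel 5.14
#endif

// Fallback: add zero atomically to one word per page. This faults the
// page in for writing without changing its contents.
static void pretouch_by_atomic_add(char* start, std::size_t bytes, std::size_t page) {
    for (std::size_t off = 0; off < bytes; off += page) {
        __atomic_fetch_add(reinterpret_cast<int*>(start + off), 0, __ATOMIC_RELAXED);
    }
}

// Returns true if the kernel populated the range via madvise, and false
// if the per-page fallback was used instead. `start` must be page-aligned.
bool pretouch(void* start, std::size_t bytes) {
    if (madvise(start, bytes, MADV_POPULATE_WRITE) == 0) {
        return true;
    }
    // A failure here usually means the running kernel does not know the
    // advice value, so touch the pages one by one.
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    pretouch_by_atomic_add(static_cast<char*>(start), bytes, page);
    return false;
}
```

Either path leaves the memory contents unchanged; the difference reported in the table is in how the kernel backs the range.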
> > Liming Liu has updated the pull request incrementally with one additional commit since the last revision: > > Update the name of the method Ping! ------------- PR Comment: https://git.openjdk.org/jdk/pull/15781#issuecomment-1827396001 From adinn at openjdk.org Mon Nov 27 09:06:08 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 27 Nov 2023 09:06:08 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Mon, 27 Nov 2023 04:48:19 GMT, Ioi Lam wrote: > . . . Probably the easiest is to add two more bits: _has_klass_shift and _has_table_index_shift. . . . Maybe so, but only in debug builds? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16769#issuecomment-1827405538 From jbachorik at openjdk.org Mon Nov 27 09:16:30 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Mon, 27 Nov 2023 09:16:30 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. 
> However, sooner or later the 'previous' class version becomes eligible for cleanup, at which time all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory.
> This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance, and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded.
> 
> This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version: the per-`InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions.
> 
> ~Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.~
> 
> Therefore, we need to perform `jmethodID` lookup for each method in an old class version that is getting purged, and null out the pointer of that `jmethodID` to break the link from `jmethodID` to the method instance that is about to get deallocated.
> > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/554b3ae0..81e31dae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=08-09 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From jbachorik at openjdk.org Mon Nov 27 09:16:31 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Mon, 27 Nov 2023 09:16:31 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v8] In-Reply-To: References: Message-ID: On Sat, 25 Nov 2023 02:20:18 GMT, David Holmes wrote: >> This has gotten a lot more complicated. All I was suggesting was if this: >> >> if (with_method_holders) { >> method->clear_jmethod_id(); >> } >> >> could be changed to >> >> if (method->method_holder() == nullptr) { >> method->clear_jmethod_id(); >> } >> >> Now I'm not at all sure what you are doing. > >> @dholmes-ora Unfortunately, I can not just do `method->method_holder() == nullptr` as `method_holder()` is expanding to `Method::constants()->pool_holder()` and `Method::constants()` is expanding to `Method::constMethod()->constants()`. This can cause SIGSEGV if either `Method::_constMethod` or `ConstMethod::_constants` is NULL. > > I'm getting a strange sense of deja-vu here. This API is flawed if you cannot even ask for the method holder without some intervening value causing a SEGV. 
I've lost sight of the big picture here in terms of the lifecycle of the Method we are querying, the methodID we may be clearing and the existence or not of a method_holder(). @dholmes-ora I removed the assert. It is not necessary any more as the call to `Method::clear_jmethod_id()` was moved from the more generic `Method::deallocate_contents()` to `InstanceKlass::clear_jmethod_ids()` which is called if and only if the previous class versions are being purged. Because the issue with the method holder is related to `ClassParser` and not fully initialized classes only, the assert can safely be removed. The `Method::clear_jmethod_id()` will never be called in a context when the link to its method holder is broken. > I've lost sight of the big picture here in terms of the lifecycle of the Method we are querying, the methodID we may be clearing and the existence or not of a method_holder(). I have updated the PR description to correspond to the actual state of affairs - the change is that instead of doing the `jmethodID` pointer maintenance for each `Method::deallocate_contents()` call it will be done only for methods contained by the old class versions that are getting purged. 
This has two benefits compared to the original proposal: - we add the overhead of `jmethodID` lookup only to the problematic case of purging old class versions - method holder is always valid when purging old class versions so we don't need to have checks/asserts for that ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1827409695 PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1827430093 From jbachorik at openjdk.org Mon Nov 27 09:16:33 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Mon, 27 Nov 2023 09:16:33 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: Message-ID: On Mon, 20 Nov 2023 22:09:47 GMT, Coleen Phillimore wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary assert > > Good analysis for a very subtle bug. I have a couple of comments, and maybe the test can be simplified but approving the change. @coleenp I am sorry for bothering you again - the implementation changed slightly based on @tstuefe's comments so I would like to get your approval before proceeding with this version. Thanks!
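For readers following the `jmethodID` discussion above, the "null out the link when purging an old class version" idea can be sketched roughly as follows. This is only a toy model under stated assumptions: `Method`, `ClassVersion`, `JMethodIDCache`, `purge_old_version` and `resolve` are hypothetical stand-ins invented for illustration, not the HotSpot types or the code in the PR.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these are NOT the HotSpot types.
struct Method { int version; };

// A jmethodID is modelled as a pointer to a slot that holds a Method*.
using JMethodID = Method**;

struct ClassVersion {
    std::vector<std::unique_ptr<Method>> methods;
};

// Cache of slots that outlives individual class versions (assumption:
// it models storage tied to the class loader, not to a class version).
struct JMethodIDCache {
    std::vector<std::unique_ptr<Method*>> slots;

    JMethodID make_id(Method* m) {
        slots.push_back(std::make_unique<Method*>(m));
        return slots.back().get();
    }

    // Look up every slot that points at m and null it out.
    void clear_id_for(const Method* m) {
        for (auto& slot : slots) {
            if (*slot == m) { *slot = nullptr; }
        }
    }
};

// Purge an old class version: break each jmethodID -> Method link first,
// then release the methods themselves.
void purge_old_version(JMethodIDCache& cache, ClassVersion& old_version) {
    for (auto& m : old_version.methods) {
        cache.clear_id_for(m.get());
    }
    old_version.methods.clear();
}

// Resolving an id yields nullptr once its method has been purged.
Method* resolve(JMethodID id) { return *id; }
```

In this model the lookup cost is paid only inside `purge_old_version`, which mirrors the first benefit listed above; ids that belong to the current class version are left untouched.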
------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1827432032 From stefank at openjdk.org Mon Nov 27 09:20:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:20:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 01:43:07 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 26: > >> 24: >> 25: /* >> 26: * @summary This test checks that ObjectMonitors with dead objects don't > > Please add `@bug` line Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405858379 From stefank at openjdk.org Mon Nov 27 09:30:12 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:30:12 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: <32f_pRuVEPp9K4uP018OxlePZw6bNXQcNVrcWXUadmQ=.c74d725e-5604-4726-bc5e-adbd67f781fb@github.com> On Mon, 27 Nov 2023 01:52:29 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 47: > >> 45: >> 46: #define check(env, what, msg) \ >> 47: check_exception((env), (msg)); \ > > I'm not understanding why you have `check` and `check_exception` here nor why you choose to use one versus the other. ?? Some JNI calls return something, for those I can use `check` which combines a null-check and an exception check. Some tests don't return anything, they can't null-check and can only perform an exception check. 
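The distinction between the two helpers described above might look something like the sketch below. It is an illustration under assumptions, not the test's code: `FakeEnv` is a hypothetical stand-in for a JNI environment, and the helpers throw C++ exceptions where the actual test uses macros and a `die` function.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a JNI environment; NOT the real JNIEnv.
struct FakeEnv {
    bool exception_pending = false;
};

// Exception check only: usable after calls that return nothing.
void check_exception(const FakeEnv& env, const std::string& msg) {
    if (env.exception_pending) {
        throw std::runtime_error("exception pending: " + msg);
    }
}

// Exception check plus null check: usable after calls that return a value.
template <typename T>
T* check(const FakeEnv& env, T* what, const std::string& msg) {
    check_exception(env, msg);
    if (what == nullptr) {
        throw std::runtime_error("null result: " + msg);
    }
    return what;
}
```

A call that returns a pointer would be wrapped as `check(env, result, "FindClass")`, while a call with no return value can only be followed by `check_exception(env, "SomeVoidCall")`; both message strings here are made up for the example.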
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405874373 From stefank at openjdk.org Mon Nov 27 09:30:08 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:30:08 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 02:02:08 GMT, David Holmes wrote: >> test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 75: >> >>> 73: private static void testDetachThread() { >>> 74: // Create an ObjectMonitor with a dead object from an >>> 75: // attached thread. >> >> Unclear what the "Detach" in the method name has to do with anything. > > And why add these wrapper methods that simply call one other method? > Unclear what the "Detach" in the method name has to do with anything. This test case provokes the assert we hit when the monitor is visited inside DetachCurrentThread. I updated the comment to state that. > And why add these wrapper methods that simply call one other method? Because I find this structure more cohesive and better structured. I have four functions representing the four tests. The fact that two of them in turn only call one function is an implementation detail. I don't want to push the call to `createMonitorWithDeadObject` down into the main function, because then I also have to move the comment there, and suddenly the main function becomes more than just a super simple dispatch function.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405871913 From stefank at openjdk.org Mon Nov 27 09:38:09 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:38:09 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 01:56:01 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 161: > >> 159: >> 160: if (pthread_attr_init(&attr) != 0) die("pthread_attr_init"); >> 161: if (pthread_create(&attacher, &attr, create_monitor_with_dead_object_in_thread, NULL) != 0) die("pthread_create"); > > You are not actually using the attr object to change anything. On AIX you may need to explicitly set the stack size. OK. I copied this from CompleteExit.c. Should that be changed as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405883743 From stefank at openjdk.org Mon Nov 27 09:41:09 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:41:09 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 02:03:30 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 86: > >> 84: } >> 85: >> 86: private static void testDumpThreadsAfterDetachBeforeJoin() { > > The `AfterDetach` in the name is not accurate. If you don't join the new native thread then you are racing with its execution and you don't know when it will detach. Thanks.
I'll remove the "before join" test case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405887536 From stefank at openjdk.org Mon Nov 27 09:44:09 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:44:09 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 02:09:56 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 58: > >> 56: // Inject this situation into this test that performs other >> 57: // GetOwnedMonitorInfo testing. >> 58: Object obj = new Object() { public String toString() {return "";} }; > > Nit: the `toString` definition is not needed. This could just be `new Object();`, or `new Object() {};` if you want to introduce a nested class. Thanks. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405891539 From stefank at openjdk.org Mon Nov 27 09:49:11 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:49:11 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 02:14:05 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 130: > >> 128: // test provokes that situation and that asserts. 
>> 129: if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread"); >> 130: pthread_exit(NULL); > > You don't need to call `pthread_exit` - the thread's entry function can simply return. This is more code copied from CompleteExit.c. > test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 53: > >> 51: private static native boolean hasEventPosted(); >> 52: >> 53: private static void jniMonitorEnterAndLetObjectDie() { > > I can see it is convenient to just inject this test case in an existing test, but I'm not sure it is necessarily the right thing to do. Serviceability folk may have a stronger opinion. Yeah, I was thinking the same. Maybe @sspitsyn or @plummercj could give guidance here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405897185 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405894786 From stefank at openjdk.org Mon Nov 27 09:58:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 09:58:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: <9LJbnE5xMGXeKEknTJYPvw9iEPkdPKzspxTM8wUppvM=.d31a22b2-5dca-442d-b298-095104d5b06d@github.com> On Mon, 27 Nov 2023 02:17:27 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 78: > >> 76: final GetOwnedMonitorInfoTest lock = new GetOwnedMonitorInfoTest(); >> 77: >> 78: Thread t1 = threadFactory.newThread(() -> { > > Pre-existing nit: by default virtual threads have no name, so the output in the virtual thread case looks a little odd. Can you add: > > Thread.currentThread().setName("Worker-Thread"); > > please. Sure. 
> test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/libGetOwnedMonitorInfoTest.c line 270: > >> 268: Java_GetOwnedMonitorInfoTest_jniMonitorEnter(JNIEnv* env, jclass cls, jobject obj) { >> 269: if ((*env)->MonitorEnter(env, obj) != 0) { >> 270: fprintf(stderr, "MonitorEnter failed"); > > Should this be a fatal error? I added a call to exit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405909361 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1405906191 From duke at openjdk.org Mon Nov 27 10:02:14 2023 From: duke at openjdk.org (suchismith1993) Date: Mon, 27 Nov 2023 10:02:14 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Fri, 24 Nov 2023 14:04:07 GMT, Thomas Stuefe wrote: > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > > > > > @tstuefe Any suggestion on this ? 
> > ``` > --- a/src/hotspot/os/aix/os_aix.cpp > +++ b/src/hotspot/os/aix/os_aix.cpp > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, > return true; > } > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { > > log_info(os)("attempting shared library load of %s", filename); > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > return nullptr; > } > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > + > + void* result = nullptr; > + > + // First try using *.so suffix; failing that, retry with *.a suffix. > + const size_t len = strlen(filename); > + constexpr size_t safety = 3 + 1; > + constexpr size_t bufsize = len + safety; > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > + strcpy(buf, filename); > + char* const dot = strrchr(buf, '.'); > + > + assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename); > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > + "Attempting to load a shared object that is neither *.so nor *.a", filename); > + > + sprintf(dot, ".so"); > + result = dll_load_inner(buf, ebuf, ebuflen); > + > + if (result == nullptr) { > + sprintf(dot, ".a"); > + result = dll_load_inner(buf, ebuf, ebuflen); > + } > + > + FREE_C_HEAP_ARRAY(char, buf); > + > + return result; > +} > + > ``` Thanks for sharing ! I have worked on an alternate code as well, takling some lessons from your code. I see a couple of issues **Issue 1.** assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, + "Attempting to load a shared object that is neither *.so nor *.a", filename); This fails as we have paths such as /usr/lib/libc.a(shr_64.0) . However this check is already done in dll_load_inner already. **Issue 2 :** After calling dll_load_inner for .so ,after appending ".a" to filename, i face a segmentation fault. 
On looking into it further, i see the dlopen succeeds, but it is failing in stat64 being done using save_signature method for AIX. I even tried by just supplying a string "libam_ibm_16.a" without doing any string operations and i still see the issue. Does this look familiar ? # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/hotspot/openjdk/jdk-suchi/jdk/src/hotspot/share/prims/jvmtiAgent.cpp:307), pid=31719930, tid=258 # assert(false) failed: stat64x failed # # JRE version: (22.0) (fastdebug build ) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-adhoc.hotspot.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, aix-ppc64) # Core dump will be written. Default location: /home/hotspot/openjdk/jdk-suchi/j2se/j2se_dc/.gdc/7.3.0.15.0/runtime/j2secheckapp.p8-java1-hs01.checkapp/core or core.31719930 # # An error report file with more information is saved as: # /home/hotspot/openjdk/jdk-suchi/j2se/j2se_dc/.gdc/7.3.0.15.0/runtime/j2secheckapp.p8-java1-hs01.checkapp/hs_err_pid31719930.log # # ------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1827513022 From jvernee at openjdk.org Mon Nov 27 10:04:13 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 27 Nov 2023 10:04:13 GMT Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v10] In-Reply-To: References: Message-ID: <77x_EsDnKYUGRzIe6pvIk3t-7y5LjhSkxfR6xcVmH2s=.39b90785-1864-48bf-8da1-62246118e353@github.com> On Thu, 23 Nov 2023 15:31:28 GMT, Jorn Vernee wrote: >> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. 
C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >> >> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >> >> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >> exception handler of a method in the `MethodData` for that method (which holds all the profiling >> data). Then when looking up the exception handler after an exception is thrown, we mark the >> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >> never been entered, we emit an uncommon trap instead. >> >> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of the each data section is derived from the offsets. 
>> >> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - add interpreter profiling specific test cases > - rename ex_handler -> exception_handler Another round of tier 1 - 8 testing came back clean. I'm planning to integrate the patch tomorrow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1827516420 From aph at openjdk.org Mon Nov 27 10:31:13 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 Nov 2023 10:31:13 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> Message-ID: On Mon, 27 Nov 2023 01:06:21 GMT, Xiaohong Gong wrote: >> make/autoconf/lib-vmath.m4 line 94: >> >>> 92: # Check the ARM SVE feature >>> 93: SVE_CFLAGS="-march=armv8-a+sve" >>> 94: >> >> What's this about? We're building a standard binary, and all SVE detection is to be deferred to runtime. > > We have to use this c-compiler option to build out the SVE ABIs (e.g. `svfloat32_t sinfx_u10sve(svfloat32_t input)`) in `libvmath.so`. Without this option, at build time, all the sve related featues like `arm_sve.h / __ARM_FEATURE_SVE` are missing, together with the sve symbols in `libvmath.so` we needed at runtime. That means at runtime, hotspot cannot find out the sve symbols and the vector operations will fall back to the default java implementation. 
That's fine, but we must make sure that SVE is not used by the compiler in any other places. If you've already built on non-SVE and tested the result on both SVE and non-SVE, I'm happy. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1405950723 From stefank at openjdk.org Mon Nov 27 10:38:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 10:38:58 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v6] In-Reply-To: References: Message-ID: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert form: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. 
If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16783/files - new: https://git.openjdk.org/jdk/pull/16783/files/bad51926..d08a930e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=04-05 Stats: 62 lines in 4 files changed: 6 ins; 43 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From duke at openjdk.org Mon Nov 27 11:39:07 2023 From: duke at openjdk.org (suchismith1993) Date: Mon, 27 Nov 2023 11:39:07 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Fri, 24 Nov 2023 14:04:07 GMT, Thomas Stuefe wrote: > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? 
> > > > > > @tstuefe Any suggestion on this ? > > ``` > --- a/src/hotspot/os/aix/os_aix.cpp > +++ b/src/hotspot/os/aix/os_aix.cpp > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, > return true; > } > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { > > log_info(os)("attempting shared library load of %s", filename); > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > return nullptr; > } > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > + > + void* result = nullptr; > + > + // First try using *.so suffix; failing that, retry with *.a suffix. > + const size_t len = strlen(filename); > + constexpr size_t safety = 3 + 1; > + constexpr size_t bufsize = len + safety; > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > + strcpy(buf, filename); > + char* const dot = strrchr(buf, '.'); > + > + assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename); > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > + "Attempting to load a shared object that is neither *.so nor *.a", filename); > + > + sprintf(dot, ".so"); > + result = dll_load_inner(buf, ebuf, ebuflen); > + > + if (result == nullptr) { > + sprintf(dot, ".a"); > + result = dll_load_inner(buf, ebuf, ebuflen); > + } > + > + FREE_C_HEAP_ARRAY(char, buf); > + > + return result; > +} > + > ``` Hi @tstuefe Thanks for sharing!. i have worked on an alternate code taking insight from your code. There are a couple of issues being faced, but the most improtant one is the segmentation fault that is occuring. 1. The dll_load function makes copy of filename and appends .a and then tries to load the library. However , in jvmTiAgent.cpp, we call save_library_signature method. 
Since the changes to the filename happen only inside dll_load, they are not reflected in jvmtiAgent, due to which the stat64 call is run on the old ".so" filename and not the ".a" filename. So the stat64 call happens with an invalid library name and hence it causes the segmentation fault. To make the changes reflected, we might need to strcpy the modified filename (buffer) back to filename inside dll_load, but that is not possible as it is a const char*. I don't think we can/should change the signature of dll_load here. 2. I tried moving save_library_signature from jvmtiAgent to os_aix but then it also needs the "agent" object to be passed as well, which will lead to a change in the signature of dll_load. What do you think could be a solution for this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1827661214 From mcimadamore at openjdk.org Mon Nov 27 11:39:10 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 27 Nov 2023 11:39:10 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v5] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 18:30:17 GMT, Erik Österlund wrote: >> The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other. >> >> By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered.
We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. >> >> Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well. > > Erik Österlund has updated the pull request incrementally with two additional commits since the last revision: > > - Merge pull request #3 from JornVernee/PR_async_close+NoToNativeTrans > > - don't transition to native state on Unsafe_CopySwapMemory0 The new logic looks good. Good catch on the array copy with swap! ------------- Marked as reviewed by mcimadamore (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16792#pullrequestreview-1750232220 From sjohanss at openjdk.org Mon Nov 27 12:08:21 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Mon, 27 Nov 2023 12:08:21 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: On Wed, 22 Nov 2023 23:08:36 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup and address comments My additional testing found a failure when running: `sun/tools/jstat/jstatSnap1.sh` When running `jstat -snap 0` (which is what the test does) with this change the new counters are displayed even though they belong to the unsupported namespace "sun.". From what I can tell, the reason for this is that when creating a new subsystem namespace we need to create three of them, one for each top-level namespace. Otherwise the code trying to figure out if this is supported or not is tripped over.
I've suggested two changes to do this and with those the test in question no longer fails. It would be good if someone better familiar with jstat/perf-counter code verifies that this is how it should be solved. src/hotspot/share/runtime/perfData.cpp line 76: > 74: "com.sun.threads", > 75: "sun.threads", > 76: "sun.threads.cpu_time", // Subsystem for Sun Threads CPU times Suggestion: "java.threads.cpu_time", //Thread CPU time name spaces "com.sun.threads.cpu_time", "sun.threads.cpu_time", src/hotspot/share/runtime/perfData.hpp line 64: > 62: COM_THREADS, > 63: SUN_THREADS, > 64: SUN_THREADS_CPUTIME, // Subsystem for Sun Threads CPU times Suggestion: JAVA_THREADS_CPUTIME, // Thread CPU time name spaces COM_THREADS_CPUTIME, SUN_THREADS_CPUTIME, ------------- Changes requested by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1750253595 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1406050140 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1406047123 From stefank at openjdk.org Mon Nov 27 12:58:56 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 12:58:56 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v7] In-Reply-To: References: Message-ID: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. 
> > The provided tests provoke this assert from: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitors with dead objects, so I'm filtering those in the closure that collects ObjectMonitors. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far.
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Tweaks to jtreg run comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16783/files - new: https://git.openjdk.org/jdk/pull/16783/files/d08a930e..234175d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=05-06 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From stuefe at openjdk.org Mon Nov 27 13:26:23 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Nov 2023 13:26:23 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. >> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. 
For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: > > adopt types This now causes problems with https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 since it removes the possibility of silently altering the file path inside os::dll_load, which would be the preferred way for AIX to handle *.a shared objects. So this causes even more ifdef AIX to bloom up everywhere. A much better solution would have been to mimic stable-handle behavior inside the AIX version of `os::dll_load`. Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) tuple. On dlclose, remove entry. Could have been done inside os_aix.cpp without any changes to shared coding, and would have provided complete coverage for all users of dll_load. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1827826924 From mli at openjdk.org Mon Nov 27 13:29:33 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 27 Nov 2023 13:29:33 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: > Hi, > Can you review the patch to add ConvHF2F intrinsic to JDK for riscv? > Thanks! > > (By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)` > https://lore.kernel.org/lkml/20231114141256.126749-11-cleger at rivosinc.com/) > > ## Test > ### Functionality > #### hotspot tests > test/hotspot/jtreg/compiler/intrinsics/ > test/hotspot/jtreg/compiler/c2/irTests > > #### jdk tests > test/jdk/java/lang/Float/Binary16Conversion*.java > > ### Performance > tested on licheepi. > > #### with UseZfh enabled & stub out-of-band > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 3493.376 ±
18.631 ns/op > Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 19.819 ± 0.193 ns/op > > > #### with UseZfh enabled only > (i.e. enable the intrinsic) > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 4659.796 ± 13.262 ns/op > Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 22.957 ± 0.098 ns/op > > > #### with UseZfh disabled > (i.e. disable the intrinsic) > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 22930.591 ± 72.595 ns/op > Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 25.970 ± 0.063 ns/op Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: optimize perf with stub out-of-line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16802/files - new: https://git.openjdk.org/jdk/pull/16802/files/b0baca67..db50b68a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16802&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16802&range=00-01 Stats: 33 lines in 1 file changed: 20 ins; 10 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16802/head:pull/16802 PR: https://git.openjdk.org/jdk/pull/16802 From mli at openjdk.org Mon Nov 27 13:29:33 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 27 Nov 2023 13:29:33 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Sat, 25 Nov 2023 02:22:02 GMT, Quan Anh Mai wrote: >> You can take a look at x86 implementation of ConvF2I node which takes advantages of a general stub mechanism in C2. > > https://github.com/openjdk/jdk/blob/6aa197667ad05bd93adf3afc7b06adbfb2b18a22/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L4307 > > Note that the stub will still reside in the code section of the current method, is a trampoline needed in that case? Thanks for pointing to the location! It DOES bring better performance.
Please check the pr description for detailed data. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1406153790 From evergizova at openjdk.org Mon Nov 27 13:41:25 2023 From: evergizova at openjdk.org (Ekaterina Vergizova) Date: Mon, 27 Nov 2023 13:41:25 GMT Subject: Integrated: 8314220: Configurable InlineCacheBuffer size In-Reply-To: References: Message-ID: On Mon, 14 Aug 2023 13:12:16 GMT, Ekaterina Vergizova wrote: > InlineCacheBuffer size is currently hardcoded to 10K. > This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. > > Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. This pull request has now been integrated. Changeset: a40d8d97 Author: Ekaterina Vergizova Committer: Yuri Nesterenko URL: https://git.openjdk.org/jdk/commit/a40d8d97e84d88d1a65aba81bfc09339be95e427 Stats: 11 lines in 4 files changed: 8 ins; 2 del; 1 mod 8314220: Configurable InlineCacheBuffer size Reviewed-by: dlong, kvn ------------- PR: https://git.openjdk.org/jdk/pull/15271 From jkern at openjdk.org Mon Nov 27 14:37:21 2023 From: jkern at openjdk.org (Joachim Kern) Date: Mon, 27 Nov 2023 14:37:21 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: <3I2y1CeVpK4RVlarzGMOnATCAB0biF_ifEM9M5PZf2E=.31d15983-0199-4885-af64-1b57b5d11392@github.com> On Mon, 27 Nov 2023 13:23:42 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: >> >> adopt types > > This now causes problems with > > https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 > > since it removes the possibility of silently alternating the file path inside os::dll_load, which would be the preferred way 
for AIX to handle *.a shared objects. So this causes even more ifdef AIX to bloom up everywhere. > > A much better solution would have been to mimic stable-handle behavior inside the AIX version of `os::dll_load`. > > Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) tuple. On dlclose, remove entry. Could have been done inside os_aix.cpp without any changes to shared coding, and would have provided complete coverage for all users of dll_load. @tstuefe: Hi Thomas, I'm not sure if I got it. We can make (inode, devid) to a hash, which replaces the dlhandle on return of os::dll_load. This hash would of course be the same if the same library is loaded twice. But I do not know how to handle the two real dlhandles. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1827954482 From iwalulya at openjdk.org Mon Nov 27 14:59:13 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 27 Nov 2023 14:59:13 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v3] In-Reply-To: References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: On Fri, 24 Nov 2023 09:18:25 GMT, Thomas Schatzl wrote: >> Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) >> >> Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). >> >> The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later.
STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. >> >> Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). >> >> Upcoming changes will >> * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. >> * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) >> * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism >> * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) >> * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. >> >> These are the reason for the class hierarchy for `ClassUnloadingContext`: the goal is to ultimately have about this phasing (for G1): >> 1. collect all dead CLDs, using the `register_unloading_class_loader_data` method *only* >> 2. parallelize the stuff in `ClassLoaderData::unload()` in one way or another, adding them to the `complete_cleaning` (parallel) phase. >> 3. `purge_nmethods`, `free_code_blobs` and the `remove_unlinked_nmethods_from_code_root_set` (from JDK-8317007) will be concurrent. 
>> >> Particularly the split of `SystemDictionary::do_unloading` into "only" traversing the CLDs to find the dead ones and then in parallel process them in 2. a... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Merge branch 'master' into mergeme > - iwalulya review, naming > - 8317809 Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) > > Introduce a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. > GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). > > The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform > this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every > insertion to allow for concurrent users for the lock to progress. > > Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing > CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared > towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). > > Upcoming changes will > * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly > reduce code purging time for the STW collectors. > * better name the second stage of unlinking (called "cleaning" throughout, e.g. 
the work done in `G1CollectedHeap::complete_cleaning`) > * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better > parallelism > * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) > * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. > - Only run test case on debug VMs, sufficient > - 8320331 g1 full gc "during" verification accesses half-unloaded metadata Changes requested by iwalulya (Reviewer). src/hotspot/share/gc/shared/classUnloadingContext.cpp line 138: > 136: NMethodSet* nmethod_set = nullptr; > 137: > 138: bool is_parallel = _num_nmethod_unlink_workers != 1; Suggestion: bool is_parallel = _num_nmethod_unlink_workers > 1; Seems more intuitive. src/hotspot/share/gc/shared/classUnloadingContext.hpp line 66: > 64: ClassLoaderData* volatile _cld_head; > 65: > 66: uint _num_nmethod_unlink_workers; probably better to set `_num_nmethod_unlink_workers;` as const, the destructor depends on this being const through the lifetime of the object.
src/hotspot/share/gc/shared/classUnloadingContext.hpp line 95: > 93: }; > 94: > 95: #endif // SHARE_GC_SHARED_CLASSUNLOADINGCONTEXT_HPP trailing space warning* ------------- PR Review: https://git.openjdk.org/jdk/pull/16759#pullrequestreview-1750594239 PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406283555 PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406260568 PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406268113 From ayang at openjdk.org Mon Nov 27 14:59:17 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 27 Nov 2023 14:59:17 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v3] In-Reply-To: References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: On Fri, 24 Nov 2023 09:18:25 GMT, Thomas Schatzl wrote: >> Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) >> >> Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). >> >> The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. >> >> Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. 
To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). >> >> Upcoming changes will >> * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. >> * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) >> * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism >> * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) >> * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. >> >> These are the reason for the class hierarchy for `ClassUnloadingContext`: the goal is to ultimately have about this phasing (for G1): >> 1. collect all dead CLDs, using the `register_unloading_class_loader_data` method *only* >> 2. parallelize the stuff in `ClassLoaderData::unload()` in one way or another, adding them to the `complete_cleaning` (parallel) phase. >> 3. `purge_nmethods`, `free_code_blobs` and the `remove_unlinked_nmethods_from_code_root_set` (from JDK-8317007) will be concurrent. >> >> Particularly the split of `SystemDictionary::do_unloading` into "only" traversing the CLDs to find the dead ones and then in parallel process them in 2. a... > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Merge branch 'master' into mergeme > - iwalulya review, naming > - 8317809 Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) > > Introduce a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. > GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). > > The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform > this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every > insertion to allow for concurrent users for the lock to progress. > > Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing > CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared > towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). > > Upcoming changes will > * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly > reduce code purging time for the STW collectors. > * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) > * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better > parallelism > * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) > * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging.
> - Only run test case on debug VMs, sufficient > - 8320331 g1 full gc "during" verification accesses half-unloaded metadata src/hotspot/share/gc/shared/classUnloadingContext.cpp line 120: > 118: for (uint i = 0; i < _num_nmethod_unlink_workers; ++i) { > 119: NMethodSet* set = _unlinked_nmethods[i]; > 120: for (int j = 0; j < set->length(); ++j) { I wonder if one can use range-based for loop here. src/hotspot/share/gc/shared/classUnloadingContext.cpp line 171: > 169: ConditionalMutexLocker ml_inner(CodeCache_lock, _lock_codeblob_free_separately, Mutex::_no_safepoint_check_flag); > 170: CodeCache::free(nmethod_set->at(i)); > 171: } I feel it would be clearer if the for-loop is duplicated to handle either case separately. src/hotspot/share/gc/shared/classUnloadingContext.hpp line 63: > 61: }; > 62: > 63: class DefaultClassUnloadingContext : public ClassUnloadingContext { I don't understand why they need to be two classes, even after reading "These are the reason for the class hierarchy for...". The reference to future/other PR(s) in the description doesn't really help -- it's unclear what is *necessary* for the current PR and what is preparation for future PR(s). src/hotspot/share/gc/shared/classUnloadingContext.hpp line 95: > 93: }; > 94: > 95: #endif // SHARE_GC_SHARED_CLASSUNLOADINGCONTEXT_HPP Seems missing a newline. 
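The two loop-related comments above (a range-based for loop, and duplicating the loop rather than locking conditionally inside it) could look roughly like the following. This is only a sketch under stated assumptions: `Blob`, `free_blob`, `free_all` and `code_cache_lock` are hypothetical stand-ins built on the C++ standard library, not the HotSpot types (`NMethodSet`, `CodeCache_lock`, `ConditionalMutexLocker`) used in the PR.

```cpp
#include <mutex>
#include <vector>

// Hypothetical stand-ins; none of these names are HotSpot APIs.
struct Blob { int id; };

std::mutex code_cache_lock;   // plays the role of CodeCache_lock
std::vector<int> freed_ids;   // records the order in which blobs were "freed"

void free_blob(const Blob& b) { freed_ids.push_back(b.id); }

// Two explicit loops instead of one loop with a conditional locker.
void free_all(const std::vector<Blob>& blobs, bool lock_separately) {
  if (lock_separately) {
    // Concurrent-collector style: take and release the lock per blob so
    // other users of the lock can make progress in between.
    for (const Blob& b : blobs) {
      std::lock_guard<std::mutex> guard(code_cache_lock);
      free_blob(b);
    }
  } else {
    // STW style: hold the lock once for the whole batch.
    std::lock_guard<std::mutex> guard(code_cache_lock);
    for (const Blob& b : blobs) {
      free_blob(b);
    }
  }
}
```

The duplication costs a few lines, but each branch then shows its locking granularity directly.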
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406262742 PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406264755 PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406282829 PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1406283473 From ihse at openjdk.org Mon Nov 27 15:02:13 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 27 Nov 2023 15:02:13 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> Message-ID: <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> On Mon, 27 Nov 2023 10:28:45 GMT, Andrew Haley wrote: >> We have to use this c-compiler option to build out the SVE ABIs (e.g. `svfloat32_t sinfx_u10sve(svfloat32_t input)`) in `libvmath.so`. Without this option, at build time, all the sve related features like `arm_sve.h / __ARM_FEATURE_SVE` are missing, together with the sve symbols in `libvmath.so` we needed at runtime. That means at runtime, hotspot cannot find out the sve symbols and the vector operations will fall back to the default java implementation. > > That's fine, but we must make sure that SVE is not used by the compiler in any other places. If you've already built on non-SVE and tested the result on both SVE and non-SVE, I'm happy. You still need to separate out the SVE detection from the libsleef code, and provide a way to enable/disable it from the configure command line. It is not okay to auto-detect if features should be turned on or off by default, but it should always be possible to override.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1406291471 From stuefe at openjdk.org Mon Nov 27 15:11:26 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Nov 2023 15:11:26 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:23:42 GMT, Thomas Stuefe wrote: >> Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: >> >> adopt types > > This now causes problems with > > https://github.com/openjdk/jdk/pull/16604#issuecomment-1827661214 > > since it removes the possibility of silently altering the file path inside os::dll_load, which would be the preferred way for AIX to handle *.a shared objects. So this causes even more ifdef AIX to bloom up everywhere. > > A much better solution would have been to mimic stable-handle behavior inside the AIX version of `os::dll_load`. > > Proposal for an alternate solution: Hold dlhandle-to-(inode, devid)tuple mappings in a hash table. On dlopen, look up dl-handle by (inode, filename) tuple. On dlclose, remove entry. Could have been done inside os_aix.cpp without any changes to shared coding, and would have provided complete coverage for all users of dll_load. > @tstuefe: Hi Thomas, I'm not sure if I got it. We can make (inode, devid) to a hash, which replaces the dlhandle on return of os::dll_load. This hash would of course be the same if the same library is loaded twice. But I do not know how to handle the two real dlhandles. Why do you need two dlhandles? A handle returned from dlopen should be valid for the whole process. If you cache that in a hashmap, and for the second caller of os::dll_load() return the cached variant, this should work, no?
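To make the caching idea in this reply concrete, here is a minimal, non-authoritative sketch of a handle cache keyed by (device id, inode). Everything in it is an assumption made for illustration: `HandleCache`, `FileId`, `identify`, `raw_open` and `raw_close` are hypothetical names, the three callbacks stand in for `stat64x()`, `dlopen()` and `dlclose()`, `std::map` replaces the hash table (HotSpot would use its own containers), and the reference count is an addition that the thread does not discuss.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch, not HotSpot code: all names here are invented.
using FileId = std::pair<std::uint64_t, std::uint64_t>;  // (device id, inode)

struct HandleCache {
  struct Entry { void* handle; int refcount; };

  // Injected so the sketch runs anywhere; on AIX these would wrap
  // stat64x(), dlopen() and dlclose() respectively.
  std::function<bool(const std::string&, FileId*)> identify;
  std::function<void*(const std::string&)> raw_open;
  std::function<void(void*)> raw_close;

  std::map<FileId, Entry> by_file;

  // Returns the cached handle if this file is already loaded, so callers
  // can compare handles to detect a repeated load.
  void* load(const std::string& path) {
    FileId id;
    if (!identify(path, &id)) return nullptr;
    auto it = by_file.find(id);
    if (it != by_file.end()) {
      it->second.refcount++;
      return it->second.handle;
    }
    void* h = raw_open(path);
    if (h != nullptr) by_file[id] = Entry{h, 1};
    return h;
  }

  // Drops one reference; the real close happens when the last one goes.
  void unload(void* handle) {
    for (auto it = by_file.begin(); it != by_file.end(); ++it) {
      if (it->second.handle == handle) {
        assert(it->second.refcount > 0);
        if (--it->second.refcount == 0) {
          raw_close(handle);
          by_file.erase(it);
        }
        return;
      }
    }
  }
};
```

With a cache like this, only one real handle exists per file: the second load of the same file never reaches the underlying open call, which is the point of the question about needing two dlhandles.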
------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1828019003 From aph at openjdk.org Mon Nov 27 15:14:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 Nov 2023 15:14:15 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> Message-ID: <3mhEtXP50-Lkn4iKkMGyCY4RJEcFbaTXyCUN3eoOH7M=.54337aac-d295-4fd8-884b-8dba5c4a68f6@github.com> On Mon, 27 Nov 2023 14:59:23 GMT, Magnus Ihse Bursie wrote: > You still need to separate out the SVE detection from the libsleef code, and provide a way to enable/disable it from the configure command line. Why? I don't think this should be a build-time option at all, because it puts the people who build binaries in an impossible position. Can't this all be built unconditionally, with run-time feature detection? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1406310998 From duke at openjdk.org Mon Nov 27 15:18:13 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 27 Nov 2023 15:18:13 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7] In-Reply-To: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> References: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> Message-ID: <1fVxLdS6TZaPWxEY7bUSGCe2p9goBgFYKc6hOG1KLko=.3bc66f65-b71c-4d06-a1c6-72ff4afc1803@github.com> On Mon, 27 Nov 2023 05:46:26 GMT, Jatin Bhateja wrote: > There are few other usages of variable blends in following methods, please call new macro assembly routines in its place. C2_MacroAssembler::vector_cast_double_to_int_special_cases_avx C2_MacroAssembler::vector_count_leading_zeros_int_avx C2_MacroAssembler::vector_cast_float_to_int_special_cases_avx I considered it. To change it however, I need to understand what each of those functions is doing, write (or find) a test case and verify that the test case is indeed verifying my change (i.e. both functional assembly logs and performance benchmark).. (It looks like the new macro assembler should be 'completely safe' but..
just looking at this PR, I would rather be thorough) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1828031522 From duke at openjdk.org Mon Nov 27 15:22:17 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 27 Nov 2023 15:22:17 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7] In-Reply-To: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> References: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> Message-ID: On Mon, 27 Nov 2023 05:47:55 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace again > > src/hotspot/cpu/x86/x86.ad line 7821: > >> 7819: >> 7820: instruct vblendvpFD(legVec dst, legVec src1, legVec src2, legVec mask) %{ >> 7821: predicate(UseAVX > 0 && !EnableX86ECoreOpts && > > Why do you not call newly added macro assembly routine to emulate vblendvps from instruction encoding, we already have EnableX86ECoreOpts checks within it. I would have to change the effect to give myself a temp for the scratch register.. Seems safer not to change existing behaviour. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1406321929 From jkern at openjdk.org Mon Nov 27 15:22:29 2023 From: jkern at openjdk.org (Joachim Kern) Date: Mon, 27 Nov 2023 15:22:29 GMT Subject: RFR: JDK-8315706: com/sun/tools/attach/warnings/DynamicLoadWarningTest.java real fix for failure on AIX [v5] In-Reply-To: References: Message-ID: On Fri, 15 Sep 2023 07:22:32 GMT, Joachim Kern wrote: >> After push of [JDK-8307478](https://bugs.openjdk.org/browse/JDK-8307478) , the following test started to fail on AIX : >> com/sun/tools/attach/warnings/DynamicLoadWarningTest.java; >> The problem was described in [JDK-8309549](https://bugs.openjdk.org/browse/JDK-8309549) with a first try of a fix. 
>> A second fix via [JDK-8310191](https://bugs.openjdk.org/browse/JDK-8310191) was necessary. >> Both fixes just disable the specific subtest on AIX, without correction of the root cause. >> The root cause is, that dlopen() on AIX returns different handles every time, even if you load a library twice. There is no official AIX API available to get this information on a different way. >> My proposal is, to use the stat64x API with the fields st_device and st_inode. After a dlopen() the stat64x() API is called additionally to get this information which is then stored parallel to the library handle in the jvmtiAgent. For AIX we then can compare these values instead of the library handle and get the same functionality as on linux. > > Joachim Kern has updated the pull request incrementally with one additional commit since the last revision: > > adopt types If you have time, please call me for a short discussion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15583#issuecomment-1828040131 From ihse at openjdk.org Mon Nov 27 15:25:12 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 27 Nov 2023 15:25:12 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: <3mhEtXP50-Lkn4iKkMGyCY4RJEcFbaTXyCUN3eoOH7M=.54337aac-d295-4fd8-884b-8dba5c4a68f6@github.com> References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> <3mhEtXP50-Lkn4iKkMGyCY4RJEcFbaTXyCUN3eoOH7M=.54337aac-d295-4fd8-884b-8dba5c4a68f6@github.com> Message-ID: On Mon, 27 Nov 2023 15:11:32 GMT, Andrew Haley wrote: >> You still need to separate out the SVE detection from the libsleef code, and provide a way to enable/disable it from the configure command line. 
It is okay to auto-detect if features should be turned on or off by default, but it should always be possible to override. > >> You still need to separate out the SVE detection from the libsleef code, and provide a way to enable/disable it from the configure command line. > > Why? I don't think this should be a build-time option at all, because it puts the people who build binaries in an impossible position. Can't this all be built unconditionally, with run-time feature detection? Apparently the situation is this: If your build machine happens to have SVE, then you will get SVE support in the vmath library. The SVE support will be used during runtime if the machine you run on has SVE support. If your build host happens not to have SVE, then the vmath library will be built without SVE support, and no matter if your runtime machine has SVE or not, it will not provide SVE support in the vmath library. Now, if your CI farm has an arbitrary selection of aarch64 machines with and without SVE, then you have no idea what you are going to get in your build. We have been very careful in staying clear of this kind of "random" build system behavior. The system you build on should not affect the output -- at least, not without a chance to override the default value. In fact, I am not even sure why it seems to the PR author to be a good idea to let the default be dependent on the build machine at all. My personal opinion is that it would be better to select either "SVE enabled" or "SVE disabled" as the default, and then let the user override this on the configure command line, if they target a platform with different SVE availability. 
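For illustration only (this is not code from the PR): a minimal C++ sketch of the alternative raised in the quoted question, where both code paths are always compiled and the choice is made once from a run-time probe, so the build machine does not influence the result. The names `add_baseline`, `add_sve` and `select_add` are hypothetical, and the "SVE" path is a plain loop standing in for real SVE code.

```cpp
#include <cstddef>

using VecAddFn = void (*)(const float*, const float*, float*, std::size_t);

// Baseline path: always compiled, runs on any machine.
static void add_baseline(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
}

// Hypothetical stand-in for an SVE-specific path. It is also always compiled;
// a real version would use SVE instructions instead of this scalar loop.
static void add_sve(const float* a, const float* b, float* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
}

// Choose the implementation from a run-time capability flag.
// `has_sve` is assumed to come from a CPU feature probe on the machine
// the program runs on, not from the machine it was built on.
static VecAddFn select_add(bool has_sve) {
  return has_sve ? add_sve : add_baseline;
}
```

A caller would evaluate `select_add(...)` once at startup and keep the returned function pointer. This only works if the compiler used for the build can emit the SVE path at all, which is the compiler-support question discussed in this thread.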
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1406326711 From stuefe at openjdk.org Mon Nov 27 15:41:07 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 27 Nov 2023 15:41:07 GMT Subject: RFR: JDK-8319437: NMT should show library names in call stacks In-Reply-To: References: Message-ID: On Sun, 5 Nov 2023 06:28:11 GMT, Thomas Stuefe wrote: > With this tiny enhancement, NMT shows library names in callstacks. Friendly ping. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16508#issuecomment-1828074404 From jbhateja at openjdk.org Mon Nov 27 15:43:13 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 27 Nov 2023 15:43:13 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7] In-Reply-To: References: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> Message-ID: On Mon, 27 Nov 2023 15:19:09 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/x86.ad line 7821: >> >>> 7819: >>> 7820: instruct vblendvpFD(legVec dst, legVec src1, legVec src2, legVec mask) %{ >>> 7821: predicate(UseAVX > 0 && !EnableX86ECoreOpts && >> >> Why do you not call newly added macro assembly routine to emulate vblendvps from instruction encoding, we already have EnableX86ECoreOpts checks within it. > > I would have to change the effect to give myself a temp for the scratch register.. Seems safer not to change existing behaviour. That should be fine, lets not penalize getting perf improvement due to additional temp. 
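As a rough scalar illustration of why emulating a variable blend with boolean operations needs a temporary (a sketch under stated assumptions, not necessarily the instruction sequence used in this PR): the blend picks `src2` where the mask element's sign bit is set and `src1` otherwise, so the sign bit first has to be widened to a full-width mask, and that widened value is what occupies the scratch register. The helper name `blend_by_sign` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of one 32-bit lane of a sign-bit-controlled blend:
// returns src2 if the sign bit of `mask` is set, otherwise src1.
inline float blend_by_sign(float src1, float src2, float mask) {
  std::uint32_t a, b, m;
  std::memcpy(&a, &src1, sizeof a);
  std::memcpy(&b, &src2, sizeof b);
  std::memcpy(&m, &mask, sizeof m);

  // Widen the sign bit to all 32 bits (0x00000000 or 0xFFFFFFFF).
  // In vector code this widened mask is the value held in the temp register.
  const std::uint32_t wide = 0u - (m >> 31);

  // Boolean select: keep src1 bits where wide is 0, src2 bits where it is 1.
  const std::uint32_t r = (a & ~wide) | (b & wide);

  float out;
  std::memcpy(&out, &r, sizeof out);
  return out;
}
```

Only the sign bit of the mask matters here, so a mask of `-0.0f` selects `src2` just like any other negative value.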
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1406356724 From duke at openjdk.org Mon Nov 27 15:48:07 2023 From: duke at openjdk.org (suchismith1993) Date: Mon, 27 Nov 2023 15:48:07 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Mon, 27 Nov 2023 11:36:02 GMT, suchismith1993 wrote: > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > > > > > @tstuefe Any suggestion on this ? > > ``` > --- a/src/hotspot/os/aix/os_aix.cpp > +++ b/src/hotspot/os/aix/os_aix.cpp > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, > return true; > } > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { > > log_info(os)("attempting shared library load of %s", filename); > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > return nullptr; > } > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > + > + void* result = nullptr; > + > + // First try using *.so suffix; failing that, retry with *.a suffix. > + const size_t len = strlen(filename); > + constexpr size_t safety = 3 + 1; > + constexpr size_t bufsize = len + safety; > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > + strcpy(buf, filename); > + char* const dot = strrchr(buf, '.'); > + > + assert(dot != nullptr, "Attempting to load a shared object without extension? 
%s", filename); > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > + "Attempting to load a shared object that is neither *.so nor *.a", filename); > + > + sprintf(dot, ".so"); > + result = dll_load_inner(buf, ebuf, ebuflen); > + > + if (result == nullptr) { > + sprintf(dot, ".a"); > + result = dll_load_inner(buf, ebuf, ebuflen); > + } > + > + FREE_C_HEAP_ARRAY(char, buf); > + > + return result; > +} > + > ``` @tstuefe as discussed with @TheRealMDoerr, do you think using a default argument would help? Either we pass the agent object as a third parameter, or we pass an empty character buffer (and not const char*) that would be used specifically to receive the alternate filename via strcpy, so that it is reflected in the jvmagent code? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1828086180 From jbhateja at openjdk.org Mon Nov 27 15:54:13 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 27 Nov 2023 15:54:13 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7] In-Reply-To: <1fVxLdS6TZaPWxEY7bUSGCe2p9goBgFYKc6hOG1KLko=.3bc66f65-b71c-4d06-a1c6-72ff4afc1803@github.com> References: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> <1fVxLdS6TZaPWxEY7bUSGCe2p9goBgFYKc6hOG1KLko=.3bc66f65-b71c-4d06-a1c6-72ff4afc1803@github.com> Message-ID: On Mon, 27 Nov 2023 15:15:04 GMT, Volodymyr Paprotski wrote: > > There are few other usages of variable blends in following methods, please call new macro assembly routines in its place. C2_MacroAssembler::vector_cast_double_to_int_special_cases_avx C2_MacroAssembler::vector_count_leading_zeros_int_avx C2_MacroAssembler::vector_cast_float_to_int_special_cases_avx > > I considered it. To change it however, I need to understand what each of those functions are doing, write (or find) a test case and verify that the test case is indeed verifying my change (i.e. both functional assembly logs and performance benchmark).. 
> > (It looks like the new macro assembler should be 'completely safe' but.. just looking at this PR, I would rather be thorough) We just need to replace all existing occurrences of variable blends with new macro assembly routine call, they already compute the mask, you will have to pass additional false argument. Since the patch emulates blends with special sequence for E-Cores lets the benefit also reach operations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1828106455 From zgu at openjdk.org Mon Nov 27 16:25:18 2023 From: zgu at openjdk.org (Zhengyu Gu) Date: Mon, 27 Nov 2023 16:25:18 GMT Subject: RFR: JDK-8319437: NMT should show library names in call stacks In-Reply-To: References: Message-ID: On Sun, 5 Nov 2023 06:28:11 GMT, Thomas Stuefe wrote: > With this tiny enhancement, NMT shows library names in callstacks. LGTM ------------- Marked as reviewed by zgu (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16508#pullrequestreview-1750848954 From aph at openjdk.org Mon Nov 27 16:46:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 27 Nov 2023 16:46:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> <3mhEtXP50-Lkn4iKkMGyCY4RJEcFbaTXyCUN3eoOH7M=.54337aac-d295-4fd8-884b-8dba5c4a68f6@github.com> Message-ID: On Mon, 27 Nov 2023 15:22:32 GMT, Magnus Ihse Bursie wrote: > In fact, I am not even sure why it seems to the PR author to be a good idea to let the default be dependent on the build machine at all. 
My personal opinion is that it would be better to select either "SVE enabled" or "SVE disabled" as the default, and then let the user override this on the configure command line, if they target a platform with different SVE availability. SVE support should be enabled regardless of the build machine. The same binary must run on both SVE and non-SVE machines, using SVE if it is advantageous. I suppose some ancient C++ compilers without SVE support still exist, but I see no very good reason to support them building JDK 22+. Making a configure option to disable SVE support for vector math is a mistake, but IMO mostly harmless because no-one will ever turn it off. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1406450888 From dfenacci at openjdk.org Mon Nov 27 16:49:17 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 27 Nov 2023 16:49:17 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v12] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Wed, 22 Nov 2023 05:03:41 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. 
>> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. 
> > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Apply StringUTF16.coderFromArrayLen src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8538: > 8536: > 8537: testl(len, -64); > 8538: jcc(Assembler::zero, post_alignment); @cl4es since some of the `jcc` instructions have been changed into `jccb` in the rest of the intrinsic, I was wondering if it would make sense do the same for the rest of them (where the jump is short): Suggestion: jccb(Assembler::zero, post_alignment); src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8547: > 8545: // bail out when there is nothing to be done > 8546: testl(tmp5, 0xFFFFFFFF); > 8547: jcc(Assembler::zero, post_alignment); Suggestion: jccb(Assembler::zero, post_alignment); and here: https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8574 src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8584: > 8582: evpcmpuw(mask1, tmp1Reg, tmp2Reg, Assembler::le, Assembler::AVX_512bit); > 8583: kortestdl(mask1, mask1); > 8584: jcc(Assembler::carryClear, reset_for_copy_tail); Suggestion: jccb(Assembler::carryClear, reset_for_copy_tail); and here: https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8590 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406147105 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406147739 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406178794 From duke at openjdk.org Mon Nov 27 17:09:20 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Mon, 27 Nov 2023 17:09:20 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap [v4] In-Reply-To: References: Message-ID: > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an optional argument for specifying a file name. 
> > `jcmd PID help Compiler.perfmap` shows the following usage. > > > Compiler.perfmap > Write map file for Linux perf tool. > > Impact: Low > > Syntax : Compiler.perfmap [] > > Arguments: > filename : [optional] Name of the map file (STRING, no default value) > > > The man page of jcmd will be updated in a separate PR. Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Fix PerfMapTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15871/files - new: https://git.openjdk.org/jdk/pull/15871/files/61d6f6f4..0ec20ea3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15871/head:pull/15871 PR: https://git.openjdk.org/jdk/pull/15871 From rriggs at openjdk.org Mon Nov 27 17:31:22 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 27 Nov 2023 17:31:22 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v12] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: <4aeW-4xV_Fwl9bvrdDtW-TwRzDZ41JWiLQRXwkrwWP4=.4e426206-6912-4360-a37a-c2abc17bcc32@github.com> On Mon, 27 Nov 2023 13:43:52 GMT, Damon Fenacci wrote: >> Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply StringUTF16.coderFromArrayLen > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8584: > >> 8582: evpcmpuw(mask1, tmp1Reg, tmp2Reg, Assembler::le, Assembler::AVX_512bit); >> 8583: kortestdl(mask1, mask1); >> 8584: jcc(Assembler::carryClear, reset_for_copy_tail); > > Suggestion: > > jccb(Assembler::carryClear, reset_for_copy_tail); > > > and here: > > 
https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8590 Thanks for the suggestions to use byte offset branches. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406506896 From duke at openjdk.org Mon Nov 27 17:40:10 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 27 Nov 2023 17:40:10 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7] In-Reply-To: References: <_GqHkr9u5z5IUCdULqmzh4auE22716Wiavxo-03PkQA=.015ba45e-6715-40f1-8863-345c6dc9e9d9@github.com> <1fVxLdS6TZaPWxEY7bUSGCe2p9goBgFYKc6hOG1KLko=.3bc66f65-b71c-4d06-a1c6-72ff4afc1803@github.com> Message-ID: On Mon, 27 Nov 2023 15:51:45 GMT, Jatin Bhateja wrote: > > > There are few other usages of variable blends in following methods, please call new macro assembly routines in its place. C2_MacroAssembler::vector_cast_double_to_int_special_cases_avx C2_MacroAssembler::vector_count_leading_zeros_int_avx C2_MacroAssembler::vector_cast_float_to_int_special_cases_avx > > > > > > I considered it. To change it however, I need to understand what each of those functions are doing, write (of find) a test case and verify that the test case is indeed verifying my change (i.e. both functional assembly logs and performance benchmark).. > > (It looks like the new macro assembler should be 'completely safe' but.. just looking at this PR, I would rather be thorough) > > We need to replace all existing occurrences of variable blends with new macro assembly routine call, they already compute the mask, you will have to pass additional false argument. Since the patch emulates blends with special sequence for E-Cores lets the benefit also reach operations. 
Filed https://bugs.openjdk.org/browse/JDK-8320794 to follow up on rest of blend instances ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1828314672 From ogillespie at openjdk.org Mon Nov 27 17:55:22 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 27 Nov 2023 17:55:22 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Mon, 13 Nov 2023 01:12:29 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation, rename test helper > > src/hotspot/share/oops/symbolHandle.hpp line 53: > >> 51: >> 52: public: >> 53: static constexpr uint CLEANUP_DELAY_MAX_ENTRIES = 128; > > This doesn't need to be public. It's used in the test, do you prefer another approach like friend class? > src/hotspot/share/oops/symbolHandle.hpp line 58: > >> 56: >> 57: // Conversion from a Symbol* to a SymbolHandleBase. >> 58: // Does not increment the current reference count if temporary. > > This comment is no longer true for temp symbols, since adding to the delay queue increments the refcount. Well spotted, thanks. > src/hotspot/share/oops/symbolHandle.hpp line 115: > >> 113: } >> 114: >> 115: static void drain_cleanup_delay_queue() { > > It's not obvious that draining the queue is useful. Unless there's a reason I'm missing, I suggest not doing so. I don't feel too strongly either way but someone else previously suggested draining during the periodic task so I added it. The benefit is not leaving Symbols hanging around in the queue indefinitely (though granted, a fixed number of them, so the memory waste is limited). The downside is a small piece of added code and work on the periodic task. 
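To make the trade-off above concrete, here is a minimal single-threaded C++ sketch of a bounded keep-alive queue of the kind being discussed. It is an illustration under assumptions, not the HotSpot implementation: `Sym` and `DelayQueue` are hypothetical names, and a plain counter stands in for the atomics the real code would need.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a reference-counted symbol.
struct Sym {
  int refcount = 0;
};

// Fixed-size ring buffer that holds one extra reference per queued symbol.
// When the buffer is full, a new entry displaces the oldest one.
template <std::size_t N>
class DelayQueue {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

  std::array<Sym*, N> slots_{};  // value-initialised: all slots start as nullptr
  std::size_t next_ = 0;         // total number of add() calls so far

  static void release(Sym* s) {
    if (s != nullptr) s->refcount--;
  }

 public:
  // Keep `s` alive for a while; the entry it displaces (if any)
  // loses the extra reference the queue was holding.
  void add(Sym* s) {
    s->refcount++;
    const std::size_t i = next_++ & (N - 1);  // cheap modulo for power-of-two N
    release(slots_[i]);
    slots_[i] = s;
  }

  // Drop every held reference, e.g. from a periodic cleanup task.
  void drain() {
    for (Sym*& slot : slots_) {
      release(slot);
      slot = nullptr;
    }
  }
};
```

Without `drain()`, at most `N` symbols stay pinned indefinitely, which is the bounded memory cost mentioned above; with it, the periodic task does a little extra work to release them.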
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406536331 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406536288 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406536415 From kdnilsen at openjdk.org Mon Nov 27 18:05:23 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Mon, 27 Nov 2023 18:05:23 GMT Subject: RFR: 8314220: Configurable InlineCacheBuffer size [v4] In-Reply-To: <0WB7f9VNThtrvvvopegzy51wIssRwuXpuiwdq5_2r8w=.82a3753c-7ac1-4d46-a6bc-6817cdedfd53@github.com> References: <0WB7f9VNThtrvvvopegzy51wIssRwuXpuiwdq5_2r8w=.82a3753c-7ac1-4d46-a6bc-6817cdedfd53@github.com> Message-ID: <1QCg-0cpECaLN5vh0h5c6nypcX1ge3wQML66AkRQzGY=.c7b5c3ae-59c3-4c37-befb-d68e478a7fce@github.com> On Tue, 21 Nov 2023 17:41:26 GMT, Ekaterina Vergizova wrote: >> InlineCacheBuffer size is currently hardcoded to 10K. >> This can lead to multiple ICBufferFull safepoints for InlineCacheBuffer cleanup and possible performance degradation. >> >> Added experimental command line option InlineCacheBufferSize with the same default value, allowing it to be configured for performance experiments with ICBufferFull safepoints frequency. > > Ekaterina Vergizova has updated the pull request incrementally with one additional commit since the last revision: > > Changed InlineCacheBufferSize limit Thanks for integrating this patch. Generational Shenandoah has also observed problems with this issue. For a user who is not intimately familiar with the internal workings of the JIT compilers, is there any advice as to what value we should set this to? I'm guessing the answer might relate to: 1. How often do we experience ICBufferFull safepoints in the absence of concurrent GC. Does this tell us anything? 2. What is the typical duration of a concurrent GC cycle? 3. Is there a recommendation for safety buffer? 
FWIW: We are aware of a service that consistently experiences roughly 30 ICBufferFull safepoints during the third concurrent GC following each restart (without your patch). Once we survive this startup storm, we do not experience any further problems. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15271#issuecomment-1828350944 From redestad at openjdk.org Mon Nov 27 18:08:19 2023 From: redestad at openjdk.org (Claes Redestad) Date: Mon, 27 Nov 2023 18:08:19 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v12] In-Reply-To: <4aeW-4xV_Fwl9bvrdDtW-TwRzDZ41JWiLQRXwkrwWP4=.4e426206-6912-4360-a37a-c2abc17bcc32@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> <4aeW-4xV_Fwl9bvrdDtW-TwRzDZ41JWiLQRXwkrwWP4=.4e426206-6912-4360-a37a-c2abc17bcc32@github.com> Message-ID: On Mon, 27 Nov 2023 17:28:34 GMT, Roger Riggs wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8584: >> >>> 8582: evpcmpuw(mask1, tmp1Reg, tmp2Reg, Assembler::le, Assembler::AVX_512bit); >>> 8583: kortestdl(mask1, mask1); >>> 8584: jcc(Assembler::carryClear, reset_for_copy_tail); >> >> Suggestion: >> >> jccb(Assembler::carryClear, reset_for_copy_tail); >> >> >> and here: >> >> https://github.com/openjdk/jdk/blob/d201344b631bf2cc9fb1990874fc7d42d523eeab/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L8590 > > Thanks for the suggestions to use byte offset branches. Seems reasonable. AFAICT these suggestions are all in the AVX-512-code that is soft disabled by the need to supply `-XX:AVX3Threshold=0`. We don't do any systematic performance testing of those (maybe we should?), so some manual verification is necessary. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1406554229 From ogillespie at openjdk.org Mon Nov 27 18:09:04 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 27 Nov 2023 18:09:04 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v11] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Implement suggestions - Merge remote-tracking branch 'origin/master' into keepalive2 - Fix indentation, rename test helper - Remove trailing whitespace - Remove is_enabled check, use modulo shortcut, add drain test - Set queue size to power of 2, use constant in test - Add missing atomic.hpp include - Switch to a ringbuffer instead of NonblockingQueue - Adress comments Fix indentation Improve tests Improve comment Remove redundant null check Improve naming Pop when >, not >= max len - Fix failing tests TempNewSymbol now increments refcount again, messing with the expectations. This is not a complete fix, I will have to read the individual cases and make sure they are correct. - ... and 2 more: https://git.openjdk.org/jdk/compare/18602c9b...d83ea056 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/6e06f007..d83ea056 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=09-10 Stats: 861164 lines in 6978 files changed: 207725 ins; 528860 del; 124579 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From ogillespie at openjdk.org Mon Nov 27 18:09:07 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 27 Nov 2023 18:09:07 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Mon, 13 Nov 2023 01:35:42 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation, 
rename test helper > > test/hotspot/gtest/classfile/test_placeholders.cpp line 45: > >> 43: Symbol* D = SymbolTable::new_symbol("def2_8_2023_class"); >> 44: Symbol* super = SymbolTable::new_symbol("super2_8_2023_supername"); >> 45: Symbol* interf = SymbolTable::new_symbol("interface2_8_2023_supername"); > > This doesn't seem like the right way to update this test. Doesn't this leave the symbols dangling? > And in the face of potential queue draining, it seems to me this could lead the test to intermittent failures. Updated to use the same 'create then drain' approach as the other test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406553070 From jbhateja at openjdk.org Mon Nov 27 18:16:18 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 27 Nov 2023 18:16:18 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v7] In-Reply-To: References: Message-ID: On Sun, 26 Nov 2023 18:25:24 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ± 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ± 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ± 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ± 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ± 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ± 0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ± 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ± 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ± 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ± 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ± 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ± 
0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ± 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ± 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ± 0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ± 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ± 0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ± 0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ± 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ± 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ± 0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ± 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ± 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ± 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ± 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ± 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ± 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ± 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ± 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > whitespace again Thanks for filing the RFE for remaining changes. src/hotspot/cpu/x86/x86.ad line 7835: > 7833: > 7834: instruct ablendvp(legVec dst, legVec src1, legVec src2, legVec mask, legVec vtmp) %{ > 7835: predicate(UseAVX > 0 && EnableX86ECoreOpts && Suggestion: instruct vblendvp(legVec dst, legVec src1, legVec src2, legVec mask, legVec vtmp) %{ predicate(UseAVX > 0 && EnableX86ECoreOpts && ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16716#pullrequestreview-1751072900
PR Review Comment: https://git.openjdk.org/jdk/pull/16716#discussion_r1406562620

From duke at openjdk.org Mon Nov 27 18:35:25 2023
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 27 Nov 2023 18:35:25 GMT
Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v8]
In-Reply-To: References: Message-ID:

> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, get up to 30% gain
>
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/cpu/x86/x86.ad

  Co-authored-by: Jatin Bhateja

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16716/files
  - new: https://git.openjdk.org/jdk/pull/16716/files/5ed257de..0480a5d2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16716&range=06-07

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16716.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16716/head:pull/16716

PR: https://git.openjdk.org/jdk/pull/16716

From sspitsyn at openjdk.org Mon Nov 27 18:56:09 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Mon, 27 Nov 2023 18:56:09 GMT
Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v7]
In-Reply-To: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com>
References: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com>
Message-ID: <3tjmanjKve-N23snZwQqwxv3-j4VfYd86AvhYzKl8XY=.411c16a3-b5d6-4a5a-85e7-fb5926643830@github.com>

On Mon, 27 Nov 2023 12:58:56 GMT, Stefan Karlsson wrote:

>> In the rewrites made for: >> 
[JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. 
> > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweaks to jtreg run comments test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 168: > 166: if (pthread_create(&attacher, NULL, create_monitor_with_dead_object_and_dump_threads_in_thread, NULL) != 0) die("pthread_create"); > 167: if (pthread_join(attacher, &ret) != 0) die("pthread_join"); > 168: } The lines 153-167 have an inconsistent indent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406610304 From sspitsyn at openjdk.org Mon Nov 27 19:01:13 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 27 Nov 2023 19:01:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 09:44:23 GMT, Stefan Karlsson wrote: >> test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 53: >> >>> 51: private static native boolean hasEventPosted(); >>> 52: >>> 53: private static void jniMonitorEnterAndLetObjectDie() { >> >> I can see it is convenient to just inject this test case in an existing test, but I'm not sure it is necessarily the right thing to do. Serviceability folk may have a stronger opinion. > > Yeah, I was thinking the same. Maybe @sspitsyn or @plummercj could give guidance here? This looks okay. I see no problem with it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406615663 From rriggs at openjdk.org Mon Nov 27 19:09:40 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Mon, 27 Nov 2023 19:09:40 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v13] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Use byte off branches in char_array_compress Verified by manual tests with "-XX:AVX3Threshold=0" And test in the PR test/hotspot/jtreg/compiler/intrinsics/string/TestStringConstructionIntrinsics.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/d201344b..5299c43b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=11-12 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From stefank at openjdk.org Mon Nov 27 19:22:11 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 19:22:11 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 18:58:14 GMT, Serguei Spitsyn wrote: >> Yeah, I was thinking the same. Maybe @sspitsyn or @plummercj could give guidance here? > > This looks okay. I see no problem with it. OK. Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406637379 From stefank at openjdk.org Mon Nov 27 19:22:10 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 27 Nov 2023 19:22:10 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v7] In-Reply-To: References: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> Message-ID: On Mon, 27 Nov 2023 19:04:42 GMT, Serguei Spitsyn wrote: > The fix looks good to me. How was this tested? Thanks. It was tested with the added and updated tests in GHA. Do you have any suggestions for more tests to run? > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 168: > >> 166: if (pthread_create(&attacher, NULL, create_monitor_with_dead_object_and_dump_threads_in_thread, NULL) != 0) die("pthread_create"); >> 167: if (pthread_join(attacher, &ret) != 0) die("pthread_join"); >> 168: } > > The lines 153-167 have an inconsistent indent. Fixed. Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1828467777 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406636914 From sspitsyn at openjdk.org Mon Nov 27 19:22:08 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 27 Nov 2023 19:22:08 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v7] In-Reply-To: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> References: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> Message-ID: On Mon, 27 Nov 2023 12:58:56 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. 
Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Tweaks to jtreg run comments The fix looks good to me. How was this tested? ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1751166223 From sspitsyn at openjdk.org Mon Nov 27 19:36:08 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 27 Nov 2023 19:36:08 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v7] In-Reply-To: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> References: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> Message-ID: <2u7QlAUR0MpiqurUcsrLKdiPsgDPv0jq8HrpAWfu4Mk=.31a59a07-b518-474b-a3d6-410292ede3c4@github.com> On Mon, 27 Nov 2023 12:58:56 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. 
I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object.
>>
>> The provided tests provoke this assert from:
>> * the JNI thread detach code
>> * thread dumping with locked monitors, and
>> * the JVMTI GetOwnedMonitorInfo API.
>>
>> While investigating this we've found that the thread detach code became more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak.
>>
>> The thread dumping case doesn't tolerate ObjectMonitors with dead objects, so I'm filtering those in the closure that collects ObjectMonitors. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI-held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`.
>>
>> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls.
>>
>> Test: the written tests with and without the fix. Tier1-Tier3, so far.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Tweaks to jtreg run comments

It is safer to run mach5 tiers 1-4, tier5-svc and tier6.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1828479536

From stefank at openjdk.org Mon Nov 27 20:20:01 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 27 Nov 2023 20:20:01 GMT
Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8]
In-Reply-To: References: Message-ID:

> In the rewrites made for:
> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls`
>
> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object.
>
> The provided tests provoke this assert from:
> * the JNI thread detach code
> * thread dumping with locked monitors, and
> * the JVMTI GetOwnedMonitorInfo API.
>
> While investigating this we've found that the thread detach code became more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak.
>
> The thread dumping case doesn't tolerate ObjectMonitors with dead objects, so I'm filtering those in the closure that collects ObjectMonitors. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI-held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`.
>
> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls.
>
> Test: the written tests with and without the fix. Tier1-Tier3, so far.

Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:

  Fix indentation

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16783/files
  - new: https://git.openjdk.org/jdk/pull/16783/files/234175d9..0e68fb68

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=06-07

  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/16783.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783

PR: https://git.openjdk.org/jdk/pull/16783

From mdoerr at openjdk.org Mon Nov 27 21:59:27 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 27 Nov 2023 21:59:27 GMT
Subject: RFR: 8320807: [PPC64][ZGC] C1 generates wrong code for atomics
Message-ID:

Debugging test failures on PPC64 in java/lang/Thread/virtual/stress/Skynet.java#ZGenerational has shown that the ldarx+stdcx_ loop for uncompressed Oops in `LIR_Assembler::atomic_op` is wrong: `__ mr(Rtmp, Robj);` is inside the ldarx+stdcx_ loop, but must be outside of it. Repeated execution leads to a wrong store value.
In addition, zBarrierSetC1.cpp expects `cas_obj` and `xchg` to contain all necessary memory barriers. That doesn't fit the current PPC64 design, which inserts memory barriers at the LIR level instead. I've changed this and moved them into the assembler code for all GCs.
While debugging, I have optimized out an unnecessary branch in `ZBarrierSetAssembler::store_barrier_medium`.
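
Editorial illustration of the loop problem described above. The message says only that the register copy must sit outside the ldarx+stdcx_ loop and that repeated execution stores a wrong value; it does not spell out the mechanism. The toy model below is a minimal sketch that assumes, purely for illustration, that the load-reserve writes its result into the same register the copy reads from. `model_exchange`, `ExchangeResult`, and the variable names are hypothetical and are not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Result of one modelled atomic exchange.
struct ExchangeResult {
  uint64_t stored;        // value left in memory after the exchange
  uint64_t returned_old;  // value handed back to the caller
};

// Toy model of a load-reserve / store-conditional exchange loop.
// failed_attempts is the number of times the store-conditional loses its
// reservation before it finally succeeds.
ExchangeResult model_exchange(uint64_t memory, uint64_t new_value,
                              int failed_attempts, bool copy_inside_loop) {
  uint64_t r_obj = new_value;  // ASSUMPTION: also the destination of the load
  uint64_t r_tmp = 0;

  if (!copy_inside_loop) {
    r_tmp = r_obj;             // copy hoisted out of the loop: runs once
  }
  for (;;) {
    if (copy_inside_loop) {
      r_tmp = r_obj;           // copy re-executed on every retry
    }
    r_obj = memory;            // models the load-reserve, clobbering r_obj
    if (failed_attempts > 0) { // models a failed store-conditional
      --failed_attempts;
      continue;                // retry
    }
    memory = r_tmp;            // models a successful store-conditional
    return {memory, r_obj};
  }
}
```

Under that assumption the first pass behaves correctly either way, which is why such a bug would only surface when the store-conditional actually fails and the loop repeats: with the copy inside the loop and one retry, the old memory value is written back instead of the new one.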
------------- Commit messages: - [PPC64][ZGC] C1 generates wrong code for atomics Changes: https://git.openjdk.org/jdk/pull/16835/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16835&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320807 Stats: 121 lines in 4 files changed: 42 ins; 70 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/16835.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16835/head:pull/16835 PR: https://git.openjdk.org/jdk/pull/16835 From kbarrett at openjdk.org Mon Nov 27 22:53:16 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 27 Nov 2023 22:53:16 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Mon, 27 Nov 2023 17:52:05 GMT, Oli Gillespie wrote: >> src/hotspot/share/oops/symbolHandle.hpp line 115: >> >>> 113: } >>> 114: >>> 115: static void drain_cleanup_delay_queue() { >> >> It's not obvious that draining the queue is useful. Unless there's a reason I'm missing, I suggest not doing so. > > I don't feel too strongly either way but someone else previously suggested draining during the periodic task so I added it. > The benefit is not leaving Symbols hanging around in the queue indefinitely (though granted, a fixed number of them, so the memory waste is limited). The downside is a small piece of added code and work on the periodic task. I didn't find any discussion of whether draining is needed in this PR, and draining is in the initial commit. Other downsides include the need to test that feature and the impact that feature has on testing other parts of this change. Unless someone argues for it, I'd prefer to see it removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406864079 From dcubed at openjdk.org Mon Nov 27 23:03:14 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Mon, 27 Nov 2023 23:03:14 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 20:20:01 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. 
If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls.
>>
>> Test: the written tests with and without the fix. Tier1-Tier3, so far.
>
> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix indentation

I've made a first pass over this PR. I don't think I have anything that's a "must do".
I'll make another pass tomorrow after I have a chance to think about this fix.

test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 2:

> 1: /*
> 2:  * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved.

nit: why include 2022 in the copyright header?

test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 2:

> 1: /*
> 2:  * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved.

nit: why include 2022 in the copyright header?

test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 28:

> 26: #include
> 27: #include
> 28: #include

Should these be in sorted order?

test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 110:

> 108:
> 109:   // Let the GC clear the weak reference to the object.
> 110:   system_gc(env);

A single GC may not be enough...

test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 121:

> 119:   create_monitor_with_dead_object(env);
> 120:
> 121:   // DetachCurrenThread will try to unlock held monitors. This has been a

nit typo: s/DetachCurrenThread/DetachCurrentThread/

test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 131:

> 129:   if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread");
> 130:
> 131:   return NULL;

Why is this function's return type "void*" when it only returns NULL?

test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 149:

> 147:   if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread");
> 148:
> 149:   return NULL;

Why is this function's return type "void*" when it only returns NULL?

-------------

PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1751551279
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406858803
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406861430
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406862006
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406864625
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406864964
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406866330
PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406866618

From dcubed at openjdk.org Mon Nov 27 23:03:17 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Mon, 27 Nov 2023 23:03:17 GMT
Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8]
In-Reply-To: References: Message-ID:

On Thu, 23 Nov 2023 11:21:12 GMT, Stefan Karlsson wrote:

>> test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 61:
>>
>>> 59:         jniMonitorEnter(obj);
>>> 60:         obj = null;
>>> 61:         System.gc();
>>
>> Again one gc() is generally not sufficient.
>>
>> How can this test tell that the object in the monitor was actually cleared? I think `monitorinflation` logging may be the only way to tell.
>
> Yes, probably. I've been looking at the `monitorinflation` logging to verify that it gets cleared. I think it would be messy to try to get this test to also start to parse logs.

There are other tests that try to make sure that some specific object is GC'ed.
If the lack of the object being collected will cause the test to fail, then look else where for a more reliable mechanism. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1406869425 From sviswanathan at openjdk.org Mon Nov 27 23:36:09 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 27 Nov 2023 23:36:09 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 18:35:25 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ? 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ? 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ? 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ? 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ? 0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ? 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ? 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ? 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ? 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ? 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ? 0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ? 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ? 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ? 0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ? 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ? 0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ? 
0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ? 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ? 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ? 0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ? 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ? 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ? 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ? 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ? 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ? 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ? 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ? 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ? 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/x86.ad > > Co-authored-by: Jatin Bhateja Updates to the PR look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16716#pullrequestreview-1751611568 From sviswanathan at openjdk.org Mon Nov 27 23:41:10 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 27 Nov 2023 23:41:10 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 18:35:25 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ? 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ? 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ? 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ? 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ? 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ? 
0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ± 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ± 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ± 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ± 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ± 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ± 0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ± 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ± 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ± 0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ± 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ± 0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ± 0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ± 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ± 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ± 0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ± 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ± 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ± 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ± 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ± 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ± 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ± 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ± 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/x86.ad > > Co-authored-by: Jatin Bhateja @vnkozlov @TobiHartmann Please advise if we can go ahead and merge this PR or if you would like to review/test it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1828808387 From coleenp at openjdk.org Tue Nov 28 00:10:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 00:10:20 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v11] In-Reply-To: References: Message-ID: <1R111Zxy_o1oXLuVqKojBz2b3DV9vxo2LcF1rTC7SM0=.f1df96d4-e8fd-4d12-8d5f-9b966b16bbd7@github.com> On Mon, 27 Nov 2023 18:09:04 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. 
>> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Implement suggestions > - Merge remote-tracking branch 'origin/master' into keepalive2 > - Fix indentation, rename test helper > - Remove trailing whitespace > - Remove is_enabled check, use modulo shortcut, add drain test > - Set queue size to power of 2, use constant in test > - Add missing atomic.hpp include > - Switch to a ringbuffer instead of NonblockingQueue > - Adress comments > > Fix indentation > Improve tests > Improve comment > Remove redundant null check > Improve naming > Pop when >, not >= max len > - Fix failing tests > > TempNewSymbol now increments refcount again, messing with the > expectations. This is not a complete fix, I will have to read the > individual cases and make sure they are correct. > - ... and 2 more: https://git.openjdk.org/jdk/compare/9983968c...d83ea056 This still looks good to me. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1751632368 From coleenp at openjdk.org Tue Nov 28 00:10:22 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 00:10:22 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Mon, 13 Nov 2023 01:20:19 GMT, Kim Barrett wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation, rename test helper > > src/hotspot/share/oops/symbolHandle.hpp line 59: > >> 57: // Conversion from a Symbol* to a SymbolHandleBase. 
>> 58: // Does not increment the current reference count if temporary. >> 59: SymbolHandleBase(Symbol *s) : _temp(s) { > > Is this really called with nullptr sometimes? It would be better if that was disallowed. But that's probably > outside the scope of this PR. Yes SymbolHandle can be null. There's a SymbolHandle field in the PlaceholderEntry called supername which is set to nullptr. At least this is one instance where SymbolHandle can be null. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406907726 From coleenp at openjdk.org Tue Nov 28 00:10:24 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 00:10:24 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: Message-ID: <4DLEivE5wKT1xj9lRvcxxxPWeGLZ6y-q6b7sqa3csGg=.d6a0d8dd-5cca-4ab8-8a0a-6345b3b667ef@github.com> On Wed, 1 Nov 2023 13:29:07 GMT, Coleen Phillimore wrote: >> test/hotspot/gtest/classfile/test_placeholders.cpp line 46: >> >>> 44: Symbol* super = SymbolTable::new_symbol("super2_8_2023_supername"); >>> 45: Symbol* interf = SymbolTable::new_symbol("interface2_8_2023_supername"); >>> 46: >> >> I swapped these from TempNewSymbol to Symbol before I added the set_cleanup_delay_max_entries functionality to avoid interfering with refcounts in tests. >> Is the swap safe? Or should I use set_cleanup_delay_max_entries(0) and switch back? I can't tell why some of these were made temp and some not. > > I don't remember why they were TempNewSymbol either but I don't think it matters for this test, they can be Symbol*. I still don't think it matters for the test, but this seems to be an effective workaround. 
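A minimal sketch of why a keep-alive queue changes the refcounts these tests observe, assuming a single-threaded model. `RefCounted` and `KeepAliveRing` are hypothetical stand-ins, not HotSpot's `Symbol` or the queue added by this PR, and the power-of-2 capacity with a mask only mirrors the "modulo shortcut" mentioned in the commit list rather than the actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a ref-counted symbol; not HotSpot's Symbol class.
struct RefCounted {
  int refcount = 0;
  void increment() { ++refcount; }
  void decrement() { assert(refcount > 0); --refcount; }
};

// Hypothetical bounded keep-alive ring. Each add() takes a reference on the
// new entry and releases the reference held on the entry it displaces, so an
// item stays alive until N newer items have been added or drain() is called.
template <std::size_t N>
class KeepAliveRing {
  static_assert(N > 0 && (N & (N - 1)) == 0, "capacity must be a power of 2");
  std::array<RefCounted*, N> _slots{};  // value-initialized to nullptr
  std::size_t _next = 0;

 public:
  void add(RefCounted* item) {
    item->increment();
    std::size_t index = _next++ & (N - 1);  // power-of-2 modulo shortcut
    RefCounted* old = _slots[index];
    _slots[index] = item;
    if (old != nullptr) old->decrement();   // displace the oldest entry
  }

  // Releases every reference held by the ring (modelled on the statement that
  // symbol table cleanup also drains the queue).
  void drain() {
    for (RefCounted*& slot : _slots) {
      if (slot != nullptr) {
        slot->decrement();
        slot = nullptr;
      }
    }
  }
};
```

In this model an item added to the ring carries one extra reference until it is displaced or drained, which is the kind of refcount shift a test that asserts exact counts would notice.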
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1406924428 From lmesnik at openjdk.org Tue Nov 28 00:21:05 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 28 Nov 2023 00:21:05 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v2] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 10:30:59 GMT, Serguei Spitsyn wrote: >> This is an update for a performance/scalability enhancement. >> >> The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. >> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: make new fields volatile, use Atomic for access/update Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16688#pullrequestreview-1751668020 From pchilanomate at openjdk.org Tue Nov 28 00:38:15 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 28 Nov 2023 00:38:15 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index Message-ID: Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. 
This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But C2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it uses the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word-aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a rounded-up value or not depending on their needs [1]. I wasn't convinced it was worth it given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The third alternative would be to just change C2 to not use this stack slot and start spilling at a word-aligned offset from the sp. I ran the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5.
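The first proposed fix can be pictured roughly as follows. This is a sketch only: `SlotBitmap` and `clear_argument_bits` are hypothetical names, the bitmap is modelled as one bit per 32-bit stack slot, and the argument area is assumed to start at a word-aligned slot. The real change is in the Loom thaw code in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one bit per 32-bit stack slot; a set bit marks a slot
// that holds a narrow oop.
using SlotBitmap = std::vector<bool>;

// Clears the oop bits for an argument area of `num_slots` slots starting at
// `first_slot`, except for the last slot of the area. Under the assumption
// that passed oops are word-aligned, a passed oop never starts in that last
// (odd) slot, so any bit found there belongs to the caller and must survive.
void clear_argument_bits(SlotBitmap& bitmap, std::size_t first_slot,
                         std::size_t num_slots) {
  assert(num_slots % 2 == 0);                       // area rounded to 2 slots
  assert(first_slot + num_slots <= bitmap.size());
  if (num_slots == 0) return;
  for (std::size_t i = first_slot; i < first_slot + num_slots - 1; ++i) {
    bitmap[i] = false;
  }
}
```

With an argument area covering slots 2 to 5, a bit set at slot 2 (a passed oop) is cleared while a bit set at slot 5 (the caller's spilled narrow oop) is kept.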
Thanks, Patricio [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/16837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16837&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320275 Stats: 15 lines in 3 files changed: 6 ins; 1 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/16837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16837/head:pull/16837 PR: https://git.openjdk.org/jdk/pull/16837 From xgong at openjdk.org Tue Nov 28 01:40:08 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 28 Nov 2023 01:40:08 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> <3mhEtXP50-Lkn4iKkMGyCY4RJEcFbaTXyCUN3eoOH7M=.54337aac-d295-4fd8-884b-8dba5c4a68f6@github.com> Message-ID: On Mon, 27 Nov 2023 16:43:09 GMT, Andrew Haley wrote: >> Apparently the situation is this: If your build machine happens to have SVE, then you will get SVE support in the vmath library. The SVE support will be used during runtime if the machine you run on has SVE support. >> >> If your build host happens to to not have SVE, then the vmath library will be built without SVE support, and no matter if your runtime machine has SVE or not, it will not provide SVE support in the vmath library. >> >> Now, if your CI farm has an arbitrarily selection of aarch64 machines with and without SVE, then you have no idea what you are going to get in your build. >> >> We have been very careful in staying clear of this kind of "random" build system behavior. 
The system you build on should not affect the output -- at least, not without a chance to override the default value. >> >> In fact, I am not even sure why it seems to the PR author to be a good idea to let the default be dependent on the build machine at all. My personal opinion is that it would be better to select either "SVE enabled" or "SVE disabled" as the default, and then let the user override this on the configure command line, if they target a platform with different SVE availability. > >> In fact, I am not even sure why it seems to the PR author to be a good idea to let the default be dependent on the build machine at all. My personal opinion is that it would be better to select either "SVE enabled" or "SVE disabled" as the default, and then let the user override this on the configure command line, if they target a platform with different SVE availability. > > SVE support should be enabled regardless of the build machine. The same binary must run on both SVE and non-SVE machines, using SVE if it is advantageous. I suppose some ancient C++ compilers without SVE support still exist, but I see no very good reason to support them building JDK 22+. > > Making a configure option to disable SVE support for vector math is a mistake, but IMO mostly harmless because no-one will ever turn it off. > That's fine, but we must make sure that SVE is not used by the compiler in any other places. If you've already built on non-SVE and tested the result on both SVE and non-SVE, I'm happy. Agreed. Since libvmath.so contains both NEON and SVE functions, we have to disable the SVE feature when building the NEON functions. We want to separate the NEON and SVE functions into two files, build them with different cflags (i.e. only build the SVE sources with `-march=armv8-a+sve`), and then link them into the single `libvmath.so`. Are there similar examples in the current JDK make system? I'm still struggling to find an effective way to do this.
> SVE support should be enabled regardless of the build machine. The same binary must run on both SVE and non-SVE machines, using SVE if it is advantageous. I suppose some ancient C++ compilers without SVE support still exist, but I see no very good reason to support them building JDK 22+. > > Making a configure option to disable SVE support for vector math is a mistake, but IMO mostly harmless because no-one will ever turn it off. Yes, SVE feature is also always enabled in jdk hotspot on SVE machines. If we add the option to give people disable SVE, it's weird that we disabling the SVE just in libvmath.so, and enabling it in hotspot. Besides, we choose the NEON stubs for smaller than 128-bit vector operations no matter whether the runtime machine supports SVE or not. So performance may not be an issue. Hence, I don't think people have reason disabling SVE features. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1406971807 From jjoo at openjdk.org Tue Nov 28 02:22:45 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Tue, 28 Nov 2023 02:22:45 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: Message-ID: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: - Fix namespace issues (2) Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> - Fix namespace issues Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/fcc7e471..abb90258 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=46-47 Stats: 
6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From sspitsyn at openjdk.org Tue Nov 28 02:55:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Nov 2023 02:55:05 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 22 Nov 2023 23:58:54 GMT, Jiangli Zhou wrote: >> Thank you for filing and fixing this issue! I'm kind of late here. Sorry for that. >> Is it hard to create a JTreg test for an attaching native thread? I can help if you have a standalone prototype. >> You can look for some examples in the folder: `test/hotspot/jtreg/serviceability/jvmti/vthread`. > >> Thank you for filing and fixing this issue! I'm kind of late here. Sorry for that. Is it hard to create a JTreg test for an attaching native thread? I can help if you have a standalone prototype. You can look for some examples in the folder: `test/hotspot/jtreg/serviceability/jvmti/vthread`. > > Hi @sspitsyn we don't have an extracted standalone test case (yet) to demonstrate the crashes. The crashes could not reproduce consistently. Outside the debugger (lldb), I ran the test (one of the affected ones) 10 times/per-iteration in order to reproduce. I found the crashes could be affected by both timing and memory layout. During the investigation, I noticed the problem became hidden when I increased allocation size for ThreadsList::_threads (as one of the experiments that I did, I wanted to mprotect the memory to be read-only in order to find who trashed the memory, so was trying to allocate memory up to page boundary). That's the reason why I added noreg-hard tag earlier. 
> > I gave some more thoughts today. Perhaps, we could write a whitebox test to check the JvmtiThreadState, without being able to consistently trigger crashes. We could add a WhiteBox api to iterate the JvmtiThreadState list and validate if all the JavaThread pointers were valid after detaching. The test would need to create native threads to attach and detach before the check. That could more reliably test the 1-1 mapping of JvmtiThreadState and JavaThread. What do you think? > > Thanks for volunteering to help with the test. I created https://bugs.openjdk.org/browse/JDK-8320614 today. Should I assign it to you? @jianglizhou Thank you for filing the sub-task. You have already seen some crashes. Even though you do not have a standalone test case, it is still valuable if you describe a test scenario (at least, surfacely) which helped to observe the problem. Could you, add it to the sub-task report, please? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1828977849 From sspitsyn at openjdk.org Tue Nov 28 05:12:06 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Nov 2023 05:12:06 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: <9bpDj7y-wn-5I1Hvnw9kCKqLVK0TOINrZvYeq_d-QsA=.aa8cd690-3198-4398-8e59-6be8288d90ee@github.com> On Mon, 13 Nov 2023 23:33:50 GMT, Man Cao wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Address Serguei Spitsyn's comments/suggestions: >> - Remove the redundant thread->is_Java_thread() check from JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample(). >> - Change the assert in JvmtiThreadState::state_for_while_locked to avoid #ifdef ASSERT. 
> > src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 98: > >> 96: state->get_thread_oop() != thread_oop)) { >> 97: // Check if java_lang_Thread already has a link to the JvmtiThreadState. >> 98: if (thread_oop != nullptr) { // thread_oop can be null during early VMStart. > > This comment is another case of `state->get_thread_oop()` being null. We should merge this comment with the new comment about attaching native thread. This also was caught by my eyes. :) With the lines 99-101 in place the only case when `thread_oop` can be equal to `nullptr` is when `thread->threadObj() == nullptr`. My understanding is that can be for a detached thread only. I would suggest to add an assert after the line 101: assert(thread_oop != nullptr, "sanity check"); Full testing with this assert should help to identify if it can be fired. Then we can get rid of the check at the line 104. The `JvmtiThreadState` constructor also allows for `thread_oop` to be `nullptr`. Some cleanup will be needed to get rid of unneeded checks there as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1407185890 From dholmes at openjdk.org Tue Nov 28 05:28:10 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 05:28:10 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 22:53:16 GMT, Daniel D. Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 131: > >> 129: if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread"); >> 130: >> 131: return NULL; > > Why is this function return type "void*" when it only returns NULL? 
@dcubed-ojdk These are the thread routines passed to `pthread_create` and must be typed as `void *(*start_routine)(void*)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407195886 From dholmes at openjdk.org Tue Nov 28 05:42:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 05:42:12 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 20:20:01 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. 
If we have this, we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitors. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitors for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 35: > 33: > 34: /* > 35: * @requires os.family != "windows" & os.family != "aix" I guess we can fix the AIX usage later. test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 88: > 86: // monitor with a dead object. The thread dumping code used to not > 87: // tolerate such a monitor and would assert. Run a thread dump and make > 88: // sure that it doesn't crash/assert. The comment is not correct: the monitor will be unlocked by the time `createMonitorWithDeadObject` has returned. This post-dump is really just a sanity test.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407196610 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407201123 From dholmes at openjdk.org Tue Nov 28 05:42:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 05:42:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: <32f_pRuVEPp9K4uP018OxlePZw6bNXQcNVrcWXUadmQ=.c74d725e-5604-4726-bc5e-adbd67f781fb@github.com> References: <32f_pRuVEPp9K4uP018OxlePZw6bNXQcNVrcWXUadmQ=.c74d725e-5604-4726-bc5e-adbd67f781fb@github.com> Message-ID: On Mon, 27 Nov 2023 09:27:40 GMT, Stefan Karlsson wrote: >> test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 47: >> >>> 45: >>> 46: #define check(env, what, msg) \ >>> 47: check_exception((env), (msg)); \ >> >> I'm not understanding why you have `check` and `check_exception` here nor why you choose to use one versus the other. ?? > > Some JNI calls return something, for those I can use `check` which combines a null-check and an exception check. Some tests don't return anything, they can't null-check and can only perform an exception check. Okay I had missed how the return values were being passed as `what`. It is annoying that we have to keep redefining this little helpers for error checking in these kinds of tests as we end up with similar but different ways of doing it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407202355 From dholmes at openjdk.org Tue Nov 28 05:42:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 05:42:15 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 22:50:46 GMT, Daniel D. 
Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 110: > >> 108: >> 109: // Let the GC clear the weak reference to the object. >> 110: system_gc(env); > > A single GC may not be enough... @dcubed-ojdk there's some earlier discussion on this. Apparently a single GC is sufficient to clear an oopStorage WeakHandle, even though it may not be enough for Java level reference processing actions to be observed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407199130 From dholmes at openjdk.org Tue Nov 28 05:42:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 05:42:18 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: <3wFhKioFk7TTAg0Nys5P3P999J6IQ0iL6w8Zl6hTExY=.07cf682f-53c3-4b75-b608-a7d718de9772@github.com> On Tue, 28 Nov 2023 05:25:17 GMT, David Holmes wrote: >> test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 131: >> >>> 129: if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread"); >>> 130: >>> 131: return NULL; >> >> Why is this function return type "void*" when it only returns NULL? > > @dcubed-ojdk These are the thread routines passed to `pthread_create` and must be typed as `void *(*start_routine)(void*)`. Though I just noticed the parameter is missing. As @stefank has pointed out this was copied from another test so all of that other test's issues are/were also present here unfortunately. 
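For reference, a minimal sketch of a start routine with the signature `pthread_create` expects, including the `void*` parameter noted as missing above. `worker` and `run_worker` are hypothetical names, not the functions in `libMonitorWithDeadObjectTest.c`, and the sketch is written in C++ rather than the test's C.

```cpp
#include <pthread.h>

// Hypothetical start routine. pthread_create requires the type
// void* (*)(void*), so the routine takes a void* argument and returns void*
// even when the caller never inspects the result.
static void* worker(void* arg) {
  int* counter = static_cast<int*>(arg);
  ++*counter;
  return nullptr;
}

// Runs `worker` on a new thread and waits for it to finish.
// Returns true if both pthread_create and pthread_join succeed.
bool run_worker(int* counter) {
  pthread_t thread;
  if (pthread_create(&thread, nullptr, worker, counter) != 0) {
    return false;
  }
  return pthread_join(thread, nullptr) == 0;
}
```

A routine that has nothing to report can simply return a null pointer, which is why a `void*` return type with a constant null return is normal for these functions.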
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407200225 From dholmes at openjdk.org Tue Nov 28 05:42:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 05:42:20 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 02:08:33 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Split test and use othervm > > test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 1: > >> 1: /* > > Please update the `@bug` line and update the summary. This is still outstanding - thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407203617 From stefank at openjdk.org Tue Nov 28 07:22:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 07:22:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: <7bLLj-LjX9KQQj_eZVHDpxGnXdOYKpurlIHN1drIghc=.369633ed-2337-4279-94be-e1673f97f197@github.com> On Mon, 27 Nov 2023 22:46:15 GMT, Daniel D. Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 2: > >> 1: /* >> 2: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. > > nit: why include 2022 in the copyright header? Because the test was copied and built upon a test that was created in 2022. > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 28: > >> 26: #include >> 27: #include >> 28: #include > > Should these be in sort order? 
Yes > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 121: > >> 119: create_monitor_with_dead_object(env); >> 120: >> 121: // DetachCurrenThread will try to unlock held monitors. This has been a > > nit typo: s/DetachCurrenThread/DetachCurrentThread/ Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407317548 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407318161 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407319756 From stefank at openjdk.org Tue Nov 28 07:22:16 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 07:22:16 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 05:31:11 GMT, David Holmes wrote: >> test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 110: >> >>> 108: >>> 109: // Let the GC clear the weak reference to the object. >>> 110: system_gc(env); >> >> A single GC may not be enough... > > @dcubed-ojdk there's some earlier discussion on this. Apparently a single GC is sufficient to clear an oopStorage WeakHandle, even though it may not be enough for Java level reference processing actions to be observed. One is enough for this test. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407319349 From stefank at openjdk.org Tue Nov 28 07:22:16 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 07:22:16 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: <3wFhKioFk7TTAg0Nys5P3P999J6IQ0iL6w8Zl6hTExY=.07cf682f-53c3-4b75-b608-a7d718de9772@github.com> References: <3wFhKioFk7TTAg0Nys5P3P999J6IQ0iL6w8Zl6hTExY=.07cf682f-53c3-4b75-b608-a7d718de9772@github.com> Message-ID: <4Ej-Seh7kiqw1P79IXIF3zDh68Qfq9QPx42EmDdQQBo=.40c1979f-d7ca-43fe-ba34-aa2e14fd8c9c@github.com> On Tue, 28 Nov 2023 05:33:08 GMT, David Holmes wrote: >> @dcubed-ojdk These are the thread routines passed to `pthread_create` and must be typed as `void *(*start_routine)(void*)`. > > Though I just noticed the parameter is missing. As @stefank has pointed out this was copied from another test so all of that other test's issues are/were also present here unfortunately. I've updated the functions to use the correct signature. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407321590 From stefank at openjdk.org Tue Nov 28 07:28:12 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 07:28:12 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 05:26:32 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 35: > >> 33: >> 34: /* >> 35: * @requires os.family != "windows" & os.family != "aix" > > I guess we can fix the AIX usage later. This test tests a very specific platform-independent bug. 
Making it work for other platforms than Linux was done when it was easy to do so. I can't test AIX and don't know all the quirks of that OS, so I don't think it is worth for me to spend the time trying to make this test work for AIX. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407327378 From stefank at openjdk.org Tue Nov 28 07:41:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 07:41:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 05:34:39 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 88: > >> 86: // monitor with a dead object. The thread dumping code used to not >> 87: // tolerate such a monitor and would assert. Run a thread dump and make >> 88: // sure that it doesn't crash/assert. > > The comment is not correct, the monitor will be unlocked by the time `createMonitorWithDeadObject` has returned. This post-dump is really just a sanity test. All of these are sanity checks now that this PR fixes the bug. This specific test was added because of the combinations of bugs I've seen and provoked by temporarily reinstating various combinations of the bugs. Specifically, if the detach code skips visiting monitors with dead objects, but the thread dumping code does. That is, the opposite of the currently proposed patch. I'll update the comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407339116 From stefank at openjdk.org Tue Nov 28 08:11:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 08:11:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: <1GDZveyZrUI-DoFEzi4aq-rXO7lGgqwZH3w7lD2NwrA=.81a6a6dc-a2f6-4d8b-a8f1-5a54031de5bb@github.com> On Mon, 27 Nov 2023 22:42:41 GMT, Daniel D. Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > test/hotspot/jtreg/runtime/Monitor/MonitorWithDeadObjectTest.java line 2: > >> 1: /* >> 2: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. > > nit: why include 2022 in the copyright header? See earlier comment. > test/hotspot/jtreg/runtime/Monitor/libMonitorWithDeadObjectTest.c line 149: > >> 147: if ((*jvm)->DetachCurrentThread(jvm) != JNI_OK) die("DetachCurrentThread"); >> 148: >> 149: return NULL; > > Why is this function return type "void*" when it only returns NULL? See earlier comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407365519 PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407366093 From stefank at openjdk.org Tue Nov 28 08:11:16 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 28 Nov 2023 08:11:16 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v5] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 05:38:33 GMT, David Holmes wrote: >> test/hotspot/jtreg/serviceability/jvmti/GetOwnedMonitorInfo/GetOwnedMonitorInfoTest.java line 1: >> >>> 1: /* >> >> Please update the `@bug` line and update the summary. > > This is still outstanding - thanks. Done. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16783#discussion_r1407367471

From epeter at openjdk.org Tue Nov 28 08:44:09 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 28 Nov 2023 08:44:09 GMT
Subject: RFR: 8319709: Make GrowableArrayCHeap copyable [v2]
In-Reply-To: <0CTdcVjWDVOuV23lJ0EGFGgqa4x_P_UxSo5N4WzTJTE=.45f3ba14-bd81-4c58-afe7-6b4849e568aa@github.com>
References: <2SEJ0Rh7DNmKgcylAW7_DFxas2Bs3YzTnUSe39OIVsI=.03298520-694f-4ba7-bdce-d1e67eb3872e@github.com> <0UAh881Jw6L5YNbClDQmuE_Q6fzv0ayeqkrblIoigZ8=.5d81b8a8-d04c-48cb-8987-f3fba98ac403@github.com> <0CTdcVjWDVOuV23lJ0EGFGgqa4x_P_UxSo5N4WzTJTE=.45f3ba14-bd81-4c58-afe7-6b4849e568aa@github.com>
Message-ID:

On Thu, 9 Nov 2023 07:17:32 GMT, Johan Sjölen wrote:

>> What is the motivation for this? Please add something to JBS. Also see query below.
>>
>> Thanks
>
>> What is the motivation for this? Please add something to JBS. Also see query below.
>>
>> Thanks
>
> I need this feature because I want to store `GrowableArray`s within `GrowableArray`s without an unnecessary pointer indirection. I'll add this justification to JBS.

@jdksjolen have you checked all use cases where `GrowableArrayCHeap` is copied? I don't know the state of `GrowableArrayCHeap`, but for `GrowableArray`, there are lots of cases where we actually just would like to "swap" or "move" over the data, and not really duplicate/clone the data. It would just be nice to avoid the overhead of more allocations. I guess that is the drawback of the copy-constructor: you can very easily miss heavy allocations. Might it be better to forbid the assignment operator and the copy constructor, and make the copying explicit, i.e. with a `copy_from` method?
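
To make the suggestion above concrete, here is a minimal sketch of a container that forbids implicit copies, allows cheap moves, and makes duplication explicit through `copy_from`. The class `HeapArray` and all of its members are hypothetical stand-ins written for illustration; this is not HotSpot's `GrowableArrayCHeap` and does not use its allocation machinery.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical heap-backed growable array (illustration only).
// Requires T to be default-constructible and move-assignable.
template <typename T>
class HeapArray {
  T*          _data = nullptr;
  std::size_t _len  = 0;
  std::size_t _cap  = 0;

  // Grow the backing store, moving (not copying) existing elements.
  void reserve(std::size_t cap) {
    if (cap <= _cap) return;
    T* fresh = new T[cap];
    for (std::size_t i = 0; i < _len; i++) fresh[i] = std::move(_data[i]);
    delete[] _data;
    _data = fresh;
    _cap  = cap;
  }

 public:
  HeapArray() = default;
  ~HeapArray() { delete[] _data; }

  // Implicit copies are forbidden, so a heavy allocation cannot happen by accident.
  HeapArray(const HeapArray&) = delete;
  HeapArray& operator=(const HeapArray&) = delete;

  // Moves only steal the pointer; no allocation takes place.
  HeapArray(HeapArray&& other) noexcept
      : _data(other._data), _len(other._len), _cap(other._cap) {
    other._data = nullptr;
    other._len  = 0;
    other._cap  = 0;
  }
  HeapArray& operator=(HeapArray&& other) noexcept {
    if (this != &other) {
      delete[] _data;
      _data = other._data;
      _len  = other._len;
      _cap  = other._cap;
      other._data = nullptr;
      other._len  = 0;
      other._cap  = 0;
    }
    return *this;
  }

  // The only way to duplicate the contents: visible at every call site.
  // Needs T to be copy-assignable; only instantiated where it is called.
  void copy_from(const HeapArray& other) {
    if (this == &other) return;
    _len = 0;
    reserve(other._len);
    for (std::size_t i = 0; i < other._len; i++) _data[i] = other._data[i];
    _len = other._len;
  }

  void append(T value) {
    if (_len == _cap) reserve(_cap == 0 ? 4 : _cap * 2);
    _data[_len++] = std::move(value);
  }

  std::size_t length() const { return _len; }
  T& at(std::size_t i) { assert(i < _len); return _data[i]; }
};
```

With this shape, nesting still works without an extra pointer indirection, because an inner array can be moved into the outer one with `outer.append(std::move(inner))`, while any real duplication has to be spelled `copy_from`.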
------------- PR Comment: https://git.openjdk.org/jdk/pull/16559#issuecomment-1829345350 From ayang at openjdk.org Tue Nov 28 09:24:12 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 28 Nov 2023 09:24:12 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v8] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 10:18:31 GMT, Lei Zaakjyu wrote: >> JDK-8234502 : Merge GenCollectedHeap and SerialHeap > > Lei Zaakjyu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge branch 'openjdk:master' into serialgc > - replace a necessary include statement > - clean up > - add line-breaks > - fix include statements > - add some headers > - Completely removed 'GenCollectedHeap' > - Fix 'young_gen' function in 'genCollectedHeap.cpp' > - include 'serialVMOperations.hpp' > - fix trialing whitespace > - ... and 2 more: https://git.openjdk.org/jdk/compare/165c291d...12c680a3 I just noticed I referred to the wrong PR number in my previous msg... Could you resolve the conflict now that that PR is merged? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16623#issuecomment-1829410192 From aph at openjdk.org Tue Nov 28 10:06:09 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 10:06:09 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> <17T7DXMTf1rFVeQY_FWJx0DETVYlBWceJO4lltWZyw0=.847b02a3-ef40-464a-80b7-d62bd9dbc2b5@github.com> <3g_PCAyCoS9IB9PIulE0sXzT8KqvxeNbzdgfbElT72E=.60fa3790-4894-463a-b192-1b75ff42d24b@github.com> <3mhEtXP50-Lkn4iKkMGyCY4RJEcFbaTXyCUN3eoOH7M=.54337aac-d295-4fd8-884b-8dba5c4a68f6@github.com> Message-ID: <69BEw41MlNZ8k-6WG9JC5V-F-Z_WZVzouYVkNCxDNgw=.6f9e9219-538e-4e1b-ab0b-46ef6b932b79@github.com> On Tue, 28 Nov 2023 01:37:01 GMT, Xiaohong Gong wrote: >>> In fact, I am not even sure why it seems to the PR author to be a good idea to let the default be dependent on the build machine at all. My personal opinion is that it would be better to select either "SVE enabled" or "SVE disabled" as the default, and then let the user override this on the configure command line, if they target a platform with different SVE availability. >> >> SVE support should be enabled regardless of the build machine. The same binary must run on both SVE and non-SVE machines, using SVE if it is advantageous. I suppose some ancient C++ compilers without SVE support still exist, but I see no very good reason to support them building JDK 22+. >> >> Making a configure option to disable SVE support for vector math is a mistake, but IMO mostly harmless because no-one will ever turn it off. > >> That's fine, but we must make sure that SVE is not used by the compiler in any other places. If you've already built on non-SVE and tested the result on both SVE and non-SVE, I'm happy. > > Agree. 
>
> Since it contains both NEON and SVE functions in libvmath.so, we have to disable SVE feature when building those NEON functions. We want to separate NEON/SVE functions in two files, build them with different cflags (i.e. only build SVE sources with `-march=armv8-a+sve`), and then link to the single `libvmath.so`. Do we have such similar examples or functions in current jdk make system? I'm still struggling on finding out an effective way for it.
>
>> SVE support should be enabled regardless of the build machine. The same binary must run on both SVE and non-SVE machines, using SVE if it is advantageous. I suppose some ancient C++ compilers without SVE support still exist, but I see no very good reason to support them building JDK 22+.
>>
>> Making a configure option to disable SVE support for vector math is a mistake, but IMO mostly harmless because no-one will ever turn it off.
>
> Yes, SVE feature is also always enabled in jdk hotspot on SVE machines. If we add the option to give people disable SVE, it's weird that we disabling the SVE just in libvmath.so, and enabling it in hotspot. Besides, we choose the NEON stubs for smaller than 128-bit vector operations no matter whether the runtime machine supports SVE or not. So performance may not be an issue. Hence, I don't think people have reason disabling SVE features.

It makes no sense to configure any of this at build time. Postpone all of the decisions to runtime. Don't touch the make system. Instead, try to open the library at runtime with `os::dll_open()`, after (or inside) `VM_Version::initialize()`. If you're not running on an SVE system, none of the SVE routines will be called, so it doesn't matter, right?
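
A rough sketch of what "postpone the decision to runtime" can look like: the routine to call is chosen once at startup from detected CPU features, and the unused variant is simply never invoked. Everything named here (`MathStubs`, `select_math_stubs`, the `*_standin` functions, the `library_loaded` flag) is hypothetical and stands in for the real library entry points; actual loading through `os::dll_open()` is deliberately not modelled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Common signature for every variant of a (hypothetical) vector sine routine.
using VecSinFn = void (*)(const double* in, double* out, std::size_t n);

// Stand-ins for entry points that would really come from the math library.
void vsin_neon_standin(const double* in, double* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = std::sin(in[i]);
}
void vsin_sve_standin(const double* in, double* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = std::sin(in[i]);
}
// Used when the library could not be opened at all.
void vsin_scalar_fallback(const double* in, double* out, std::size_t n) {
  for (std::size_t i = 0; i < n; i++) out[i] = std::sin(in[i]);
}

struct MathStubs {
  VecSinFn vsin = nullptr;
};

// Called once during startup, after CPU feature detection has run.
// 'library_loaded' models whether opening the library succeeded.
MathStubs select_math_stubs(bool cpu_has_sve, bool library_loaded) {
  MathStubs stubs;
  if (!library_loaded) {
    stubs.vsin = vsin_scalar_fallback;
  } else if (cpu_has_sve) {
    stubs.vsin = vsin_sve_standin;
  } else {
    stubs.vsin = vsin_neon_standin;  // the SVE code is present but never called
  }
  return stubs;
}

void vector_sin(const MathStubs& stubs, const double* in, double* out, std::size_t n) {
  assert(stubs.vsin != nullptr);
  stubs.vsin(in, out, n);
}
```

The point of the indirection is that one binary carries both variants, and a machine without SVE never executes an SVE instruction because the SVE entry point is never selected.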
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1407511713

From aph at openjdk.org Tue Nov 28 10:10:23 2023
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 28 Nov 2023 10:10:23 GMT
Subject: RFR: 8320709: AArch64: Vectorized Poly1305 intrinsics
Message-ID:

Vectorizing Poly1305 is quite tricky. We already have a highly-efficient scalar Poly1305 implementation that runs on the core integer unit, but it's highly serialized, so it does not make good use of the parallelism available.

The scalar implementation takes advantage of some particular features of the Poly1305 keys. In particular, certain bits of r, the secret key, are required to be 0. These make it possible to use a full 64-bit-wide multiply-accumulate operation without needing to process carries between partial products.

While this works well for a serial implementation, a parallel implementation cannot do this because rather than multiplying by r, each step multiplies by some integer power of r, modulo 2^130-5.

In order to avoid processing carries between partial products we use a redundant representation, in which each 130-bit integer is encoded either as a 5-digit integer in base 2^26 or as a 3-digit integer in base 2^52, depending on whether we are using a 32- or 64-bit multiply-accumulate.

In AArch64 Advanced SIMD, there is no 64-bit multiply-accumulate operation available to us, so we must use 32*32 -> 64-bit operations.

In order to achieve maximum performance we'd like to get close to the processor's decode bandwidth, so that every clock cycle does something useful. In a typical high-end AArch64 implementation, the core integer unit has a fast 64-bit multiplier pipeline and the ASIMD unit has a fast(ish) two-way 32-bit multiplier, which may be slower than the core integer unit's. It is not at all obvious whether it's best to use ASIMD or core instructions.
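
The redundant base-2^26 representation described above can be sketched in portable C++. This is only an illustration of the arithmetic, not the stub generator code from the PR: the names `Limbs`, `propagate_carries` and `mul_mod_p` are invented here, and the result is left partially reduced rather than brought to canonical form.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 130-bit value as five base-2^26 digits ("limbs"), least significant first.
// Each limb sits in a 64-bit slot, which leaves headroom to accumulate
// partial products without carrying between them.
using Limbs = std::array<std::uint64_t, 5>;

constexpr std::uint64_t kLimbMask = (1ULL << 26) - 1;

// Bring the limbs back to roughly 26 bits each. A carry out of the top limb
// has weight 2^130, which is congruent to 5 modulo 2^130 - 5, so it is folded
// back into limb 0 multiplied by 5. The result is only partially reduced:
// limb 1 may end up marginally above 2^26 and the value may still exceed
// the modulus.
Limbs propagate_carries(Limbs h) {
  std::uint64_t carry = 0;
  for (int i = 0; i < 5; i++) {
    h[i] += carry;
    carry = h[i] >> 26;
    h[i] &= kLimbMask;
  }
  h[0] += 5 * carry;
  carry = h[0] >> 26;
  h[0] &= kLimbMask;
  h[1] += carry;
  return h;
}

// Schoolbook multiply modulo 2^130 - 5 using only products of small limbs.
// A partial product at digit position k >= 5 wraps around to position k - 5
// with a factor of 5. All 25 products are summed first; carries are handled
// once at the end.
Limbs mul_mod_p(const Limbs& a, const Limbs& b) {
  Limbs acc{};
  for (int i = 0; i < 5; i++) {
    // Precondition that keeps every accumulator below 2^64.
    assert(a[i] < (1ULL << 27) && b[i] < (1ULL << 27));
    for (int j = 0; j < 5; j++) {
      std::uint64_t product = a[i] * b[j];
      if (i + j < 5) {
        acc[i + j] += product;
      } else {
        acc[i + j - 5] += 5 * product;
      }
    }
  }
  return propagate_carries(acc);
}
```

For example, multiplying 2^129 (top limb equal to `1 << 25`) by 2 gives 2^130, which wraps around to 5 in limb 0. Because each limb product is at most about 54 bits wide here, the five terms per accumulator fit in 64 bits, which is what lets the carry handling be deferred.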

Fortunately, if we have a wide-bandwidth instruction decode, we can do both at the same time, by feeding alternating instructions to the core and the ASIMD units. This also allows us to make good use of all of the available core and ASIMD registers, in parallel.

To do this we use generators, which here are a kind of iterator that emits a group of instructions each time it is called. In this case we use 4 parallel generators, and by calling them alternately we interleave the ASIMD and the core instructions. We also take care to ensure that each generator finishes at about the same time, to maximize the distance between instructions which generate and consume data.

The results are pretty good, ranging from 2* - 3* speedup. It is possible that a pure in-order processor (Raspberry Pi?) might be at some disadvantage because more work is being done even though it is highly parallel, but I haven't seen any slowdown on the machines I've tested.

I've left the old serial version for testing.

The best result is on Apple M1 (Firestorm, the performance cores) which is pushing 6 Gbytes/sec. This is not _quite_ as fast as Intel AVX-512, but I think it's good enough.

Finally, it's quite likely that SVE, and perhaps SVE 2, could do even better, but I'm leaving that for another day and another PR.

Speed tests, current versus this patch:

Graviton 2:

Benchmark                        (dataSize)  (provider)  Mode  Cnt     Score   Error  Units
Poly1305DigestBench.updateBytes          64              avgt   10     0.105 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10     0.359 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10     1.370 ± 0.001  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10    21.699 ± 0.002  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10  1384.960 ± 1.155  us/op

Benchmark                        (dataSize)  (provider)  Mode  Cnt     Score   Error  Units
Poly1305DigestBench.updateBytes          64              avgt   10     0.141 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10     0.476 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10     0.977 ± 0.001  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10    10.944 ± 0.012  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10   681.180 ± 1.275  us/op

Graviton 3:

Benchmark                        (dataSize)  (provider)  Mode  Cnt     Score   Error  Units
Poly1305DigestBench.updateBytes          64              avgt   10     0.056 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10     0.183 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10     0.695 ± 0.002  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10    11.057 ± 0.001  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10   680.142 ± 1.111  us/op

Benchmark                        (dataSize)  (provider)  Mode  Cnt     Score   Error  Units
Poly1305DigestBench.updateBytes          64              avgt   10     0.056 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10     0.181 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10     0.388 ± 0.001  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10     4.522 ± 0.001  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10   287.268 ± 0.279  us/op

Apple M1:

Poly1305DigestBench.updateBytes          64              avgt   10     0.037 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10     0.132 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10     0.510 ± 0.002  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10     8.064 ± 0.033  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10   516.537 ± 1.090  us/op

Benchmark                        (dataSize)              Mode  Cnt     Score   Error  Units
Poly1305DigestBench.updateBytes          64              avgt   10     0.037 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10     0.117 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10     0.238 ± 0.001  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10     2.721 ± 0.002  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10   169.724 ± 0.873  us/op

-------------

Commit messages:
 - Oops
 - 8320709: AArch64: Vectorized Poly1305 intrinsics
 - Cleanup
 - Cleanup
 - Cleanup
 - Cleanup
 - Cleanup
 - Merge branch 'clean' into JDK-8296411-dev
 - Merge branch 'clean' into JDK-8296411-dev
 - Delete debugging print
 - ...
and 117 more: https://git.openjdk.org/jdk/compare/0c9a61c1...37f46caa Changes: https://git.openjdk.org/jdk/pull/16812/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16812&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320709 Stats: 1045 lines in 7 files changed: 1041 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16812/head:pull/16812 PR: https://git.openjdk.org/jdk/pull/16812 From adinn at openjdk.org Tue Nov 28 10:10:24 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 28 Nov 2023 10:10:24 GMT Subject: RFR: 8320709: AArch64: Vectorized Poly1305 intrinsics In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 17:12:25 GMT, Andrew Haley wrote: > Vectorizing Poly1305 is quite tricky. We already have a highly- > efficient scalar Poly1305 implementation that runs on the core integer > unit, but it's highly serialized, so it does not make make good use of > the parallelism available. > > The scalar implementation takes advantage of some particular features > of the Poly1305 keys. In particular, certain bits of r, the secret > key, are required to be 0. These make it possible to use a full > 64-bit-wide multiply-accumulate operation without needing to process > carries between partial products, > > While this works well for a serial implementation, a parallel > implementation cannot do this because rather than multiplying by r, > each step multiplies by some integer power of r, modulo > 2^130-5. > > In order to avoid processing carries between partial products we use a > redundant representation, in which each 130-bit integer is encoded > either as a 5-digit integer in base 2^26 or as a 3-digit integer in > base 2^52, depending on whether we are using a 64- or 32-bit > multiply-accumulate. > > In AArch64 Advanced SIMD, there is no 64-bit multiply-accumulate > operation available to us, so we must use 32*32 -> 64-bit operations. 
> > In order to achieve maximum performance we'd like to get close to the > processor's decode bandwidth, so that every clock cycle does something > useful. In a typical high-end AArch64 implementation, the core integer > unit has a fast 64-bit multiplier pipeline and the ASIMD unit has a > fast(ish) two-way 32-bit multiplier, which may be slower than than the > core integer unit's. It is not at all obvious whether it's best to use > ASIMD or core instructions. > > Fortunately, if we have a wide-bandwidth instruction decode, we can do > both at the same time, by feeding alternating instructions to the core > and the ASIMD units. This also allows us to make good use of all of > the available core and ASIMD registers, in parallel. > > To do this we use generators, which here are a kind of iterator that > emits a group of instructions each time it is called. In this case we > 4 parallel generators, and by calling them alternately we interleave > the ASIMD and the core instructions. We also take care to ensure that > each generator finishes at about the same time, to maximize the > distance between instructions which generate and consume data. > > The results are pretty good, ranging from 2* - 3* speedup. It is > possible that a pure in-order processor (Raspberry Pi?) might be at > some disadvantage because more work is being done even though it is > highly parallel, b... I am reviewing this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16812#issuecomment-1827479742 From aph at openjdk.org Tue Nov 28 10:10:26 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 10:10:26 GMT Subject: RFR: 8320709: AArch64: Vectorized Poly1305 intrinsics In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 17:12:25 GMT, Andrew Haley wrote: > Vectorizing Poly1305 is quite tricky. 
We already have a highly- > efficient scalar Poly1305 implementation that runs on the core integer > unit, but it's highly serialized, so it does not make make good use of > the parallelism available. > > The scalar implementation takes advantage of some particular features > of the Poly1305 keys. In particular, certain bits of r, the secret > key, are required to be 0. These make it possible to use a full > 64-bit-wide multiply-accumulate operation without needing to process > carries between partial products, > > While this works well for a serial implementation, a parallel > implementation cannot do this because rather than multiplying by r, > each step multiplies by some integer power of r, modulo > 2^130-5. > > In order to avoid processing carries between partial products we use a > redundant representation, in which each 130-bit integer is encoded > either as a 5-digit integer in base 2^26 or as a 3-digit integer in > base 2^52, depending on whether we are using a 64- or 32-bit > multiply-accumulate. > > In AArch64 Advanced SIMD, there is no 64-bit multiply-accumulate > operation available to us, so we must use 32*32 -> 64-bit operations. > > In order to achieve maximum performance we'd like to get close to the > processor's decode bandwidth, so that every clock cycle does something > useful. In a typical high-end AArch64 implementation, the core integer > unit has a fast 64-bit multiplier pipeline and the ASIMD unit has a > fast(ish) two-way 32-bit multiplier, which may be slower than than the > core integer unit's. It is not at all obvious whether it's best to use > ASIMD or core instructions. > > Fortunately, if we have a wide-bandwidth instruction decode, we can do > both at the same time, by feeding alternating instructions to the core > and the ASIMD units. This also allows us to make good use of all of > the available core and ASIMD registers, in parallel. 
>
> To do this we use generators, which here are a kind of iterator that
> emits a group of instructions each time it is called. In this case we
> 4 parallel generators, and by calling them alternately we interleave
> the ASIMD and the core instructions. We also take care to ensure that
> each generator finishes at about the same time, to maximize the
> distance between instructions which generate and consume data.
>
> The results are pretty good, ranging from 2* - 3* speedup. It is
> possible that a pure in-order processor (Raspberry Pi?) might be at
> some disadvantage because more work is being done even though it is
> highly parallel, b...

Benchmarking Icestorm, the efficiency cores inside the M1, makes for interesting reading. The cores are much smaller and have far less potential parallelism, so the performance gain is far less, maxing at about 1.6 *, and even showing a very slight regression at small sizes. I think this gain is still worth having for the increased performance, even with the code size expansion, but the decision is not so clear.

Benchmark                        (dataSize)  (provider)  Mode  Cnt    Score   Error  Units
Poly1305DigestBench.updateBytes          64              avgt   10    0.076 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10    0.217 ± 0.002  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10    0.786 ± 0.004  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10   11.989 ± 0.003  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10  766.477 ± 0.888  us/op

Poly1305DigestBench.updateBytes          64              avgt   10    0.097 ± 0.001  us/op
Poly1305DigestBench.updateBytes         256              avgt   10    0.294 ± 0.001  us/op
Poly1305DigestBench.updateBytes        1024              avgt   10    0.650 ± 0.002  us/op
Poly1305DigestBench.updateBytes       16384              avgt   10    7.754 ± 0.014  us/op
Poly1305DigestBench.updateBytes     1048576              avgt   10  485.716 ± 1.486  us/op

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7527:

> 7525: __ bind(DONE);
> 7526: }
> 7527: __ poly1305_fully_reduce(S0, u0);

This call to `poly1305_fully_reduce` is probably unnecessary, because the caller invokes `IntegerPolynomial1305::finalCarryReduceLast`. However, this part of the contract is undocumented.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16812#issuecomment-1827704006
PR Review Comment: https://git.openjdk.org/jdk/pull/16812#discussion_r1406082911

From jvernee at openjdk.org Tue Nov 28 10:21:33 2023
From: jvernee at openjdk.org (Jorn Vernee)
Date: Tue, 28 Nov 2023 10:21:33 GMT
Subject: Integrated: 8267532: C2: Profile and prune untaken exception handlers
In-Reply-To:
References:
Message-ID:

On Mon, 30 Oct 2023 14:10:33 GMT, Jorn Vernee wrote:

> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close()` the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance.
>
> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch.
>
> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each
> exception handler of a method in the `MethodData` for that method (which holds all the profiling
> data). Then when looking up the exception handler after an exception is thrown, we mark the
> exception handler as entered. When C2 parses the exception handler block, and it sees that it has
> never been entered, we emit an uncommon trap instead.
>
> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets.
>
> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`:
>
> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0),
> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count...

This pull request has now been integrated.
Changeset: a5ccd3be Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe Stats: 773 lines in 26 files changed: 663 ins; 18 del; 92 mod 8267532: C2: Profile and prune untaken exception handlers 8310011: Arena with try-with-resources is slower than it should be Reviewed-by: thartmann, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/16416 From jvernee at openjdk.org Tue Nov 28 10:28:42 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 28 Nov 2023 10:28:42 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect [v2] In-Reply-To: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: > Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); > > The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. 
> > Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. > > Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 45 additional commits since the last revision: - fix has_monitors tracking. Re-enable assert - add interpreter profiling specific test cases - rename ex_handler -> exception_handler - fix linux compile - Revert "add too_many_traps check" This reverts commit bee05534777dc2caf10362f66fea90a06705a144. - add too_many_traps check - Remove has_monitors fix - Merge branch 'master' into PruneDeadCatchBlocks - Only use ProfileExceptionHandlers - drop ProfileExceptionHandlers flag - ... and 35 more: https://git.openjdk.org/jdk/compare/0bef855c...1f4eff37 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16799/files - new: https://git.openjdk.org/jdk/pull/16799/files/1f4eff37..1f4eff37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16799&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16799&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16799/head:pull/16799 PR: https://git.openjdk.org/jdk/pull/16799 From jvernee at openjdk.org Tue Nov 28 11:00:21 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 28 Nov 2023 11:00:21 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect [v3] In-Reply-To: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: 
<8jwOiEYD0PlFdghlYJle6uBZEzGHv6zlG2dd7tr5QjU=.378c87df-6452-40af-a645-156e55719b3f@github.com> > Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); > > The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. > > Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. > > Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` Jorn Vernee has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 46 commits: - Merge branch 'master' into Has_Monitors - fix has_monitors tracking. Re-enable assert - add interpreter profiling specific test cases - rename ex_handler -> exception_handler - fix linux compile - Revert "add too_many_traps check" This reverts commit bee05534777dc2caf10362f66fea90a06705a144. 
- add too_many_traps check - Remove has_monitors fix - Merge branch 'master' into PruneDeadCatchBlocks - Only use ProfileExceptionHandlers - ... and 36 more: https://git.openjdk.org/jdk/compare/4bcda602...85b2d662 ------------- Changes: https://git.openjdk.org/jdk/pull/16799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16799&range=02 Stats: 31 lines in 4 files changed: 9 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16799/head:pull/16799 PR: https://git.openjdk.org/jdk/pull/16799 From ogillespie at openjdk.org Tue Nov 28 11:03:44 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 28 Nov 2023 11:03:44 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v12] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. 
This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Add precompiled.hpp for Windows builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/d83ea056..059346aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=10-11 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From ogillespie at openjdk.org Tue Nov 28 11:03:46 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 28 Nov 2023 11:03:46 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Mon, 27 Nov 2023 22:49:57 GMT, Kim Barrett wrote: >> I don't feel too strongly either way but someone else previously suggested draining during the periodic task so I added it. >> The benefit is not leaving Symbols hanging around in the queue indefinitely (though granted, a fixed number of them, so the memory waste is limited). The downside is a small piece of added code and work on the periodic task. > > I didn't find any discussion of whether draining is needed in this PR, and draining is in the initial commit. 
> Other downsides include the need to test that feature and the impact that feature has on testing other parts > of this change. Unless someone argues for it, I'd prefer to see it removed. Yes, there is one additional test for the draining but it's quite simple. The drain feature actually helps with the other tests, that's how we can avoid the queue interfering with ref counts for the tests (create temp symbol, immediately drain queue), so even if we don't drain periodically I'd still leave the feature there for the existing tests. What about the fact that not draining the queue means we could keep 128 symbols alive indefinitely? Are you not concerned because that's a small amount, and/or because natural churn will likely keep the queue moving? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1407582112 From stuefe at openjdk.org Tue Nov 28 11:11:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 Nov 2023 11:11:08 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Mon, 27 Nov 2023 15:44:58 GMT, suchismith1993 wrote: > > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > > > > > > > > @tstuefe Any suggestion on this ? 
> > > > > > ``` > > --- a/src/hotspot/os/aix/os_aix.cpp > > +++ b/src/hotspot/os/aix/os_aix.cpp > > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, > > return true; > > } > > > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { > > > > log_info(os)("attempting shared library load of %s", filename); > > > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > return nullptr; > > } > > > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > + > > + void* result = nullptr; > > + > > + // First try using *.so suffix; failing that, retry with *.a suffix. > > + const size_t len = strlen(filename); > > + constexpr size_t safety = 3 + 1; > > + constexpr size_t bufsize = len + safety; > > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > > + strcpy(buf, filename); > > + char* const dot = strrchr(buf, '.'); > > + > > + assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename); > > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > > + "Attempting to load a shared object that is neither *.so nor *.a", filename); > > + > > + sprintf(dot, ".so"); > > + result = dll_load_inner(buf, ebuf, ebuflen); > > + > > + if (result == nullptr) { > > + sprintf(dot, ".a"); > > + result = dll_load_inner(buf, ebuf, ebuflen); > > + } > > + > > + FREE_C_HEAP_ARRAY(char, buf); > > + > > + return result; > > +} > > + > > ``` > > @tstuefe as discussed with @TheRealMDoerr do you think using default argument will help ? Either we pass agent object as 3rd parameter or an empty character buffer(and not const chat*) which would be spcifically used to copy the alternate filename to it using strcpy so that it is reflected in the jvmagent code ? A third parameter for what? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829602040 From duke at openjdk.org Tue Nov 28 11:30:09 2023 From: duke at openjdk.org (suchismith1993) Date: Tue, 28 Nov 2023 11:30:09 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: On Tue, 28 Nov 2023 11:08:43 GMT, Thomas Stuefe wrote: > > > > > i would have to repeat the line 1132 and 1139 in os_aix.cpp again , if the condition fails for .so files, because i have to reload it again and check if the .a exists. In the shared code i had repeat less number of lines i believe. Do you suggest moving lines 1132 to 1139 to another function then ? > > > > > > > > > > > > @tstuefe Any suggestion on this ? > > > > > > > > > ``` > > > --- a/src/hotspot/os/aix/os_aix.cpp > > > +++ b/src/hotspot/os/aix/os_aix.cpp > > > @@ -1108,7 +1108,7 @@ bool os::dll_address_to_library_name(address addr, char* buf, > > > return true; > > > } > > > > > > -void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > > +static void* dll_load_inner(const char *filename, char *ebuf, int ebuflen) { > > > > > > log_info(os)("attempting shared library load of %s", filename); > > > > > > @@ -1158,6 +1158,35 @@ void *os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > > return nullptr; > > > } > > > > > > +void* os::dll_load(const char *filename, char *ebuf, int ebuflen) { > > > + > > > + void* result = nullptr; > > > + > > > + // First try using *.so suffix; failing that, retry with *.a suffix. 
> > > + const size_t len = strlen(filename); > > > + constexpr size_t safety = 3 + 1; > > > + constexpr size_t bufsize = len + safety; > > > + char* buf = NEW_C_HEAP_ARRAY(char, bufsize, mtInternal); > > > + strcpy(buf, filename); > > > + char* const dot = strrchr(buf, '.'); > > > + > > > + assert(dot != nullptr, "Attempting to load a shared object without extension? %s", filename); > > > + assert(strcmp(dot, ".a") == 0 || strcmp(dot, ".so") == 0, > > > + "Attempting to load a shared object that is neither *.so nor *.a", filename); > > > + > > > + sprintf(dot, ".so"); > > > + result = dll_load_inner(buf, ebuf, ebuflen); > > > + > > > + if (result == nullptr) { > > > + sprintf(dot, ".a"); > > > + result = dll_load_inner(buf, ebuf, ebuflen); > > > + } > > > + > > > + FREE_C_HEAP_ARRAY(char, buf); > > > + > > > + return result; > > > +} > > > + > > > ``` > > > > > > @tstuefe as discussed with @TheRealMDoerr do you think using default argument will help ? Either we pass agent object as 3rd parameter or an empty character buffer(and not const chat*) which would be spcifically used to copy the alternate filename to it using strcpy so that it is reflected in the jvmagent code ? > > A third parameter for what? @tstuefe 3rd parameter to pass the either of 2 things: 1. The JvmTiAgent object "agent", so that after shifting the save_library_signature to os_aix,we can still access the agent object.-> For this i tried importing jvm/prims header file, but i get segmentation faults during build . Not sure if i am doing it the right way. 2. Pass a character buffer(and not const char*) where we copy the modified filename back to it and then use it in jvmAgent. code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829634847 From stuefe at openjdk.org Tue Nov 28 12:51:09 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 Nov 2023 12:51:09 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> Message-ID: <7FfEZmI1lotj-z6P6mJtk-jH7vfiq_mO0EYtW2iHuGI=.033a826a-7083-48dc-882a-2ded7b8b0da1@github.com> On Tue, 28 Nov 2023 11:27:33 GMT, suchismith1993 wrote: > > > > @tstuefe 3rd parameter to pass the either of 2 things: > > 1. The JvmTiAgent object "agent", so that after shifting the save_library_signature to os_aix,we can still access the agent object.-> For this i tried importing jvm/prims header file, but i get segmentation faults during build . Not sure if i am doing it the right way. > > 2. Pass a character buffer(and not const char*) where we copy the modified filename back to it and then use it in jvmAgent. code. Does not sound really appealing tbh. We pile one hack atop of another. Please synchronize with @JoKern65 at SAP. He will rewrite the JVMTI handler code, which will make this point moot. See https://bugs.openjdk.org/browse/JDK-8320890. 
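The suffix fallback proposed in the quoted patch (try `*.so` first, then retry with `*.a`) can be sketched outside HotSpot as follows. This is only an illustration under stated assumptions: `candidate_library_names` and `load_with_suffix_fallback` are hypothetical names, `std::string` replaces the manually sized C-heap buffer of the patch, and the `load` callback stands in for `dll_load_inner`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of HotSpot): returns the file names to try,
// in order, for a library that may be shipped either as *.so or as *.a.
std::vector<std::string> candidate_library_names(const std::string& filename) {
  const std::string::size_type dot = filename.rfind('.');
  const std::string::size_type slash = filename.rfind('/');
  // No extension in the last path component: nothing to substitute.
  if (dot == std::string::npos ||
      (slash != std::string::npos && dot < slash)) {
    return {filename};
  }
  const std::string ext = filename.substr(dot);
  if (ext != ".so" && ext != ".a") {
    return {filename};  // Unknown suffix: try the name exactly as given.
  }
  const std::string stem = filename.substr(0, dot);
  return {stem + ".so", stem + ".a"};
}

// Tries each candidate with the supplied loader until one succeeds.
// `load` is an assumption standing in for the real load routine.
template <typename Loader>
void* load_with_suffix_fallback(const std::string& filename, Loader load) {
  for (const std::string& name : candidate_library_names(filename)) {
    if (void* handle = load(name)) {
      return handle;
    }
  }
  return nullptr;  // Every candidate failed.
}
```

Keeping the name computation separate from the load call means the caller can also learn which name actually succeeded, which is the information the JVMTI agent code in this discussion wanted to get back.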
------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829776298 From coleenp at openjdk.org Tue Nov 28 13:00:20 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 13:00:20 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: <4DLEivE5wKT1xj9lRvcxxxPWeGLZ6y-q6b7sqa3csGg=.d6a0d8dd-5cca-4ab8-8a0a-6345b3b667ef@github.com> References: <4DLEivE5wKT1xj9lRvcxxxPWeGLZ6y-q6b7sqa3csGg=.d6a0d8dd-5cca-4ab8-8a0a-6345b3b667ef@github.com> Message-ID: On Tue, 28 Nov 2023 00:07:06 GMT, Coleen Phillimore wrote: >> I don't remember why they were TempNewSymbol either but I don't think it matters for this test, they can be Symbol*. > > I still don't think it matters for the test, but this seems to be an effective workaround. Rather than having new_stable_temp_symbol here, you could make them Symbol* like your previous change and then explicitly decrement the refcount for A, D and interf at the end of the test. This seems like it would be cleaner and not have to interact with the SymbolTableCleaner code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1407719187 From duke at openjdk.org Tue Nov 28 13:02:07 2023 From: duke at openjdk.org (suchismith1993) Date: Tue, 28 Nov 2023 13:02:07 GMT Subject: RFR: JDK-8320005 : Native library suffix impact on hotspot code in AIX [v2] In-Reply-To: <7FfEZmI1lotj-z6P6mJtk-jH7vfiq_mO0EYtW2iHuGI=.033a826a-7083-48dc-882a-2ded7b8b0da1@github.com> References: <8-buFPL9W3149qcnluk_XqTQr-cJYqu_XvwU5ovyAIA=.396e5005-f896-48b9-919c-94164229d7bf@github.com> <5RZicS1WS5xiFzcJMhxg_Gjrtdc2I1c4vNMMb37OK-4=.e4ba7692-b18a-4b91-9b35-e444710e38b1@github.com> <7FfEZmI1lotj-z6P6mJtk-jH7vfiq_mO0EYtW2iHuGI=.033a826a-7083-48dc-882a-2ded7b8b0da1@github.com> Message-ID: On Tue, 28 Nov 2023 12:48:14 GMT, Thomas Stuefe wrote: > > > > > > > > > @tstuefe 3rd parameter to pass the either of 2 things: > > ``` > > 1. 
The JvmTiAgent object "agent", so that after shifting the save_library_signature to os_aix,we can still access the agent object.-> For this i tried importing jvm/prims header file, but i get segmentation faults during build . Not sure if i am doing it the right way. > > > > 2. Pass a character buffer(and not const char*) where we copy the modified filename back to it and then use it in jvmAgent. code. > > ``` > > Does not sound really appealing tbh. We pile one hack atop of another. > > Please synchronize with @JoKern65 at SAP. He will rewrite the JVMTI handler code, which will make this point moot. See https://bugs.openjdk.org/browse/JDK-8320890. Hi @tstuefe Should i then wait for this code to be integrated and then rewrite the .a handling ? I mean this PR shall remain open then right ? @JoKern65 Are you even handling the .a handling case ? i would like this PR to stay open. Maybe i can wait for the design change that you are working on. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16604#issuecomment-1829793568 From ogillespie at openjdk.org Tue Nov 28 13:05:23 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 28 Nov 2023 13:05:23 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v4] In-Reply-To: References: <4DLEivE5wKT1xj9lRvcxxxPWeGLZ6y-q6b7sqa3csGg=.d6a0d8dd-5cca-4ab8-8a0a-6345b3b667ef@github.com> Message-ID: On Tue, 28 Nov 2023 12:57:38 GMT, Coleen Phillimore wrote: >> I still don't think it matters for the test, but this seems to be an effective workaround. > > Rather than having new_stable_temp_symbol here, you could make them Symbol* like your previous change and then explicitly decrement the refcount for A, D and interf at the end of the test. This seems like it would be cleaner and not have to interact with the SymbolTableCleaner code. That's a better idea, thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1407725720 From coleenp at openjdk.org Tue Nov 28 13:10:21 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 13:10:21 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: <_RAExxFbz54yP3mPtWmg-j-J5bJEIYutoSZMQExut3s=.9da4332b-c026-4739-a3ca-174d63f13250@github.com> On Tue, 28 Nov 2023 10:58:08 GMT, Oli Gillespie wrote: >> I didn't find any discussion of whether draining is needed in this PR, and draining is in the initial commit. >> Other downsides include the need to test that feature and the impact that feature has on testing other parts >> of this change. Unless someone argues for it, I'd prefer to see it removed. > > Yes, there is one additional test for the draining but it's quite simple. The drain feature actually helps with the other tests, that's how we can avoid the queue interfering with ref counts for the tests (create temp symbol, immediately drain queue), so even if we don't drain periodically I'd still leave the feature there for the existing tests. > > What about the fact that not draining the queue means we could keep 128 symbols alive indefinitely? Are you not concerned because that's a small amount, and/or because natural churn will likely keep the queue moving? I think the footprint savings is probably negligible. I like the idea that the queue is periodically drained because it might show issues where we've messed up the refcounts, but they'd be hard to debug, so maybe not worth it. So I see neither the harm nor the benefit of draining the queue in the periodic task, but we should have the feature for the one test. 
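The queue behaviour discussed in this thread (a bounded queue that keeps recent temp symbols alive, displaces the oldest entry when full, and can be drained) can be sketched as a small ring buffer. This is a single-threaded illustration only: `RefCounted` and `RetainQueue` are hypothetical names, not the HotSpot `Symbol` or the queue in the PR, and the real code has to cope with concurrent access.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a ref-counted symbol.
struct RefCounted {
  int refcount = 0;
  void increment() { ++refcount; }
  void decrement() { assert(refcount > 0); --refcount; }
};

// Fixed-capacity ring buffer holding one extra reference on each of the
// N most recently added entries. When full, add() displaces the oldest.
template <std::size_t N>
class RetainQueue {
  std::array<RefCounted*, N> _slots{};  // nullptr marks an empty slot
  std::size_t _next = 0;                // slot that the next add() overwrites

 public:
  void add(RefCounted* sym) {
    sym->increment();                   // keep the new entry alive
    RefCounted* old = _slots[_next];
    _slots[_next] = sym;
    _next = (_next + 1) % N;
    if (old != nullptr) {
      old->decrement();                 // release the displaced entry
    }
  }

  // Releases every retained entry, e.g. from a periodic cleanup or a test.
  void drain() {
    for (RefCounted*& slot : _slots) {
      if (slot != nullptr) {
        slot->decrement();
        slot = nullptr;
      }
    }
    _next = 0;
  }
};
```

In this sketch, draining right after creating a symbol is what lets a test observe a reference count that is not inflated by the queue, which matches the use of draining described for the existing tests.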
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1407730442 From aph at openjdk.org Tue Nov 28 13:12:32 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 13:12:32 GMT Subject: RFR: 8320709: AArch64: Vectorized Poly1305 intrinsics [v2] In-Reply-To: References: Message-ID: > Vectorizing Poly1305 is quite tricky. We already have a highly- > efficient scalar Poly1305 implementation that runs on the core integer > unit, but it's highly serialized, so it does not make good use of > the parallelism available. > > The scalar implementation takes advantage of some particular features > of the Poly1305 keys. In particular, certain bits of r, the secret > key, are required to be 0. These make it possible to use a full > 64-bit-wide multiply-accumulate operation without needing to process > carries between partial products. > > While this works well for a serial implementation, a parallel > implementation cannot do this because rather than multiplying by r, > each step multiplies by some integer power of r, modulo > 2^130-5. > > In order to avoid processing carries between partial products we use a > redundant representation, in which each 130-bit integer is encoded > either as a 5-digit integer in base 2^26 or as a 3-digit integer in > base 2^52, depending on whether we are using a 64- or 32-bit > multiply-accumulate. > > In AArch64 Advanced SIMD, there is no 64-bit multiply-accumulate > operation available to us, so we must use 32*32 -> 64-bit operations. > > In order to achieve maximum performance we'd like to get close to the > processor's decode bandwidth, so that every clock cycle does something > useful. In a typical high-end AArch64 implementation, the core integer > unit has a fast 64-bit multiplier pipeline and the ASIMD unit has a > fast(ish) two-way 32-bit multiplier, which may be slower than the > core integer unit's. It is not at all obvious whether it's best to use > ASIMD or core instructions.
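The redundant base-2^26 representation mentioned above can be illustrated with a small conversion routine. This is a sketch under assumptions, not code from the PR: `U130`, `Limbs26`, `to_limbs26` and `from_limbs26` are hypothetical names, and the 130-bit value is assumed to be stored as three little-endian 64-bit words of which only the low 2 bits of the last word are used.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 130-bit value as little-endian 64-bit words; w[2] holds bits 128..129.
using U130 = std::array<uint64_t, 3>;
// The same value as five 26-bit digits, least significant first.
using Limbs26 = std::array<uint32_t, 5>;

Limbs26 to_limbs26(const U130& w) {
  constexpr uint64_t mask = (1ULL << 26) - 1;
  return {
      static_cast<uint32_t>(w[0] & mask),                           // bits 0..25
      static_cast<uint32_t>((w[0] >> 26) & mask),                   // bits 26..51
      static_cast<uint32_t>(((w[0] >> 52) | (w[1] << 12)) & mask),  // bits 52..77
      static_cast<uint32_t>((w[1] >> 14) & mask),                   // bits 78..103
      static_cast<uint32_t>(((w[1] >> 40) | (w[2] << 24)) & mask),  // bits 104..129
  };
}

// Inverse conversion; assumes every limb is already below 2^26.
U130 from_limbs26(const Limbs26& l) {
  U130 w{};
  w[0] = uint64_t(l[0]) | (uint64_t(l[1]) << 26) | (uint64_t(l[2]) << 52);
  w[1] = (uint64_t(l[2]) >> 12) | (uint64_t(l[3]) << 14) | (uint64_t(l[4]) << 40);
  w[2] = uint64_t(l[4]) >> 24;
  return w;
}
```

The point of the 26-bit digits is headroom: a product of two such digits is below 2^52, so several products can be summed in a 64-bit accumulator before any carry has to be propagated.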
> > Fortunately, if we have a wide-bandwidth instruction decode, we can do > both at the same time, by feeding alternating instructions to the core > and the ASIMD units. This also allows us to make good use of all of > the available core and ASIMD registers, in parallel. > > To do this we use generators, which here are a kind of iterator that > emits a group of instructions each time it is called. In this case we use > 4 parallel generators, and by calling them alternately we interleave > the ASIMD and the core instructions. We also take care to ensure that > each generator finishes at about the same time, to maximize the > distance between instructions which generate and consume data. > > The results are pretty good, ranging from 2* - 3* speedup. It is > possible that a pure in-order processor (Raspberry Pi?) might be at > some disadvantage because more work is being done even though it is > highly parallel, b... Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: remove debug code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16812/files - new: https://git.openjdk.org/jdk/pull/16812/files/37f46caa..f2e9c8e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16812&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16812&range=00-01 Stats: 80 lines in 1 file changed: 0 ins; 80 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16812.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16812/head:pull/16812 PR: https://git.openjdk.org/jdk/pull/16812 From fyang at openjdk.org Tue Nov 28 13:27:11 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 28 Nov 2023 13:27:11 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:29:33 GMT, Hamlin Li wrote: >> Hi, >> Can you review the patch to add ConvHF2F intrinsic to JDK for riscv? >> Thanks!
>> >> (By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)` >> https://lore.kernel.org/lkml/20231114141256.126749-11-cleger@rivosinc.com/) >> >> ## Test >> ### Functionality >> #### hotspot tests >> test/hotspot/jtreg/compiler/intrinsics/ >> test/hotspot/jtreg/compiler/c2/irTests >> >> #### jdk tests >> test/jdk/java/lang/Float/Binary16Conversion*.java >> >> ### Performance >> tested on licheepi. >> >> #### with UseZfh enabled & stub out-of-band >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 3493.376 ± 18.631 ns/op >> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 19.819 ± 0.193 ns/op >> >> >> #### with UseZfh enabled only >> (i.e. enable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 4659.796 ± 13.262 ns/op >> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 22.957 ± 0.098 ns/op >> >> >> #### with UseZfh disabled >> (i.e. disable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.float16ToFloat 2048 avgt 10 22930.591 ± 72.595 ns/op >> Fp16ConversionBenchmark.float16ToFloatMemory 2048 avgt 10 25.970 ± 0.063 ns/op > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > optimize perf with stub out-of-line src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1704: > 1702: // check whether it's a NaN. > 1703: mv(t0, 0x7c00); > 1704: andr(tmp, src, t0); I see from the exponent encoding of float16 on [1], it could be a negative/positive infinity as well when exponent is 0b11111. It depends on whether the significand is zero or not. So is this check for NaN sufficient?
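The distinction raised in this comment can be spelled out in a few lines. In the IEEE 754 binary16 layout, an all-ones exponent field (mask `0x7c00`) covers both infinities and NaNs, and only the 10-bit significand (mask `0x03ff`) tells them apart. The following is a plain C++ illustration with hypothetical names (`Half`, `classify_half`), not the RISC-V assembler code under review.

```cpp
#include <cassert>
#include <cstdint>

enum class Half { Finite, Infinity, NaN };

// Classifies a raw binary16 bit pattern. The sign bit (0x8000) is ignored,
// so +infinity and -infinity are both reported as Infinity.
Half classify_half(uint16_t bits) {
  constexpr uint16_t exp_mask = 0x7c00;   // 5 exponent bits
  constexpr uint16_t frac_mask = 0x03ff;  // 10 significand bits
  if ((bits & exp_mask) != exp_mask) {
    return Half::Finite;                  // zero, subnormal or normal number
  }
  // Exponent is all ones: zero significand means infinity, otherwise NaN.
  return (bits & frac_mask) == 0 ? Half::Infinity : Half::NaN;
}
```

A test on the exponent mask alone therefore selects infinities as well as NaNs; whether that matters depends on what the slow path does with an infinity.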
[1] https://en.wikipedia.org/wiki/Half-precision_floating-point_format src/hotspot/cpu/riscv/riscv.ad line 8288: > 8286: __ float16_to_float($dst$$FloatRegister, $src$$Register, $tmp$$Register); > 8287: %} > 8288: ins_pipe(fp_f2i); Seems we should use `ins_pipe(pipe_slow)` here as this emits multiple instructions. src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 52: > 50: #define RISCV_HWPROBE_EXT_ZBB (1 << 4) > 51: #define RISCV_HWPROBE_EXT_ZBS (1 << 5) > 52: #define RISCV_HWPROBE_EXT_ZFH (1 << 27) Will this change in future? Seems it's still not there in the kernel source yet [1]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/uapi/asm/hwprobe.h?h=v6.7-rc3 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407751915 PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407588997 PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407610639 From stuefe at openjdk.org Tue Nov 28 13:37:18 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 Nov 2023 13:37:18 GMT Subject: RFR: JDK-8319437: NMT should show library names in call stacks In-Reply-To: References: Message-ID: <68Bu6xdmV9ftpRwSrOSDLuxmkaTJHWQvDBzhrTon-Iw=.4bc6f1fe-929d-4bd0-8102-90e86992ad67@github.com> On Mon, 27 Nov 2023 16:22:27 GMT, Zhengyu Gu wrote: >> With this tiny enhancement, NMT shows library names in callstacks. > > LGTM Thanks @zhengyu123 and @dholmes-ora ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16508#issuecomment-1829852863 From stuefe at openjdk.org Tue Nov 28 13:37:19 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 Nov 2023 13:37:19 GMT Subject: Integrated: JDK-8319437: NMT should show library names in call stacks In-Reply-To: References: Message-ID: On Sun, 5 Nov 2023 06:28:11 GMT, Thomas Stuefe wrote: > With this tiny enhancement, NMT shows library names in callstacks. This pull request has now been integrated. 
Changeset: e33b6c10 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/e33b6c10f8d3244ec2f4204cd4de404e0e0686eb Stats: 21 lines in 1 file changed: 15 ins; 0 del; 6 mod 8319437: NMT should show library names in call stacks Reviewed-by: dholmes, zgu ------------- PR: https://git.openjdk.org/jdk/pull/16508 From mbaesken at openjdk.org Tue Nov 28 13:38:20 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 28 Nov 2023 13:38:20 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . >> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use new method also in print_vm_info Hi Thomas and David, > Both IBM and SAP stepped up their AIX efforts because the AIX port is still needed. More eyes find more issues. Yes this is true, we at SAP started again this year to run tests on AIX using jdk-head and found those issues on AIX. Windows **might** have similar needs, but so far I saw the issues with outdated lib cache only on AIX, that's why I did the change on this platform. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1829859521 From qamai at openjdk.org Tue Nov 28 13:45:11 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 28 Nov 2023 13:45:11 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:23:52 GMT, Hamlin Li wrote: >> https://github.com/openjdk/jdk/blob/6aa197667ad05bd93adf3afc7b06adbfb2b18a22/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp#L4307 >> >> Note that the stub will still reside in the code section of the current method, is a trampoline needed in that case? > > Thanks for pointing to the location! > > It DOES bring better performance. Please check the pr description for detailed data. Thanks, it generally looks good, but I'm not familiar with RISC-V assembly so I will leave the approval for the others. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407783405 From mli at openjdk.org Tue Nov 28 13:49:37 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Nov 2023 13:49:37 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 11:04:01 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> optimize perf with stub out-of-line > > src/hotspot/cpu/riscv/riscv.ad line 8288: > >> 8286: __ float16_to_float($dst$$FloatRegister, $src$$Register, $tmp$$Register); >> 8287: %} >> 8288: ins_pipe(fp_f2i); > > Seems we should use `ins_pipe(pipe_slow)` here as this emits multiple instructions. In fact, I'm not quite sure. I see in the ad file: pipe_class pipe_slow() %{ instruction_count(10); and, all instruct's with `pipe_slow` are related to cmpxchg, which indeed involve lots of instructions in common case. But for `float16_to_float`, in normal case, there is at most 5 instructions; only the rare case `NaN` involves more instructions. 
Please let me know what you think about it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407789888

From duke at openjdk.org Tue Nov 28 13:50:50 2023
From: duke at openjdk.org (ArsenyBochkarev)
Date: Tue, 28 Nov 2023 13:50:50 GMT
Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic
Message-ID:

Hi all! Please review this port of `_updateBytesCRC32` intrinsic from [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) (for now, plain version).

### Correctness checks

Tests `test/hotspot/jtreg/compiler/codegen/CRCTest.java` and `test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java` pass; `_updateBytesCRC32` was used in both of them.

### Performance results on T-Head board

##### Intrinsic disabled

(`-XX:-UseCRC32Intrinsics` flag)

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ------------------------------- | --- | --- | --- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 768.458 | ± 20.070 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 699.860 | ± 44.533 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 558.778 | ± 5.119 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 420.209 | ± 4.384 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.945 | ± 0.817 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.212 | ± 0.036 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.222 | ± 0.040 | ops/ms |

##### Intrinsic enabled

(`-XX:+UseCRC32Intrinsics` flag)

| Benchmark | (count) | Mode | Cnt | Score | Error | Units |
| ---- | ---- | ---- | ---- | --- | --- | --- |
| CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 7164.484 | ± 17.943 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 7065.546 | ± 178.694 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 7153.419 | ± 26.696 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 7008.298 | ± 235.055 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 6570.959 | ± 612.765 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 7166.674 | ± 6.639 | ops/ms |
| CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6918.064 | ± 200.009 | ops/ms |

-------------

Commit messages:
 - 8317721: RISC-V: Implement CRC32 intrinsic

Changes: https://git.openjdk.org/jdk/pull/16850/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16850&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8317721
  Stats: 526 lines in 8 files changed: 522 ins; 1 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/16850.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16850/head:pull/16850

PR: https://git.openjdk.org/jdk/pull/16850

From duke at openjdk.org Tue Nov 28 14:02:27 2023
From: duke at openjdk.org (ArsenyBochkarev)
Date: Tue, 28 Nov 2023 14:02:27 GMT
Subject: Withdrawn: 8317721: RISC-V: Implement CRC32 intrinsic
In-Reply-To:
References:
Message-ID:

On Tue, 28 Nov 2023 13:46:07 GMT, ArsenyBochkarev wrote:

> Hi all! Please review this port of `_updateBytesCRC32` intrinsic from [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) (for now, plain version).
>
> ### Correctness checks
> Tests `test/hotspot/jtreg/compiler/codegen/CRCTest.java` and `test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java` pass; `_updateBytesCRC32` was used in both of them.
>
> ### Performance results on T-Head board
> ##### Intrinsic disabled
>
> (`-XX:-UseCRC32Intrinsics` flag)
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------- | --- | --- | --- | --- | --- | --- |
> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 768.458 | ± 20.070 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 699.860 | ±
44.533 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 558.778 | ± 5.119 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 420.209 | ± 4.384 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.945 | ± 0.817 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.212 | ± 0.036 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.222 | ± 0.040 | ops/ms |
>
> ##### Intrinsic enabled
>
> (`-XX:+UseCRC32Intrinsics` flag)
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ---- | ---- | ---- | ---- | --- | --- | --- |
> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 7164.484 | ± 17.943 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 7065.546 | ± 178.694 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 7153.419 | ± 26.696 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 7008.298 | ± 235.055 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 6570.959 | ± 612.765 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 7166.674 | ± 6.639 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6918.064 | ± 200.009 | ops/ms |

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/16850

From duke at openjdk.org Tue Nov 28 14:02:27 2023
From: duke at openjdk.org (ArsenyBochkarev)
Date: Tue, 28 Nov 2023 14:02:27 GMT
Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic
In-Reply-To:
References:
Message-ID:

On Tue, 28 Nov 2023 13:46:07 GMT, ArsenyBochkarev wrote:

> Hi all! Please review this port of `_updateBytesCRC32` intrinsic from [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) (for now, plain version).
>
> ### Correctness checks
> Tests `test/hotspot/jtreg/compiler/codegen/CRCTest.java` and `test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java` pass; `_updateBytesCRC32` was used in both of them.
>
> ### Performance results on T-Head board
> ##### Intrinsic disabled
>
> (`-XX:-UseCRC32Intrinsics` flag)
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ------------------------------- | --- | --- | --- | --- | --- | --- |
> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 768.458 | ± 20.070 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 699.860 | ± 44.533 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 558.778 | ± 5.119 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 420.209 | ± 4.384 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.945 | ± 0.817 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.212 | ± 0.036 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.222 | ± 0.040 | ops/ms |
>
> ##### Intrinsic enabled
>
> (`-XX:+UseCRC32Intrinsics` flag)
>
> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
> | ---- | ---- | ---- | ---- | --- | --- | --- |
> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 7164.484 | ± 17.943 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 7065.546 | ± 178.694 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 7153.419 | ± 26.696 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 7008.298 | ± 235.055 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 6570.959 | ± 612.765 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 7166.674 | ± 6.639 | ops/ms |
> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6918.064 | ±
200.009 | ops/ms |

Found an error, so closing this one for a while.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16850#issuecomment-1829903061

From pchilanomate at openjdk.org Tue Nov 28 14:07:07 2023
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 28 Nov 2023 14:07:07 GMT
Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index
In-Reply-To:
References:
Message-ID: <0Yx_vbmI-BQfqvaOihewZRbC9ntN2pfXGBJVfGxmWwo=.1b2844d7-b9cf-4af5-b47f-6bd4ddb02d58@github.com>

On Tue, 28 Nov 2023 00:09:10 GMT, Patricio Chilano Mateo wrote:

> Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region.
> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS.
>
> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1].
I wasn't convinced it was worth it given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp.
>
> I ran the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5.
>
> Thanks,
> Patricio
>
> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f

Running some extra tests I see the callee can use the argument area to store data that is different from the one passed. This is actually something @fparain told me some time ago. So this simpler solution won't do. Before applying https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f instead, @dean-long how about if we just prevent c2 from using this stack slot for the caller?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1829909893

From jsjolen at openjdk.org Tue Nov 28 14:14:29 2023
From: jsjolen at openjdk.org (Johan Sjölen)
Date: Tue, 28 Nov 2023 14:14:29 GMT
Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog
In-Reply-To:
References:
Message-ID:

On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote:

> Running a testcase with multiple -Xlog crashes JTREG test cases. This is because `Collectors.toMap` is not given a merge strategy.
>
> When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG.
>
> If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake.
>
> Tested with:
>
> @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL"
> @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX"
> @requires vm.opt.x.Xms == "3g"
>
> and
>
> JAVA_OPTIONS=-Xms3g -Xms4g
> JAVA_OPTIONS=-Xms4g -Xms3g
> JAVA_OPTIONS=-Xlog:gc* -Xlog:gc*
>
> Running tier1

Hi Leo,

I'm sorry but I don't understand this change. What I do know is that `-Xlog` supports multiple arguments and that it is indeed required to use at least one argument when enabling async logging or first disabling stdout/stderr logging:

>-Xlog:async -Xlog:gc=debug:file=gc.log -Xlog:safepoint=trace

>-Xlog:disable -Xlog:safepoint=trace:safepointtrace.txt

What exactly happens to a testcase which requires two arguments? I guess that we have none, as these would have crashed?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1827923013

From lkorinth at openjdk.org Tue Nov 28 14:14:29 2023
From: lkorinth at openjdk.org (Leo Korinth)
Date: Tue, 28 Nov 2023 14:14:29 GMT
Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog
In-Reply-To:
References:
Message-ID:

On Mon, 27 Nov 2023 14:18:25 GMT, Johan Sjölen wrote:

> What exactly happens to a testcase which requires two arguments? I guess that we have none, as these would have crashed?

Correct.

This bug was introduced by me in https://bugs.openjdk.org/browse/JDK-8317228 where I added support to `@require` on `-X` flags. If someone is running a test case and manually adds multiple JAVA_OPTIONS of the same type: `-Xlog:async -Xlog:gc=debug:file=gc.log -Xlog:safepoint=trace` It would crash.
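
The crash described above can be reproduced in isolation. What follows is only a minimal sketch, not the actual VMProps.java code: the class name `XOptionMergeSketch`, both helper methods, and the pre-split key/value pairs are assumptions made for illustration. It shows that the two-argument `Collectors.toMap` throws `IllegalStateException` on a repeated key, while the three-argument form with a merge function can keep the latter value.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; names are illustrative and not taken from VMProps.java.
final class XOptionMergeSketch {

    // No merge function: a repeated key makes toMap throw IllegalStateException.
    static Map<String, String> withoutMerge(List<Map.Entry<String, String>> opts) {
        return opts.stream()
                   .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // The merge function keeps the value that appears later in the list.
    static Map<String, String> latterWins(List<Map.Entry<String, String>> opts) {
        return opts.stream()
                   .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                                             (earlier, later) -> later));
    }

    public static void main(String[] args) {
        // Models JAVA_OPTIONS=-Xms3g -Xms4g, already split into key/value pairs.
        List<Map.Entry<String, String>> opts =
            List.of(Map.entry("Xms", "3g"), Map.entry("Xms", "4g"));
        try {
            withoutMerge(opts);
        } catch (IllegalStateException e) {
            System.out.println("duplicate key rejected");
        }
        System.out.println(latterWins(opts).get("Xms"));
    }
}
```

With the sample input, the first call is rejected and the second lookup yields `4g`. The `-Xlog` sentinel handling from the PR is deliberately left out of this sketch.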
------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1827946261 From dholmes at openjdk.org Tue Nov 28 14:14:31 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 28 Nov 2023 14:14:31 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 14:30:02 GMT, Leo Korinth wrote: >> Hi Leo, >> >> I'm sorry but I don't understand this change. What I do know is that `-Xlog` supports multiple arguments and that it is indeed required to use at least one argument when enabling async logging or first disabling stdout/stderr logging: >> >>>-Xlog:async -Xlog:gc=debug:file=gc.log -Xlog:safepoint=trace >> >>>-Xlog:disable -Xlog:safepoint=trace:safepointtrace.txt >> >> What exactly happens to a testcase which requires two arguments? I guess that we have none, as these would have crashed? > >> What exactly happens to a testcase which requires two arguments? I guess that we have none, as these would have crashed? > > correct. > > This bug is introduce by me in https://bugs.openjdk.org/browse/JDK-8317228 where I added support to `@require` on `-X` flags. If someone is running a test case and manually adds multiple JAVA_OPTIONS of the same type: `-Xlog:async -Xlog:gc=debug:file=gc.log -Xlog:safepoint=trace` It would crash. Whoa! @lkorinth [8317228](https://github.com/openjdk/jdk/pull/15986#top) needed broader discussion for the changes to VMProps.java - what exactly is that change doing? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1828690719 From lkorinth at openjdk.org Tue Nov 28 14:14:28 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 28 Nov 2023 14:14:28 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog Message-ID: Running a testcase with muliple -Xlog crashes JTREG test cases. This is because `Collector.toMap` is not given a merge strategy. 
When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake. Tested with: @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" @requires vm.opt.x.Xms == "3g" and JAVA_OPTIONS=-Xms3g -Xms4g JAVA_OPTIONS=-Xms4g -Xms3g JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* ``` Running tier1 ------------- Commit messages: - 8320750: Allow a testcase to run with muliple -Xlog Changes: https://git.openjdk.org/jdk/pull/16824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320750 Stats: 10 lines in 1 file changed: 6 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16824/head:pull/16824 PR: https://git.openjdk.org/jdk/pull/16824 From lkorinth at openjdk.org Tue Nov 28 14:14:32 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 28 Nov 2023 14:14:32 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with muliple -Xlog crashes JTREG test cases. This is because `Collector.toMap` is not given a merge strategy. > > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. > > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". 
This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake.
>
> Tested with:
>
> @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL"
> @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX"
> @requires vm.opt.x.Xms == "3g"
>
> and
>
> JAVA_OPTIONS=-Xms3g -Xms4g
> JAVA_OPTIONS=-Xms4g -Xms3g
> JAVA_OPTIONS=-Xlog:gc* -Xlog:gc*
>
> Running tier1

I have started changing test cases to use `createTestJavaProcessBuilder` instead of `createLimitedTestJavaProcessBuilder` because we severely limit our testing when we use `createLimitedTestJavaProcessBuilder`. Before that change there was no way to add `@require` lines for `-X` options. Unfortunately I introduced a bug when I added that functionality.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1829359779

From dholmes at openjdk.org Tue Nov 28 14:14:30 2023
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 28 Nov 2023 14:14:30 GMT
Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog
In-Reply-To:
References:
Message-ID:

On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote:

> Running a testcase with multiple -Xlog crashes JTREG test cases. This is because `Collectors.toMap` is not given a merge strategy.
>
> When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG.
>
> If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake.
> > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > ``` > > Running tier1 I'm also having trouble understanding problem and solution here, but mainly because I don't understand what the jtreg code is supposed to be doing anyway. I'm surprised to see jtreg trying to streamline the set of flags that have been passed, I expect it to leave them alone and let the VM process them as it would normally do so. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1828686250 From rehn at openjdk.org Tue Nov 28 14:20:40 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:40 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 [v2] In-Reply-To: References: Message-ID: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it. > > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some test. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Flag fixes - Merge branch 'master' into sha256 - Share code - SHA-2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16562/files - new: https://git.openjdk.org/jdk/pull/16562/files/5cad30ff..3b2aeec8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16562&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16562&range=00-01 Stats: 638600 lines in 1706 files changed: 98952 ins; 474250 del; 65398 mod Patch: https://git.openjdk.org/jdk/pull/16562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16562/head:pull/16562 PR: https://git.openjdk.org/jdk/pull/16562 From rehn at openjdk.org Tue Nov 28 14:20:41 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:41 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 In-Reply-To: References: Message-ID: On Wed, 8 Nov 2023 14:47:07 GMT, Robbin Ehn wrote: > Hi, please consider. > > Main author is @luhenry, I only fixed some minor things and tested it. > > Such as: > test/hotspot/jtreg/compiler/intrinsics/sha/ > test/jdk/java/security/MessageDigest/ > test/jdk/jdk/security/ > tier1 > > And still running some test. An update, not all comments addressed yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16562#issuecomment-1829932625 From rehn at openjdk.org Tue Nov 28 14:20:49 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:49 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 [v2] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 08:23:25 GMT, Fei Yang wrote: >> The vsetivli is often expensive:ish, the code in openssl sets it five times before reaching first round. >> That don't seem like a good idea, now vsetivli make the code much easier to read yes... >> >> I guess I need to check numbers for that also.. :) > > Yeah. Why not consider something more simpler if there is no known big difference on performance numbers? 
And this is the first version when RVV-1.0 compatible hardwares are not popular yet :-) Not yet addressed. >> It seems like the correct answer is: >> `? Zvknhb supports SHA-256 and SHA-512.` >> >> I suggest we start with supporting Zvkn, which is: >> Zvkned, Zvknhb, Zvkb, Zvkt >> They require: Zve64x >> >> If someone have a CPU lacking something we can revisit it. >> >> (I *think* Zvknha is mainly for 32-bits, as it only require sew 32) > > Yeah, I agree it's more reasonable to check for `Zvkn` here which stands for NIST Algorithm Suite. > I see the vector cryptography spec says: > > The Zvknhb and Zvbc Vector Crypto Extensions --and accordingly the composite extensions Zvkn > and Zvks-- require a Zve64x base, or application ("V") base Vector Extension. > > My understanding is that either `Zve64x` (for the embeded) or RVV (as in our case) will do. > So we might want to do this check: `if (UseRVV && UseZvkn)`. > > (Or making enablement of `UseZvkn` dependent of `UseRVV`? Then only check one option `UseZvkn` here) Please add new comments in updated code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407830547 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407829856 From rehn at openjdk.org Tue Nov 28 14:20:47 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:47 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 [v2] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 07:08:04 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Flag fixes >> - Merge branch 'master' into sha256 >> - Share code >> - SHA-2 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3700: > >> 3698: Register ofs = c_rarg2; >> 3699: Register limit = c_rarg3; >> 3700: Register consts = t0; > > I would suggest choose a different temporary register for `consts`, maybe `t2`. Using x5 (t0) / x6 (t1) to keep some long-lived values like `consts` can be error prone. Those two are reserved scratch registers which could be explictly / implicitly clobberred by various assembler functions. Fixed > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4215: > >> 4213: __ vslidedown_vi(v16, v27, 2); // v16 = {_,_,e,f} >> 4214: // Merge elements [3..2] of v26 ({a,b}) into elements [3..2] of v16 >> 4215: __ vmerge_vvm(v16, v26, v16); // v16 = {a,b,e,f} > > Simlar here. Can we make use of index-load and index-store to simplify the code for the 512 case too? Not yet addressed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407830224 PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407830822 From rehn at openjdk.org Tue Nov 28 14:20:50 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:50 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 [v2] In-Reply-To: References: Message-ID: On Thu, 9 Nov 2023 17:39:09 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 3917: >> >>> 3915: //-------------------------------------------------------------------------------- >>> 3916: // Quad-round 0 (+0, v10->v11->v12->v13) >>> 3917: __ vl1re32_v(v15, consts); >> >> Seems the round 0-11 are quite similar with each other, although with some difference in some src registers, but with similar patterns. >> Would it be possible and better to group them in a loop to simplify the code? or construct some functions. 
> > Seems also that generate_sha256_implCompress and generate_sha512_implCompress can share some code, looks like they are quite similar at a brief look. Now they share code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407829086 From rehn at openjdk.org Tue Nov 28 14:20:51 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:51 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 [v2] In-Reply-To: References: Message-ID: <3h8vvldj_w326e7TQ-OgSQBe14qXhAoxmXk_CIW8234=.2e193a35-d61c-4e7c-87b3-d831d5059f36@github.com> On Thu, 9 Nov 2023 17:35:50 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Flag fixes >> - Merge branch 'master' into sha256 >> - Share code >> - SHA-2 > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4348: > >> 4346: >> 4347: //-------------------------------------------------------------------------------- >> 4348: // Quad-round 0 (+0, v10->v11->v12->v13) > > similar comments as generate_sha256_implCompress about group the rounds in a loop. Now rounds share code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407829476 From rehn at openjdk.org Tue Nov 28 14:20:52 2023 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 28 Nov 2023 14:20:52 GMT Subject: RFR: 8319716: RISC-V: Add SHA-2 [v2] In-Reply-To: References: <8rC40UxJC4IF9vdv6xIyaJl6l-fhAlRC0VezoUAuKYE=.bdc94ce5-d9d5-429c-bb38-701ffcbe0bcf@github.com> Message-ID: <_p8uUY5c6wTbMWgY5QA38ZpOqfhNzuFEGfJxZOVfxZA=.d9deba84-5b5c-416e-81d5-bb25972ba3e4@github.com> On Mon, 13 Nov 2023 13:47:26 GMT, Robbin Ehn wrote: >> I think that's what RISC-V profiles are for [1] which make some basic extensions mandatory for each profile. 
And we already have JVM options like `UseRVA20U64` and `UseRVA22U64` for riscv. But there are still some optional extensions for each profile, say RVV for RVA22U64. So instead of feeding a rather long march to the JVM, I feel it's more reasonable to have some JVM options at the extension level (instead of sub-extension level) as suggested by @robehn. >> >> Personally, I would suggest something slightly different. Say: >> "-XX:VectorCryptoExt=zvknhb", "-XX:VectorCryptoExt=zvknhb_zvkb", or "-XX:VectorCryptoExt=all" >> >> This way we will still be able to distinguish specific sub-extensions while keeping one JVM option for each extension/collection. >> >> [1] https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc > > Let's take it on the list: > https://mail.openjdk.org/pipermail/riscv-port-dev/2023-November/001211.html Yes, flags are messy. Please add new comments in the update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16562#discussion_r1407828712 From shade at openjdk.org Tue Nov 28 14:21:20 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 28 Nov 2023 14:21:20 GMT Subject: RFR: 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds Message-ID: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> Flag cleanup. Current barrier verification code is opt-in, and it is selected for a few tests. For extra safety, we want to have it enabled by default in debug builds. This also simplifies test configurations. 
Additional testing: - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` - [ ] Linux x86_64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` - [ ] Linux AArch64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/16849/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16849&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320888 Stats: 39 lines in 4 files changed: 0 ins; 37 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16849.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16849/head:pull/16849 PR: https://git.openjdk.org/jdk/pull/16849 From ogillespie at openjdk.org Tue Nov 28 14:27:47 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 28 Nov 2023 14:27:47 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v13] In-Reply-To: References: Message-ID: > Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). > > See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. > > This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. 
With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. > > The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. > > When concurrent symbol table cleanup runs, it also drains the queue. > > In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. > > Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. Oli Gillespie has updated the pull request incrementally with two additional commits since the last revision: - Don't drain queue periodically - Avoid TempNewSymbol in placeholders test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16398/files - new: https://git.openjdk.org/jdk/pull/16398/files/059346aa..2e5dd556 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16398&range=11-12 Stats: 17 lines in 2 files changed: 5 ins; 9 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16398.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16398/head:pull/16398 PR: https://git.openjdk.org/jdk/pull/16398 From ogillespie at openjdk.org Tue Nov 28 14:27:47 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 28 Nov 2023 14:27:47 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: <_RAExxFbz54yP3mPtWmg-j-J5bJEIYutoSZMQExut3s=.9da4332b-c026-4739-a3ca-174d63f13250@github.com> References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> <_RAExxFbz54yP3mPtWmg-j-J5bJEIYutoSZMQExut3s=.9da4332b-c026-4739-a3ca-174d63f13250@github.com> Message-ID: On Tue, 28 Nov 2023 13:07:03 GMT, Coleen Phillimore wrote: >> Yes, there 
is one additional test for the draining but it's quite simple. The drain feature actually helps with the other tests, that's how we can avoid the queue interfering with ref counts for the tests (create temp symbol, immediately drain queue), so even if we don't drain periodically I'd still leave the feature there for the existing tests. >> >> What about the fact that not draining the queue means we could keep 128 symbols alive indefinitely? Are you not concerned because that's a small amount, and/or because natural churn will likely keep the queue moving? > > I think the footprint savings is probably negligible. I like the idea that the queue is periodically drained because it might show issues where we've messed up the refcounts, but they'd be hard to debug, so maybe not worth it. So I see neither the harm nor the benefit of draining the queue in the periodic task, but we should have the feature for the one test. I've removed drain from the concurrent cleanup. I left it public and tested, is that reasonable or do you prefer I make it private somehow (not sure the typical way to have test-only methods)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1407844595 From ogillespie at openjdk.org Tue Nov 28 14:27:49 2023 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 28 Nov 2023 14:27:49 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> Message-ID: On Mon, 27 Nov 2023 18:04:17 GMT, Oli Gillespie wrote: >> test/hotspot/gtest/classfile/test_placeholders.cpp line 45: >> >>> 43: Symbol* D = SymbolTable::new_symbol("def2_8_2023_class"); >>> 44: Symbol* super = SymbolTable::new_symbol("super2_8_2023_supername"); >>> 45: Symbol* interf = SymbolTable::new_symbol("interface2_8_2023_supername"); >> >> This doesn't seem like the right way to update this test. 
Doesn't this leave the symbols dangling? >> And in the face of potential queue draining, it seems to me this could lead the test to intermittent failures. > > Updated to use the same 'create then drain' approach as the other test. Now just manually cleaning up as suggested by Coleen ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1407845585 From epeter at openjdk.org Tue Nov 28 14:29:26 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Nov 2023 14:29:26 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe Message-ID: I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. Testing: tier1-3 and stress. ------------- Commit messages: - add locks for jvmci calls to allocate_bci_to_data - 8306767 Changes: https://git.openjdk.org/jdk/pull/16840/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16840&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306767 Stats: 14 lines in 5 files changed: 10 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16840/head:pull/16840 PR: https://git.openjdk.org/jdk/pull/16840 From eosterlund at openjdk.org Tue Nov 28 14:32:23 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Tue, 28 Nov 2023 14:32:23 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 06:23:29 GMT, Emanuel Peter wrote: > I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. > > Testing: tier1-3 and stress. Looks good! ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16840#pullrequestreview-1753038432 From aph at openjdk.org Tue Nov 28 14:32:31 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 14:32:31 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI Message-ID: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. ------------- Commit messages: - JDK-8320892: AArch64: Restore FPU control state after JNI Changes: https://git.openjdk.org/jdk/pull/16851/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16851&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320892 Stats: 27 lines in 5 files changed: 25 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16851/head:pull/16851 PR: https://git.openjdk.org/jdk/pull/16851 From stuefe at openjdk.org Tue Nov 28 14:34:55 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 28 Nov 2023 14:34:55 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . >> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. 
> > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use new method also in print_vm_info >>> @Stuefe this is the opposite to what you suggested for the AIX specific changes for static library loading. It was proposed there to implement an os abstraction and you rightly said no because it was an AIX only issue. I don't see this is any different. > > The reason I proposed this is that on Windows, we have code paths that are executed on demand on symbol decoding. Its exactly the same thing as on AIX: we need to refresh the loaded pdb list. This would fit well into this abstraction. > > But the Windows implementation of this abstraction is also empty! Is there some follow up to actually put this new abstraction into actual use? My recollection is that the Windows refresh worked fine in the Windows code. I think we have done everything possible. I don't know what would have been the correct way. Because we did not like the AIX ifdefs in shared code, Matthias replaced them with a reasonable os abstraction. The critique then was that this abstraction only exists for AIX. We then showed that this abstraction can be filled in other platforms too. Note that back in the day we carried a lot of Solaris-specific coding hiding behind abstractions that only ever were implemented on Solaris. We do this still today for Windows (e.g. the os::vm_allocation_granularity() <-> os::vm_page_size() duality only exists because of Windows). That seemed to have been no problem. Why is it a problem when we do the same for AIX? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1829958678 From prappo at openjdk.org Tue Nov 28 14:40:55 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Tue, 28 Nov 2023 14:40:55 GMT Subject: RFR: 8308715: Create a mechanism for Implicitly Declared Class javadoc Message-ID: Please review this PR to support _JEP 463 Implicitly Declared Classes and Instance Main Method (Second Preview)_ in JavaDoc. [JEP 463](https://openjdk.org/jeps/463) is the next iteration of [JEP 445](https://openjdk.org/jeps/445), which introduced the ability to have a source file as a launchable, "classless" method bag % cat HelloWorld.java /** Run me. */ void main() { print("Hello, world!"); } /** Shortcut for printing strings. */ void print(String arg) { System.out.println(arg); } which the compiler interprets as a familiar class final class HelloWorld { HelloWorld() { } /** Run me. */ void main() { print("Hello, world!"); } /** Shortcut for printing strings. */ void print(String arg) { System.out.println(arg); } } ### How JEP 445 works with JavaDoc today In JDK 21, javadoc can document such a file **without any changes to the javadoc tool**. The only thing that the user needs to do is to make sure that the following options are present: * `--enable-preview` and `--source=21` * `-package` The first pair of options tells javadoc to use preview features, which JEP 445 is one of. Without these preview-related options, javadoc will raise the following error: % javadoc --version javadoc 21 % javadoc HelloWorld.java -d /tmp/throwaway Loading source file HelloWorld.java... HelloWorld.java:2: error: unnamed classes are a preview feature and are disabled by default. void main() { ^ (use --enable-preview to enable unnamed classes) 1 error The second option, `-package`, tells javadoc to document classes that are public, protected, or declared with package access (colloquially known as "package private"). 
Without this option, javadoc will only document public and protected classes, which do not include the interpreted class: % javadoc --enable-preview --source=21 HelloWorld.java -d /tmp/throwaway Loading source file HelloWorld.java... Constructing Javadoc information... error: No public or protected classes found to document. 1 error When those additional options are present, javadoc does its job: % javadoc --enable-preview --source=21 -package HelloWorld.java -d /tmp/throwaway Loading source file HelloWorld.java... Constructing Javadoc information... Creating destination directory: "/tmp/throwaway/" Building index for all the packages and classes... Standard Doclet version 21+35-LTS-2513 Building tree for all the packages and classes... Generating /tmp/throwaway/HelloWorld.html... HelloWorld.java:7: warning: no @param for arg void print(String arg) { ^ HelloWorld.java:2: warning: no comment void main() { ^ HelloWorld.java:2: warning: use of default constructor, which does not provide a comment void main() { ^ Generating /tmp/throwaway/package-summary.html... Generating /tmp/throwaway/package-tree.html... Generating /tmp/throwaway/overview-tree.html... Building index for all classes... Generating /tmp/throwaway/allclasses-index.html... Generating /tmp/throwaway/allpackages-index.html... Generating /tmp/throwaway/index-all.html... Generating /tmp/throwaway/search.html... Generating /tmp/throwaway/index.html... Generating /tmp/throwaway/help-doc.html... 3 warnings However, the result does not feel quite right. Firstly, `-package` is too coarse. It includes all top-level classes and their elements, not just the implicit class from `HelloWorld.java`, its default constructor and methods, which are all declared with package access. Secondly, `HelloWorld.java` isn't a first-class citizen in javadoc. That latter fact can be seen from examining stdout and the output directory: 1. 
DocLint (compiler and javadoc) as well as javadoc itself issue unjust warnings: neither the implicit class nor its default constructor can be documented. The author either does not know about classes and constructors yet (on-ramp audience) or does not care about them (scripts/utilities audience). Additionally, because the class' AST node is at the same position as that of the first method declaration, the warning about the undocumented class can be confused with a warning on the first method being undocumented. 2. While such a file is documented as if it were an explicitly declared (normal) class, we might want to dispense with the documentation for the default constructor as it lacks a comment and is an artefact. ### What this PR proposes for JEP 463 1. Leave `--enable-preview` and `--source` as correct and unavoidable until the feature is standardised. 2. "Drill a hole" in javadoc access control to **automatically** allow implicit classes and their members that are public, protected, or declared with package access in documentation. 3. Do not emit warnings for an implicit class and its default constructor. 4. Do not document an implicit class' default constructor. 
------------- Depends on: https://git.openjdk.org/jdk/pull/16461 Commit messages: - Initial commit - 8320358: GHA: ignore jdk* branches - 8319437: NMT should show library names in call stacks Changes: https://git.openjdk.org/jdk/pull/16853/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16853&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308715 Stats: 256 lines in 6 files changed: 246 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16853/head:pull/16853 PR: https://git.openjdk.org/jdk/pull/16853 From prappo at openjdk.org Tue Nov 28 14:45:36 2023 From: prappo at openjdk.org (Pavel Rappo) Date: Tue, 28 Nov 2023 14:45:36 GMT Subject: RFR: 8308715: Create a mechanism for Implicitly Declared Class javadoc In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 14:32:14 GMT, Pavel Rappo wrote: > Please review this PR to support _JEP 463 Implicitly Declared Classes and Instance Main Method (Second Preview)_ in JavaDoc. > > [JEP 463](https://openjdk.org/jeps/463) is the next iteration of [JEP 445](https://openjdk.org/jeps/445), which introduced the ability to have a source file as a launchable, "classless" method bag > > > % cat HelloWorld.java > /** Run me. */ > void main() { > print("Hello, world!"); > } > > /** Shortcut for printing strings. */ > void print(String arg) { > System.out.println(arg); > } > > > which the compiler interprets as a familiar class > > > final class HelloWorld { > > HelloWorld() { > } > > /** Run me. */ > void main() { > print("Hello, world!"); > } > > /** Shortcut for printing strings. */ > void print(String arg) { > System.out.println(arg); > } > } > > > ### How JEP 445 works with JavaDoc today > > In JDK 21, javadoc can document such a file **without any changes to the javadoc tool**. 
The only thing that the user needs to do is to make sure that the following options are present: > > * `--enable-preview` and `--source=21` > * `-package` > > The first pair of options tells javadoc to use preview features, which JEP 445 is one of. Without these preview-related options, javadoc will raise the following error: > > > % javadoc --version > javadoc 21 > > % javadoc HelloWorld.java -d /tmp/throwaway > Loading source file HelloWorld.java... > HelloWorld.java:2: error: unnamed classes are a preview feature and are disabled by default. > void main() { > ^ > (use --enable-preview to enable unnamed classes) > 1 error > > > The second option, `-package`, tells javadoc to document classes that are public, protected, or declared with package access (colloquially known as "package private"). Without this option, javadoc will only document public and protected classes, which do not include the interpreted class: > > > % javadoc --enable-preview --source=21 HelloWorld.java -d /tmp/throwaway > Loading source file HelloWorld.java... > Constructing Javadoc information... > error: No public or protected classes found to document. > 1 error > > > When those additional options are present, javadoc does its job: > > > % javadoc --enable-preview --source=21 -package HelloWorld.java -d /tmp/throwaway > Loading source file HelloWorld.java... > Constructing Javadoc information... > Creating destination directory: "/tmp/throwaway/" > Building index for all the packages and classes... > Standard Doclet version 21+35-LTS-2513 > Building tree for all the packages and classes... > Generating /tmp/throwaway/HelloWorld.htm... Reviewers, please ignore changes to the following files as well as commits that brought them: * .github/workflows/main.yml * src/hotspot/share/utilities/nativeCallStack.cpp Those changes seem to be transient artefacts of the workflow and should disappear on their own, eventually. 
The reason those changes appeared in this PR is that this PR's branch is based on a more recent master than that of the PR that this PR depends on, https://github.com/openjdk/jdk/pull/16461. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16853#issuecomment-1829986363 From rgiulietti at openjdk.org Tue Nov 28 14:58:34 2023 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Tue, 28 Nov 2023 14:58:34 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v13] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 27 Nov 2023 19:09:40 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. 
>> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. >> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Use byte off branches in char_array_compress > Verified by manual tests with "-XX:AVX3Threshold=0" > And test in the PR test/hotspot/jtreg/compiler/intrinsics/string/TestStringConstructionIntrinsics.java The Java code, both src and test, looks good to me. ------------- Marked as reviewed by rgiulietti (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16425#pullrequestreview-1753111255 From pchilanomate at openjdk.org Tue Nov 28 15:01:37 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 28 Nov 2023 15:01:37 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v5] In-Reply-To: References: Message-ID: On Fri, 24 Nov 2023 18:30:17 GMT, Erik Österlund wrote: >> The current logic for closing memory in panama today is susceptible to live lock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other. 
>> >> By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. >> >> Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well. > > Erik Österlund has updated the pull request incrementally with two additional commits since the last revision: > > - Merge pull request #3 from JornVernee/PR_async_close+NoToNativeTrans > > - don't transition to native state on Unsafe_CopySwapMemory0 I don't know about the Panama details but the asynchronous handshake usage looks good. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16792#pullrequestreview-1753119472 From aph at openjdk.org Tue Nov 28 15:09:35 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 15:09:35 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v2] In-Reply-To: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: > Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. 
> > I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix oop map ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16851/files - new: https://git.openjdk.org/jdk/pull/16851/files/8dc2033c..590bb9a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16851&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16851&range=00-01 Stats: 6 lines in 1 file changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16851/head:pull/16851 PR: https://git.openjdk.org/jdk/pull/16851 From adinn at openjdk.org Tue Nov 28 15:12:15 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 28 Nov 2023 15:12:15 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v2] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:09:35 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. >> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. 
> > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix oop map src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4440: > 4438: bfi(tmp1, zr, 22, 4); // Clear DN, FZ, and Rmode > 4439: bfi(tmp1, zr, 8, 5); // Clear exception-control bits (8-12) > 4440: eor(tmp1, tmp1, tmp2); Hmm? So . . . 1) We ensure tmp1 has the bits we want by clearing DN FZ Rmode and Exception bits 2) we XOR tmp1 with the original bits (saved in tmp2) and put the 'difference' in tmp1 (?) 3) If the difference is zero we skip 4) Otherwise we write tmp1 (i.e. the 'difference' bits) to fpcr (???) Should this not be get_fpcr(tmp1); mov(tmp2, tmp1); bfi(tmp1, zr, 22, 4); bfi(tmp1, zr, 8, 5); eor(tmp2, tmp1, tmp2); cbz(tmp2, OK); set_fpcr(tmp1); bind(OK); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16851#discussion_r1407922192 From mli at openjdk.org Tue Nov 28 15:16:05 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Nov 2023 15:16:05 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 13:23:43 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> optimize perf with stub out-of-line > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1704: > >> 1702: // check whether it's a NaN. >> 1703: mv(t0, 0x7c00); >> 1704: andr(tmp, src, t0); > > I see from the exponent encoding of float16 on [1], it could be a negative/positive infinity as well when exponent is 0b11111. It depends on whether the significand is zero or not. So is this check for NaN sufficient? > > [1] https://en.wikipedia.org/wiki/Half-precision_floating-point_format Good catch! Your observation is right and wrong. :) We could have a patch like below, but it will sacrifice the performance of the normal case (non-NaN, non-Inf), as it adds one extra instruction on the critical path. 
Maybe a solution is to add a comment here stating that both NaN and Inf are processed in the slow path, and that the slow path is necessary for NaN but not for Inf. What do you think? $ git diff src/ diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp index 1b6140242b8..7413767395f 100644 --- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp +++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp @@ -1700,10 +1700,11 @@ void C2_MacroAssembler::float16_to_float(FloatRegister dst, Register src, Regist auto stub = C2CodeStub::make(dst, src, tmp, 20, float16_to_float_nan_path); // check whether it's a NaN. - mv(t0, 0x7c00); + mv(t0, 0x7fff); andr(tmp, src, t0); + mv(t0, 0x7c00); // jump to stub processing NaN case - beq(t0, tmp, stub->entry()); + bgt(tmp, t0, stub->entry()); // non-NaN cases, just use built-in instructions. fmv_h_x(dst, src); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407928062 From mli at openjdk.org Tue Nov 28 15:33:52 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Nov 2023 15:33:52 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 11:23:30 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> optimize perf with stub out-of-line > > src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 52: > >> 50: #define RISCV_HWPROBE_EXT_ZBB (1 << 4) >> 51: #define RISCV_HWPROBE_EXT_ZBS (1 << 5) >> 52: #define RISCV_HWPROBE_EXT_ZFH (1 << 27) > > Will this change in future? Seems it's still not there in the kernel source yet [1]. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/uapi/asm/hwprobe.h?h=v6.7-rc3 The latest message I got is that it will be pushed into the kernel soon; we can wait for it to land in the kernel if you'd like. 
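For readers following the float16 NaN/Inf question raised in the review comment on `c2_MacroAssembler_riscv.cpp` above, here is a minimal C++ sketch (not part of the patch) of the two bit tests being compared. The function and constant names are hypothetical and exist only for illustration; the masks `0x7c00` and `0x7fff` are the ones that appear in the quoted diff.

```cpp
#include <cstdint>

// IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits, 10 significand bits.
constexpr uint16_t kExpMask = 0x7c00;  // all exponent bits set
constexpr uint16_t kAbsMask = 0x7fff;  // everything except the sign bit

// Models the original check (and with 0x7c00, compare for equality):
// true for NaN and also for +/-Inf, so both would take the slow path.
inline bool exponent_all_ones(uint16_t bits) {
  return (bits & kExpMask) == kExpMask;
}

// Models the alternative check (and with 0x7fff, compare greater than 0x7c00):
// true only for NaN, i.e. exponent all ones and a non-zero significand.
inline bool is_nan(uint16_t bits) {
  return (bits & kAbsMask) > kExpMask;
}
```

Under these assumptions, `0x7c00` (+Inf) and `0xfc00` (-Inf) satisfy the first predicate but not the second, which seems to be the "right and wrong" point made above: the original check is correct for NaN but also sends Inf down the slow path.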
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407960943 From mli at openjdk.org Tue Nov 28 15:37:37 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 28 Nov 2023 15:37:37 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 13:41:56 GMT, Quan Anh Mai wrote: >> Thanks for pointing to the location! >> >> It DOES bring better performance. Please check the pr description for detailed data. > > Thanks, it generally looks good, but I'm not familiar with RISC-V assembly so I will leave the approval for the others. Thanks for your great suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1407968412 From aph at openjdk.org Tue Nov 28 15:58:07 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 15:58:07 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:09:19 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix thinko > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4440: > >> 4438: bfi(tmp1, zr, 22, 4); // Clear DN, FZ, and Rmode >> 4439: bfi(tmp1, zr, 8, 5); // Clear exception-control bits (8-12) >> 4440: eor(tmp1, tmp1, tmp2); > > Hmm? So . . . > 1) We ensure tmp1 has the bits we want by clearing DN FZ Rmode and Exception bits > 2) we XOR tmp1 with the original bits (saved in tmp2) and put the 'difference' in tmp1 (?) > 3) If the difference is zero we skip > 4) Otherwise we write tmp1 (i.e. the 'difference' bits) to fpcr (???) > > Should this not be > > get_fpcr(tmp1); > mov(tmp2, tmp1); > bfi(tmp1, zr, 22, 4); > bfi(tmp1, zr, 8, 5); > eor(tmp2, tmp1, tmp2) > cbz(tmp2, OK); > set_fpcr(tmp1); > bind(OK); Argh, yes. 
The curse of the last-minute change... Good catch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16851#discussion_r1408000231 From aph at openjdk.org Tue Nov 28 15:58:04 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 15:58:04 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: > Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. > > I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. 
Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Fix thinko ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16851/files - new: https://git.openjdk.org/jdk/pull/16851/files/590bb9a5..02a7aaa0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16851&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16851&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16851.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16851/head:pull/16851 PR: https://git.openjdk.org/jdk/pull/16851 From coleenp at openjdk.org Tue Nov 28 16:02:37 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 16:02:37 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v10] In-Reply-To: References: <6UvVZPlnf_cSgctUVPspR767dHLa7vxOA4DiC7fJ3LY=.35e24443-ab4d-4240-a991-2d98d120bb3b@github.com> <_RAExxFbz54yP3mPtWmg-j-J5bJEIYutoSZMQExut3s=.9da4332b-c026-4739-a3ca-174d63f13250@github.com> Message-ID: On Tue, 28 Nov 2023 14:23:53 GMT, Oli Gillespie wrote: >> I think the footprint savings is probably negligible. I like the idea that the queue is periodically drained because it might show issues where we've messed up the refcounts, but they'd be hard to debug, so maybe not worth it. So I see neither the harm nor the benefit of draining the queue in the periodic task, but we should have the feature for the one test. > > I've removed drain from the concurrent cleanup. I left it public and tested, is that reasonable or do you prefer I make it private somehow (not sure the typical way to have test-only methods)? I like that there's a test. Some gtests use friend declarations if they're in a class, but test_symbolTable isn't. I don't think it's worth changing. 
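As an illustration of the bounded delay queue discussed in this thread (new entries displace the oldest entry, plus an explicit drain that tests can call), here is a small single-threaded C++ sketch. `FakeSymbol` and `DelayQueue` are hypothetical names invented for this example and are not HotSpot classes; the sketch ignores concurrency entirely and is not the PR's implementation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a ref-counted symbol; not HotSpot's Symbol class.
struct FakeSymbol {
  int refcount = 0;
};

// Hypothetical bounded delay queue: each held entry owns one extra reference.
template <std::size_t N>
class DelayQueue {
  std::array<FakeSymbol*, N> _slots{};  // nullptr marks an empty slot
  std::size_t _next = 0;                // next slot to overwrite (the oldest once full)

 public:
  // Keep sym alive a little longer; when full, the oldest entry is released.
  void add(FakeSymbol* sym) {
    sym->refcount++;
    FakeSymbol* displaced = _slots[_next];
    _slots[_next] = sym;
    _next = (_next + 1) % N;
    if (displaced != nullptr) displaced->refcount--;
  }

  // Release every held reference, e.g. so a test can check exact refcounts.
  void drain() {
    for (FakeSymbol*& slot : _slots) {
      if (slot != nullptr) {
        slot->refcount--;
        slot = nullptr;
      }
    }
  }
};
```

With a capacity of 2, adding a third symbol drops the extra reference on the first one, and `drain()` returns the remaining counts to zero; that is the "create temp symbol, immediately drain queue" pattern mentioned earlier in the thread.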
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1408008819 From never at openjdk.org Tue Nov 28 16:26:06 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 28 Nov 2023 16:26:06 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 06:23:29 GMT, Emanuel Peter wrote: > I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. > > Testing: tier1-3 and stress. src/hotspot/share/runtime/deoptimization.cpp line 2499: > 2497: // This will let us detect a repeated trap at this point. > 2498: { > 2499: MutexLocker ml(trap_mdo->extra_data_lock()); Doesn't the lock have to be held over the lifetime of the pdata variable? Otherwise an intervening safepoint could repack the MDO rendering pdata invalid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408043075 From wkemper at openjdk.org Tue Nov 28 16:30:05 2023 From: wkemper at openjdk.org (William Kemper) Date: Tue, 28 Nov 2023 16:30:05 GMT Subject: RFR: 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds In-Reply-To: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> References: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> Message-ID: On Tue, 28 Nov 2023 12:40:41 GMT, Aleksey Shipilev wrote: > Flag cleanup. Current barrier verification code is opt-in, and it is selected for a few tests. For extra safety, we want to have it enabled by default in debug builds. This also simplifies test configurations. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` > - [ ] Linux AArch64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` Marked as reviewed by wkemper (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/16849#pullrequestreview-1753352215 From eosterlund at openjdk.org Tue Nov 28 16:32:05 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 28 Nov 2023 16:32:05 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 16:23:34 GMT, Tom Rodriguez wrote: >> I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. >> >> Testing: tier1-3 and stress. > > src/hotspot/share/runtime/deoptimization.cpp line 2499: > >> 2497: // This will let us detect a repeated trap at this point. >> 2498: { >> 2499: MutexLocker ml(trap_mdo->extra_data_lock()); > > Doesn't the lock have to be held over the lifetime of the pdata variable? Otherwise an intervening safepoint could repack the MDO rendering pdata invalid. Good point. That was done in a similar place above but not here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408052647 From epeter at openjdk.org Tue Nov 28 17:23:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 28 Nov 2023 17:23:23 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 16:29:19 GMT, Erik Österlund wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 2499: >> >>> 2497: // This will let us detect a repeated trap at this point.
>>> 2498: { >>> 2499: MutexLocker ml(trap_mdo->extra_data_lock()); >> >> Doesn't the lock have to be held over the lifetime of the pdata variable? Otherwise an intervening safepoint could repack the MDO rendering pdata invalid. > > Good point. That was done in a similar place above but not here. Hmm. So is it that I just have to have a lock on `allocate_bci_to_data`, or do I have to protect `pdata`? Because `pdata` is returned from `Deoptimization::query_update_method_data`, and so I would have to make the lock take a wider scope than that function, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408123487 From adinn at openjdk.org Tue Nov 28 17:42:08 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 28 Nov 2023 17:42:08 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:58:04 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. >> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix thinko Looks good. ------------- Marked as reviewed by adinn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16851#pullrequestreview-1753504888 From never at openjdk.org Tue Nov 28 18:00:06 2023 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 28 Nov 2023 18:00:06 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 17:20:47 GMT, Emanuel Peter wrote: >> Good point. That was done in a similar place above but not here. > > Hmm. So is it that I just have to have a lock on `allocate_bci_to_data`, or do I have to protect `pdata`? Because `pdata` is returned from `Deoptimization::query_update_method_data`, and so I would have to make the lock take a wider scope than that function, right? The lock needs to be held while accessing the contents of pdata which is very complicated in this particular usage pattern. Maybe the caller needs to refetch the pointer under a lock instead of passing it out of this method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408165429 From kdnilsen at openjdk.org Tue Nov 28 18:31:06 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Tue, 28 Nov 2023 18:31:06 GMT Subject: RFR: 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds In-Reply-To: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> References: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> Message-ID: <65DZRaGoYyRdchfgnNcIweLjK35xMJlrswDaXP-zIdA=.f9be9f05-cd6a-4e53-9640-b22bac03b0d2@github.com> On Tue, 28 Nov 2023 12:40:41 GMT, Aleksey Shipilev wrote: > Flag cleanup. Current barrier verification code is opt-in, and it is selected for a few tests. For extra safety, we want to have it enabled by default in debug builds. This also simplifies test configurations. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [ ] Linux x86_64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` > - [ ] Linux AArch64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` Marked as reviewed by kdnilsen (no project role). ------------- PR Review: https://git.openjdk.org/jdk/pull/16849#pullrequestreview-1753612732 From jiangli at openjdk.org Tue Nov 28 18:39:16 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 28 Nov 2023 18:39:16 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: <9bpDj7y-wn-5I1Hvnw9kCKqLVK0TOINrZvYeq_d-QsA=.aa8cd690-3198-4398-8e59-6be8288d90ee@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> <9bpDj7y-wn-5I1Hvnw9kCKqLVK0TOINrZvYeq_d-QsA=.aa8cd690-3198-4398-8e59-6be8288d90ee@github.com> Message-ID: On Tue, 28 Nov 2023 05:08:58 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiThreadState.inline.hpp line 98: >> >>> 96: state->get_thread_oop() != thread_oop)) { >>> 97: // Check if java_lang_Thread already has a link to the JvmtiThreadState. >>> 98: if (thread_oop != nullptr) { // thread_oop can be null during early VMStart. >> >> This comment is another case of `state->get_thread_oop()` being null. We should merge this comment with the new comment about attaching native thread. > > This also was caught by my eyes. :) > With the lines 99-101 in place the only case when `thread_oop` can be equal to `nullptr` is when `thread->threadObj() == nullptr`. My understanding is that it can be for a detached thread only. > I would suggest to add an assert after the line 101: ` assert(thread_oop != nullptr, "sanity check");` > Full testing with this assert should help to identify if it can be fired. 
If such cases are found then they need to be fixed. Then we can remove the check at the line 104. > The `JvmtiThreadState` constructor also allows for `thread_oop` to be `nullptr`. > Some cleanup will be needed to get rid of unneeded checks there as well. @sspitsyn For the above suggestions, it seems cleaner/safer to handle the clean-ups in a separate RFE with full testing including the vthread cases. There are additional comments in https://github.com/openjdk/jdk/pull/16642#issuecomment-1815379890 related to this as well. Those could be handled together and require thorough testing including the vthread support. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1408228229 From dcubed at openjdk.org Tue Nov 28 18:41:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 28 Nov 2023 18:41:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 20:20:01 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert from: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed.
Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation I'll re-review again once the last set of comments are addressed. 
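The filtering described in the quoted PR text (skip owned monitors whose object is dead when collecting for thread dumps or `GetOwnedMonitorInfo`) can be sketched roughly as below. This is an illustration under stated assumptions, not the HotSpot code: `MonitorSketch` and `OwnedMonitorCollector` are hypothetical types, and a null `object_peek()` result is used here to model a dead object.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an ObjectMonitor.
struct MonitorSketch {
  const void* object;  // nullptr models a monitor whose object has died
  int owner_id;        // id of the owning thread

  const void* object_peek() const { return object; }
};

// Hypothetical closure that collects the monitors owned by one thread,
// filtering out those that refer to a dead object.
class OwnedMonitorCollector {
 public:
  explicit OwnedMonitorCollector(int owner_id) : _owner_id(owner_id) {}

  void do_monitor(const MonitorSketch* m) {
    if (m->owner_id != _owner_id) {
      return;  // owned by another thread
    }
    if (m->object_peek() == nullptr) {
      return;  // dead object: do not expose it to the caller
    }
    _result.push_back(m);
  }

  const std::vector<const MonitorSketch*>& result() const { return _result; }

 private:
  int _owner_id;
  std::vector<const MonitorSketch*> _result;
};
```

Note that the quoted text deliberately does not apply this filter on the JNI detach path, since detach has to see those monitors in order to unlock them.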
------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1830463592 From duke at openjdk.org Tue Nov 28 19:01:14 2023 From: duke at openjdk.org (Francesco Nigro) Date: Tue, 28 Nov 2023 19:01:14 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> Message-ID: <4iaLWHzsfBlAlbxKH36YR4EN3JQzERsKjItuBp7ilz8=.cd07bc9b-8abb-4df8-9cc0-813b89bd5630@github.com> On Tue, 28 Nov 2023 12:09:12 GMT, Jorn Vernee wrote: > See JBS issue. > > Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. > > Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) @JornVernee please check what I have done for https://github.com/openjdk/jdk/pull/16760 Maybe there is something we can pull in already on your PR here re it ------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1829720912 From jvernee at openjdk.org Tue Nov 28 19:01:13 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 28 Nov 2023 19:01:13 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded Message-ID: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> See JBS issue. Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. 
Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) ------------- Commit messages: - fix - add test Changes: https://git.openjdk.org/jdk/pull/16848/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16848&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320886 Stats: 18 lines in 2 files changed: 11 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16848/head:pull/16848 PR: https://git.openjdk.org/jdk/pull/16848 From jvernee at openjdk.org Tue Nov 28 19:01:19 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 28 Nov 2023 19:01:19 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> Message-ID: On Tue, 28 Nov 2023 12:09:12 GMT, Jorn Vernee wrote: > See JBS issue. > > Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. > > Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) https://github.com/openjdk/jdk/pull/16760 Looks more performance focused, so I think that should be addressed separately. 
I'm looking into the crash on mac ------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1830488963 PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1830491876 From jvernee at openjdk.org Tue Nov 28 19:01:15 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 28 Nov 2023 19:01:15 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: <4iaLWHzsfBlAlbxKH36YR4EN3JQzERsKjItuBp7ilz8=.cd07bc9b-8abb-4df8-9cc0-813b89bd5630@github.com> References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> <4iaLWHzsfBlAlbxKH36YR4EN3JQzERsKjItuBp7ilz8=.cd07bc9b-8abb-4df8-9cc0-813b89bd5630@github.com> Message-ID: On Tue, 28 Nov 2023 12:12:55 GMT, Francesco Nigro wrote: >> See JBS issue. >> >> Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. >> >> Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) > > @JornVernee please check what I have done for https://github.com/openjdk/jdk/pull/16760 > > Maybe there is something we can pull in already on your PR here re it @franz1981 I can not interact with PRs that have the OCA label. Once you have signed the OCA, and it is verified (label is removed), I can take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1829723735 From duke at openjdk.org Tue Nov 28 19:01:15 2023 From: duke at openjdk.org (Francesco Nigro) Date: Tue, 28 Nov 2023 19:01:15 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> <4iaLWHzsfBlAlbxKH36YR4EN3JQzERsKjItuBp7ilz8=.cd07bc9b-8abb-4df8-9cc0-813b89bd5630@github.com> Message-ID: On Tue, 28 Nov 2023 12:14:39 GMT, Jorn Vernee wrote: >> @JornVernee please check what I have done for https://github.com/openjdk/jdk/pull/16760 >> >> Maybe there is something we can pull in already on your PR here re it > > @franz1981 I can not interact with PRs that have the OCA label. Once you have signed the OCA, and it is verified (label is removed), I can take a look. @JornVernee Fair point, in theory I work for Red Hat so it should be sorted out, let me ask @theRealAph about it First contribution ever, so forgive the naive comment ------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1829767272 From jvernee at openjdk.org Tue Nov 28 19:01:17 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 28 Nov 2023 19:01:17 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> <4iaLWHzsfBlAlbxKH36YR4EN3JQzERsKjItuBp7ilz8=.cd07bc9b-8abb-4df8-9cc0-813b89bd5630@github.com> Message-ID: <3ae8BYoX-CB3iekNvjQ091TKdRW6o6JXK9s99hFynno=.c21d984f-58c5-4dd5-b2a3-85925e3333fd@github.com> On Tue, 28 Nov 2023 12:42:13 GMT, Francesco Nigro wrote: >> @franz1981 I can not interact with PRs that have the OCA label. Once you have signed the OCA, and it is verified (label is removed), I can take a look.
> > @JornVernee Fair point, in theory I work for Red Hat so it should be sorted out, let me ask @theRealAph about it > > First contribution ever, so forgive the naive comment @franz1981 No problem. I wasn't sure whether you worked for Red Hat or not. In that case, it should be enough to post a comment with `/covered` and then one of the community managers will create an entry for your github user name in the OCA database: https://oca.opensource.oracle.com/?ojr=contrib-list I will stick to the policy of waiting until the OCA label is removed though :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1829809577 From aph at openjdk.org Tue Nov 28 19:01:18 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 28 Nov 2023 19:01:18 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> <4iaLWHzsfBlAlbxKH36YR4EN3JQzERsKjItuBp7ilz8=.cd07bc9b-8abb-4df8-9cc0-813b89bd5630@github.com> Message-ID: On Tue, 28 Nov 2023 12:42:13 GMT, Francesco Nigro wrote: >> @franz1981 I can not interact with PRs that have the OCA label. Once you have signed the OCA, and it is verified (label is removed), I can take a look. > > @JornVernee Fair point, in theory I work for Red Hat so it should be sorted out, let me ask @theRealAph about it > > First contribution ever, so forgive the naive comment > @franz1981 I can not interact with PRs that have the OCA label. Once you have signed the OCA, and it is verified (label is removed), I can take a look. I can tell you that @franz1981 is a Red Hat employee, and we have signed the OCA for everyone here.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1830146554 From shade at openjdk.org Tue Nov 28 20:45:36 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 28 Nov 2023 20:45:36 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks Message-ID: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Profiling heap dumping code reveals another simple issue: `mask_dormant_archived_object` on dumping hotpath takes quite a bit of time. We can reflow it for better inlineability, throwing out the non-essential parts into cold method. There is also no reason to peek into java mirror with (default) keep-alive, if we only use the result for null-check. Example improvements on Mac M1: % for I in `seq 1 5`; do build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done # Before Heap dump file created [1897307608 bytes in 1.584 secs] Heap dump file created [1897308278 bytes in 1.439 secs] Heap dump file created [1897308508 bytes in 1.460 secs] Heap dump file created [1897308505 bytes in 1.423 secs] Heap dump file created [1897308554 bytes in 1.414 secs] # After Heap dump file created [1897307648 bytes in 1.509 secs] Heap dump file created [1897308498 bytes in 1.281 secs] Heap dump file created [1897308554 bytes in 1.282 secs] Heap dump file created [1897308512 bytes in 1.263 secs] Heap dump file created [1897308554 bytes in 1.270 secs] ...which is about +12% faster heap dump. I also eyeballed the generated code and saw `mask_dormant_archived_object` fully inlined at least on x86_64. 
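The "reflow for inlineability" idea above, keeping the common check small and pushing the rare work into a separate cold function, can be sketched as follows. This is not the actual `mask_dormant_archived_object` change; `Obj`, `mask_dormant`, `handle_dormant`, and the `SKETCH_NOINLINE` macro are hypothetical names chosen for illustration, and a null `mirror` field is assumed to model an object whose java mirror is not loaded.

```cpp
#include <cassert>

// Keep the cold function out of line where the compiler supports it.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Hypothetical object model: a null mirror marks a dormant object.
struct Obj {
  const void* mirror;
  int payload;
};

// Counter used only to make the cold path observable in this sketch.
static int g_dormant_seen = 0;

// Cold path: rarely taken, so its bookkeeping lives in its own function
// and does not bloat the callers of the hot path.
SKETCH_NOINLINE static const Obj* handle_dormant(const Obj* o) {
  (void)o;  // a real implementation might log or record the object here
  ++g_dormant_seen;
  return nullptr;
}

// Hot path: two cheap comparisons, small enough to inline at every call site.
static inline const Obj* mask_dormant(const Obj* o) {
  if (o != nullptr && o->mirror == nullptr) {
    return handle_dormant(o);
  }
  return o;
}
```

The hot function only needs the mirror pointer for a null check, which matches the observation above that there is no reason to do more than a plain peek when the result is only compared against null.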
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/16863/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16863&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320924 Stats: 41 lines in 3 files changed: 19 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16863/head:pull/16863 PR: https://git.openjdk.org/jdk/pull/16863 From sspitsyn at openjdk.org Tue Nov 28 20:46:12 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Nov 2023 20:46:12 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 22 Nov 2023 22:40:20 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Address Serguei Spitsyn's comments/suggestions: > - Remove the redundant thread->is_Java_thread() check from JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample(). > - Change the assert in JvmtiThreadState::state_for_while_locked to avoid #ifdef ASSERT. This fix looks good to me, so approved now. I assume it is for 22. Is it correct? ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16642#pullrequestreview-1753899489 From sspitsyn at openjdk.org Tue Nov 28 20:46:14 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Nov 2023 20:46:14 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> <9bpDj7y-wn-5I1Hvnw9kCKqLVK0TOINrZvYeq_d-QsA=.aa8cd690-3198-4398-8e59-6be8288d90ee@github.com> Message-ID: On Tue, 28 Nov 2023 18:36:00 GMT, Jiangli Zhou wrote: >> This also was caught by my eyes. :) >> With the lines 99-101 in place the only case when `thread_oop` can be equal to `nullptr` is when `thread->threadObj() == nullptr`. My understanding is that it can be for a detached thread only. >> I would suggest to add an assert after the line 101: ` assert(thread_oop != nullptr, "sanity check");` >> Full testing with this assert should help to identify if it can be fired. If such cases are found then they need to be fixed. Then we can remove the check at the line 104. >> The `JvmtiThreadState` constructor also allows for `thread_oop` to be `nullptr`. >> Some cleanup will be needed to get rid of unneeded checks there as well. > > @sspitsyn For the above suggestions, it seems cleaner/safer to handle the clean-ups in a separate RFE with full testing including the vthread cases. There are additional comments in https://github.com/openjdk/jdk/pull/16642#issuecomment-1815379890 related to this as well. Those could be handled together and require through testing including the vthread support. Okay, thanks! I was about to suggest the same as it will simplify the process. 
I've filed the following issue and assigned to myself: [8320925](https://bugs.openjdk.org/browse/JDK-8320925) `assert and cleanup in JvmtiThreadState::state_for_while_locked for thread_oop != nullptr` I'll prepare a fix for 22 and hope to get you as a reviewer. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1408394435 From coleenp at openjdk.org Tue Nov 28 21:34:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 28 Nov 2023 21:34:18 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 09:16:30 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup, at which time all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory.
>> This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. >> >> ~Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.~ >> >> Therefore, we need to perform `jmethodID` lookup for each method in an old class version that is getting purged, and null out the pointer of that `jmethodID` to break the link from `jmethodID` to the method instance that is about to get deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary assert This looks really good. Thank you for the blog post, it helped me understand the problem, which is very convoluted. I like that the methodID is cleared with purge_previous_version_list. src/hotspot/share/oops/instanceKlass.cpp line 4236: > 4234: if (method != nullptr) { > 4235: method->clear_jmethod_id(); > 4236: } This loops through the methods in the InstanceKlass that was a previous version klass, and clears the jmethodIDs for all the methods. Will it clear the jmethodIDs for the EMCP methods also, and should it?
The jmethodIDs for EMCP methods are replaced with the new version, so the Method* in this list won't find a matching jmethodID. Maybe this can be restricted to obsolete methods? ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16662#pullrequestreview-1753984104 PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1408443056 From manc at openjdk.org Tue Nov 28 21:34:15 2023 From: manc at openjdk.org (Man Cao) Date: Tue, 28 Nov 2023 21:34:15 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 22 Nov 2023 22:40:20 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Address Serguei Spitsyn's comments/suggestions: > - Remove the redundant thread->is_Java_thread() check from JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample(). > - Change the assert in JvmtiThreadState::state_for_while_locked to avoid #ifdef ASSERT. Marked as reviewed by manc (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/16642#pullrequestreview-1753984980 From jiangli at openjdk.org Tue Nov 28 21:34:18 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Tue, 28 Nov 2023 21:34:18 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 28 Nov 2023 20:43:15 GMT, Serguei Spitsyn wrote: > This fix looks good to me, so approved now. I assume it is for 22. Is it correct? Thanks for the careful review, @sspitsyn! The fix is for 22. We probably should also consider back-porting to JDK 11 to prevent any potential changes in the area accidentally reintroducing the bug. The https://bugs.openjdk.org/browse/JDK-8312174 change has been back-ported to 11, which resolved the crashes by luck. I'll request backport after this fix is integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1830777696 From manc at openjdk.org Tue Nov 28 21:34:20 2023 From: manc at openjdk.org (Man Cao) Date: Tue, 28 Nov 2023 21:34:20 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> <9bpDj7y-wn-5I1Hvnw9kCKqLVK0TOINrZvYeq_d-QsA=.aa8cd690-3198-4398-8e59-6be8288d90ee@github.com> Message-ID: <5z6qMAAU2UwfD17B6TbfivUuQvDsWrVjBdQpCrjdTnI=.b79dcef0-153a-4abd-b001-8ecf3ee27dcf@github.com> On Tue, 28 Nov 2023 20:38:55 GMT, Serguei Spitsyn wrote: >> @sspitsyn For the above suggestions, it seems cleaner/safer to handle the clean-ups in a separate RFE with full testing including the vthread cases. There are additional comments in https://github.com/openjdk/jdk/pull/16642#issuecomment-1815379890 related to this as well. 
Those could be handled together and require thorough testing including the vthread support. > > Okay, thanks! I was about to suggest the same as it will simplify the process. > I've filed the following issue and assigned to myself: > [8320925](https://bugs.openjdk.org/browse/JDK-8320925) `assert and cleanup in JvmtiThreadState::state_for_while_locked for thread_oop != nullptr` > > I'll prepare a fix for 22 and hope to get you as a reviewer. :) Thank you @sspitsyn, I'm interested in reviewing the followup change for [8320925](https://bugs.openjdk.org/browse/JDK-8320925) as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1408441239 From matsaave at openjdk.org Tue Nov 28 22:22:22 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 28 Nov 2023 22:22:22 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v2] In-Reply-To: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: > ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. > > This patch restores the flag along with the resolved references index. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: - Added asserts to ensure correctness - Merge branch 'master' into resolved_ref_flag - Merge branch 'master' of https://github.com/openjdk/jdk into resolved_ref_flag - 8320530: has_resolved_ref_index flag not restored after resetting entry ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16769/files - new: https://git.openjdk.org/jdk/pull/16769/files/bff99660..273d82df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=00-01 Stats: 10779 lines in 510 files changed: 7760 ins; 1975 del; 1044 mod Patch: https://git.openjdk.org/jdk/pull/16769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16769/head:pull/16769 PR: https://git.openjdk.org/jdk/pull/16769 From sspitsyn at openjdk.org Tue Nov 28 22:43:10 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 28 Nov 2023 22:43:10 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 28 Nov 2023 02:51:54 GMT, Serguei Spitsyn wrote: >>> Thank you for filing and fixing this issue! I'm kind of late here. Sorry for that. Is it hard to create a JTreg test for an attaching native thread? I can help if you have a standalone prototype. You can look for some examples in the folder: `test/hotspot/jtreg/serviceability/jvmti/vthread`. >> >> Hi @sspitsyn we don't have an extracted standalone test case (yet) to demonstrate the crashes. The crashes could not be reproduced consistently. Outside the debugger (lldb), I ran the test (one of the affected ones) 10 times/per-iteration in order to reproduce. I found the crashes could be affected by both timing and memory layout.
During the investigation, I noticed the problem became hidden when I increased the allocation size for ThreadsList::_threads (as one of the experiments that I did, I wanted to mprotect the memory to be read-only in order to find who trashed the memory, so was trying to allocate memory up to page boundary). That's the reason why I added the noreg-hard tag earlier. >> >> I gave it some more thought today. Perhaps, we could write a whitebox test to check the JvmtiThreadState, without being able to consistently trigger crashes. We could add a WhiteBox API to iterate the JvmtiThreadState list and validate if all the JavaThread pointers were valid after detaching. The test would need to create native threads to attach and detach before the check. That could more reliably test the 1-1 mapping of JvmtiThreadState and JavaThread. What do you think? >> >> Thanks for volunteering to help with the test. I created https://bugs.openjdk.org/browse/JDK-8320614 today. Should I assign it to you? > > @jianglizhou Thank you for filing the sub-task. You have already seen some crashes. Even though you do not have a standalone test case, it is still valuable if you describe a test scenario (at least superficially) which helped to observe the problem. Could you add it to the sub-task report, please? > Thanks for the careful review, @sspitsyn! The fix is for 22. We probably should also consider back-porting to JDK 11 to prevent any potential changes in the area accidentally reintroducing the bug. The https://bugs.openjdk.org/browse/JDK-8312174 change has been back-ported to 11, which resolved the crashes by luck. > I'll request backport after this fix is integrated. Nice. I've targeted it to 22. I agree it is better to have it back-ported. Its back-port is not going to be clean though.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1830872157 From cjplummer at openjdk.org Tue Nov 28 22:45:08 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 28 Nov 2023 22:45:08 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap [v4] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 20:42:47 GMT, Chris Plummer wrote: >> src/jdk.jcmd/share/man/jcmd.1 line 1: >> >>> 1: .\" Copyright (c) 2012, 2023, Oracle and/or its affiliates. All rights reserved. >> >> The actual markdown source for this file needs to be updated with these changes. Those sources are not open-source unfortunately. Please either coordinate to get the sources updated with an Oracle developer as part of this PR (they will integrate the internal part), or else please defer this to a subtask and let an Oracle developer update the source and output at the same time. Thanks. > > I filed JDK-8320556 to update the closed source. It's assigned to me. I'll do the update after these changes are pushed. In the meantime I'll make sure the current jcmd.1 changes are correct and match the closed changes I'll be making. I've applied the doc changes in the CSR to our closed markup file and generated a new jcmd.1 file with it. You can apply this diff to this PR to get the open changes checked in: diff --git a/src/jdk.jcmd/share/man/jcmd.1 b/src/jdk.jcmd/share/man/jcmd.1 index 4157cf600e1..af8d3e61b86 100644 --- a/src/jdk.jcmd/share/man/jcmd.1 +++ b/src/jdk.jcmd/share/man/jcmd.1 @@ -178,11 +178,21 @@ Prints all compiled methods in code cache that are alive. Impact: Medium .RE .TP -\f[V]Compiler.perfmap\f[R] (Linux only) +\f[V]Compiler.perfmap\f[R] [\f[I]arguments\f[R]] (Linux only) Write map file for Linux perf tool. 
.RS .PP Impact: Low +.PP +\f[I]arguments\f[R]: +.IP [bu] 2 +\f[V]filename\f[R]: (Optional) Name of the map file (STRING, no default +value) +.PP +If \f[V]filename\f[R] is not specified, a default file name is chosen +using the pid of the target JVM process. +For example, if the pid is \f[V]12345\f[R], then the default +\f[V]filename\f[R] will be \f[V]/tmp/perf-12345.map\f[R]. .RE .TP \f[V]Compiler.queue\f[R] ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15871#discussion_r1408505466 From duke at openjdk.org Tue Nov 28 23:25:27 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Tue, 28 Nov 2023 23:25:27 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap [v5] In-Reply-To: References: Message-ID: <81dXSHvLQMGj3s1BcBs8fmJUEoJpaU-5wBRSIjnztMM=.d53f8a2f-8353-49ec-8a9b-695b32f03d20@github.com> > `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an optional argument for specifying a file name. > > `jcmd PID help Compiler.perfmap` shows the following usage. > > > Compiler.perfmap > Write map file for Linux perf tool. > > Impact: Low > > Syntax : Compiler.perfmap [] > > Arguments: > filename : [optional] Name of the map file (STRING, no default value) > > > The man page of jcmd will be updated in a separate PR. 
Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: Apply man changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15871/files - new: https://git.openjdk.org/jdk/pull/15871/files/0ec20ea3..a7dcf426 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15871&range=03-04 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15871.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15871/head:pull/15871 PR: https://git.openjdk.org/jdk/pull/15871 From iklam at openjdk.org Tue Nov 28 23:31:13 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 28 Nov 2023 23:31:13 GMT Subject: RFR: 8320935: Move CDS config initialization code to cdsConfig.cpp Message-ID: This is a simple clean up that moves the code for initializing the CDS config states from arguments.cpp to cdsConfig.cpp I renamed a few functions, but otherwise the code is unchanged. - `get_default_shared_archive_path()` -> `default_archive_path()` - `GetSharedArchivePath()` -> `static_archive_path()` - `GetSharedDynamicArchivePath()` -> `dynamic_archive_path()` There's also less `#if INCLUDE_CDS` since the entire cdsConfig.cpp file is compiled only if CDS is enabled. 
------------- Commit messages: - code alignment - step4 - step3 - step2 - step1 Changes: https://git.openjdk.org/jdk/pull/16868/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16868&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320935 Stats: 696 lines in 8 files changed: 346 ins; 327 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/16868.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16868/head:pull/16868 PR: https://git.openjdk.org/jdk/pull/16868 From dlong at openjdk.org Tue Nov 28 23:57:04 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 28 Nov 2023 23:57:04 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with multiple -Xlog crashes JTREG test cases. This is because `Collectors.toMap` is not given a merge strategy. > > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. > > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake. > > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > > Running tier1 What about -Xlog:gc=debug,safepoint=trace and other -XX flags that take a comma-separated list?
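[Editorial note] The failure mode in the quoted description comes from the two-argument `Collectors.toMap`, which throws `IllegalStateException` when two elements map to the same key; the three-argument overload accepts a merge function instead. The following is a minimal sketch of that difference, not the code from the PR. The class name `LastWinsOptions`, the helper methods, and the rule used to split an option into name and value are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LastWinsOptions {
    // Hypothetical helper: splits "-Xms3g" into {"Xms", "3g"} by taking the
    // leading run of letters as the name. The parsing used by the real test
    // library is not shown in this thread and may differ.
    static String[] split(String opt) {
        String body = opt.substring(1); // drop the leading '-'
        int i = 0;
        while (i < body.length() && Character.isLetter(body.charAt(i))) {
            i++;
        }
        return new String[] { body.substring(0, i), body.substring(i) };
    }

    // Three-argument toMap: on a duplicate key, keep the later value.
    static Map<String, String> lastWins(List<String> opts) {
        return opts.stream()
                .map(LastWinsOptions::split)
                .collect(Collectors.toMap(kv -> kv[0], kv -> kv[1],
                        (earlier, later) -> later));
    }

    // Two-argument toMap: a duplicate key raises IllegalStateException.
    static boolean failsWithoutMerge(List<String> opts) {
        try {
            opts.stream()
                    .map(LastWinsOptions::split)
                    .collect(Collectors.toMap(kv -> kv[0], kv -> kv[1]));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> opts = List.of("-Xms3g", "-Xms4g");
        System.out.println(failsWithoutMerge(opts));
        System.out.println(lastWins(opts));
    }
}
```

A last-wins merge matches the `-Xms` cases listed under "Tested with" but cannot represent flags such as `-Xlog`, where every occurrence contributes, which is why the description substitutes a sentinel value there.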
------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1830962381 From iklam at openjdk.org Wed Nov 29 00:15:08 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 29 Nov 2023 00:15:08 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v2] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Tue, 28 Nov 2023 22:22:22 GMT, Matias Saavedra Silva wrote: >> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. >> >> This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Added asserts to ensure correctness > - Merge branch 'master' into resolved_ref_flag > - Merge branch 'master' of https://github.com/openjdk/jdk into resolved_ref_flag > - 8320530: has_resolved_ref_index flag not restored after resetting entry LGTM. Just a small nit. src/hotspot/share/oops/cpCache.cpp line 312: > 310: // Store appendix, if any. > 311: if (has_appendix) { > 312: assert(method_entry->has_resolved_references_index(), "sanity"); The assert is not necessary here, as resolved_references_index() already has the same assert. ------------- Marked as reviewed by iklam (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1754172185 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1408566653 From ayang at openjdk.org Wed Nov 29 00:41:11 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 29 Nov 2023 00:41:11 GMT Subject: RFR: 8320916: jdk/jfr/event/gc/stacktrace/TestParallelMarkSweepAllocationPendingStackTrace.java failed with "OutOfMemoryError: GC overhead limit exceeded" Message-ID: Simple fix to reduce live set so that after the triggered full-gc, there is still some memory left. Test: ~2/10 failure before the fix and no failure observed for 100 iterations. ------------- Commit messages: - test/jdk/jdk/jfr/event/gc/stacktrace/AllocationStackTrace.java Changes: https://git.openjdk.org/jdk/pull/16870/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16870&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320916 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16870.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16870/head:pull/16870 PR: https://git.openjdk.org/jdk/pull/16870 From sgibbons at openjdk.org Wed Nov 29 01:11:21 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 Nov 2023 01:11:21 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 Message-ID: Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: Benchmark Score Latest StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x StringIndexOf.constantPattern 9.361 11.906 1.271872663x StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x StringIndexOf.success 9.186 9.713 1.057369911x StringIndexOf.successBig 14.341 46.343 3.231504079x StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 ------------- Commit messages: - Fix whitespace - Merge branch 'openjdk:master' into indexof - Comments; added exhaustive-ish test - Subtracting 0x10 twice. - Stomped on r13 in switch branch calculation - Windows register preservation fix - Fix merge problem - Merge branch 'master' into indexof - Working version - Protecting against page faults - ... 
and 6 more: https://git.openjdk.org/jdk/compare/ce4e6e2b...60d762b9 Changes: https://git.openjdk.org/jdk/pull/16753/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320448 Stats: 3062 lines in 14 files changed: 2928 ins; 11 del; 123 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From sgibbons at openjdk.org Wed Nov 29 01:11:21 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 Nov 2023 01:11:21 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 00:06:19 GMT, Scott Gibbons wrote: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: > > > Benchmark Score Latest > StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x > StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x > StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x > StringIndexOf.constantPattern 9.361 11.906 1.271872663x > StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x > StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x > StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x > StringIndexOf.success 9.186 9.713 1.057369911x > StringIndexOf.successBig 14.341 46.343 3.231504079x > StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x > StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x > StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x > StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x > StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x > StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x > StringIndexOfChar.latin1_mixed_String 7386.123 
14771.622 1.999915517x > StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 Opening up for review. Fixed last whitespace error. I will post final performance numbers soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-1831031294 From dholmes at openjdk.org Wed Nov 29 02:15:08 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 02:15:08 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> Message-ID: On Tue, 28 Nov 2023 18:58:30 GMT, Jorn Vernee wrote: >> See JBS issue. >> >> Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. >> >> Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) > > I'm looking into the crash on mac @JornVernee is there some "new" usage of this method such that it needs guarding? I find it interesting that in the 3+ months that the fix for [JDK-8191278](https://bugs.openjdk.org/browse/JDK-8191278) was developed and reviewed, the need to include `SetMemory0` was never raised once. And that was over 4 years ago. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1831087207 From yyang at openjdk.org Wed Nov 29 02:25:05 2023 From: yyang at openjdk.org (Yi Yang) Date: Wed, 29 Nov 2023 02:25:05 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks In-Reply-To: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Message-ID: On Tue, 28 Nov 2023 20:24:17 GMT, Aleksey Shipilev wrote: > Profiling heap dumping code reveals another simple issue: `mask_dormant_archived_object` on dumping hotpath takes quite a bit of time. 
We can reflow it for better inlineability, throwing out the non-essential parts into cold method. There is also no reason to peek into java mirror with (default) keep-alive, if we only use the result for null-check. > > Example improvements on Mac M1: > > > % for I in `seq 1 5`; do build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done > > # Before > Heap dump file created [1897307608 bytes in 1.584 secs] > Heap dump file created [1897308278 bytes in 1.439 secs] > Heap dump file created [1897308508 bytes in 1.460 secs] > Heap dump file created [1897308505 bytes in 1.423 secs] > Heap dump file created [1897308554 bytes in 1.414 secs] > > # After > Heap dump file created [1897307648 bytes in 1.509 secs] > Heap dump file created [1897308498 bytes in 1.281 secs] > Heap dump file created [1897308554 bytes in 1.282 secs] > Heap dump file created [1897308512 bytes in 1.263 secs] > Heap dump file created [1897308554 bytes in 1.270 secs] > > > ...which is about +12% faster heap dump. > > I also eyeballed the generated code and saw `mask_dormant_archived_object` fully inlined at least on x86_64. Looks reasonable ------------- Marked as reviewed by yyang (Committer). PR Review: https://git.openjdk.org/jdk/pull/16863#pullrequestreview-1754300831 From dlong at openjdk.org Wed Nov 29 03:24:02 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 Nov 2023 03:24:02 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 00:09:10 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. 
This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. > Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. > > The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. > > I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. > > Thanks, > Patricio > > [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f I don't really like the use of `address` for the pointer that might be either `oop` or `narrowOop`, because the bitmap can't really support an arbitrary `char*` address. I think it would be better to cleanup methods that take `intptr_t*` and instead use `template ` like `bit_index_for`. 
Also, I would prefer removing the round up in `java_calling_convention`, because the current patch fixes a subtle bug with an even subtler work-around, but let's see what other reviews think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1831150740 From dholmes at openjdk.org Wed Nov 29 06:26:09 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 06:26:09 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 22 Nov 2023 22:40:20 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Address Serguei Spitsyn's comments/suggestions: > - Remove the redundant thread->is_Java_thread() check from JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample(). > - Change the assert in JvmtiThreadState::state_for_while_locked to avoid #ifdef ASSERT. I think these changes seem reasonable. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16642#pullrequestreview-1754529865 From thartmann at openjdk.org Wed Nov 29 06:34:12 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 29 Nov 2023 06:34:12 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 18:35:25 GMT, Volodymyr Paprotski wrote: >> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, with up to 30% gain >> >> >> =============== BEFORE =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 77.766 ± 0.049 ns/op >> VectorSignum.floatSignum 512 avgt 3 154.889 ± 0.242 ns/op >> VectorSignum.floatSignum 1024 avgt 3 306.130 ± 0.605 ns/op >> VectorSignum.floatSignum 2048 avgt 3 609.965 ± 0.927 ns/op >> VectorSignum.doubleSignum 256 avgt 3 151.874 ± 1.748 ns/op >> VectorSignum.doubleSignum 512 avgt 3 303.080 ± 0.310 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 607.517 ± 0.597 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 1214.282 ± 1.834 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 137.334 ± 0.128 us/op >> MaxMinOptimizeTest.dMin avgt 3 137.160 ± 0.465 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.231 ± 0.051 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.165 ± 0.003 us/op >> MaxMinOptimizeTest.fMax avgt 3 107.428 ± 1.501 us/op >> MaxMinOptimizeTest.fMin avgt 3 107.186 ± 0.022 us/op >> MaxMinOptimizeTest.fMul avgt 3 77.164 ± 0.012 us/op >> >> =============== AFTER =============== >> Benchmark (SIZE) Mode Cnt Score Error Units >> VectorSignum.floatSignum 256 avgt 3 61.816 ± 1.980 ns/op >> VectorSignum.floatSignum 512 avgt 3 117.251 ± 0.052 ns/op >> VectorSignum.floatSignum 1024 avgt 3 231.356 ± 0.397 ns/op >> VectorSignum.floatSignum 2048 avgt 3 458.904 ± 0.774 ns/op >> VectorSignum.doubleSignum 256 avgt 3 121.449 ± 0.184 ns/op >> VectorSignum.doubleSignum 512 avgt 3 241.662 ±
0.189 ns/op >> VectorSignum.doubleSignum 1024 avgt 3 482.365 ± 0.165 ns/op >> VectorSignum.doubleSignum 2048 avgt 3 962.412 ± 1.401 ns/op >> Benchmark Mode Cnt Score Error Units >> MaxMinOptimizeTest.dAdd avgt 3 77.240 ± 0.029 us/op >> MaxMinOptimizeTest.dMax avgt 3 125.701 ± 0.082 us/op >> MaxMinOptimizeTest.dMin avgt 3 124.704 ± 0.119 us/op >> MaxMinOptimizeTest.dMul avgt 3 77.232 ± 0.028 us/op >> MaxMinOptimizeTest.fAdd avgt 3 77.169 ± 0.103 us/op >> MaxMinOptimizeTest.fMax avgt 3 97.939 ± 0.477 us/op >> MaxMinOptimizeTest.fMin avgt 3 98.012 ± 0.154 us/op >> MaxMinO... > > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/x86.ad > > Co-authored-by: Jatin Bhateja Thanks for the notification. I'll run it through our testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1831298181 From dholmes at openjdk.org Wed Nov 29 06:38:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 06:38:12 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v2] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Tue, 28 Nov 2023 22:22:22 GMT, Matias Saavedra Silva wrote: >> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. >> >> This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains four additional commits since the last revision: > > - Added asserts to ensure correctness > - Merge branch 'master' into resolved_ref_flag > - Merge branch 'master' of https://github.com/openjdk/jdk into resolved_ref_flag > - 8320530: has_resolved_ref_index flag not restored after resetting entry A few nits with the `DEBUG_ONLY` code. src/hotspot/share/oops/resolvedMethodEntry.hpp line 80: > 78: u1 _flags; // Flags: [00|has_resolved_ref_index|has_local_signature|has_appendix|forced_virtual|final|virtual_final] > 79: u1 _bytecode1, _bytecode2; // Resolved invoke codes > 80: DEBUG_ONLY( Nit: it is better to use `#ifdef ASSERT` than a multi-line `DEBUG_ONLY()` - and that will also allow the correct indentation for the new variables. src/hotspot/share/oops/resolvedMethodEntry.hpp line 96: > 94: _bytecode2(0) { > 95: _entry_specific._interface_klass = nullptr; > 96: DEBUG_ONLY( Ditto - use `ifdef ASSERT` src/hotspot/share/oops/resolvedMethodEntry.hpp line 194: > 192: > 193: void set_klass(InstanceKlass* klass) { > 194: DEBUG_ONLY( You don't need this to guard an actual assert statement. src/hotspot/share/oops/resolvedMethodEntry.hpp line 204: > 202: > 203: void set_resolved_references_index(u2 ref_index) { > 204: DEBUG_ONLY( You don't need this to guard an actual assert statement. src/hotspot/share/oops/resolvedMethodEntry.hpp line 214: > 212: > 213: void set_table_index(u2 table_index) { > 214: DEBUG_ONLY( You don't need this to guard an actual assert statement. ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1754536538 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1408807310 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1408808145 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1408809189 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1408810238 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1408810513 From stefank at openjdk.org Wed Nov 29 06:38:51 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 29 Nov 2023 06:38:51 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v9] In-Reply-To: References: Message-ID: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert from: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code became more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak.
> > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16783/files - new: https://git.openjdk.org/jdk/pull/16783/files/0e68fb68..ca6a7828 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16783&range=07-08 Stats: 19 lines in 3 files changed: 3 ins; 3 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16783.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16783/head:pull/16783 PR: https://git.openjdk.org/jdk/pull/16783 From stuefe at openjdk.org Wed Nov 29 06:52:13 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 Nov 2023 06:52:13 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v10] In-Reply-To: References: Message-ID: <5L1vHdzmz7cTqCXewE_9h1RSh1Smsb-zunqr8Z9D54Q=.07448064-7da6-4e42-8e1f-f5a06dcd241d@github.com> On Fri, 24 Nov 2023 14:17:36 GMT, Thomas Stuefe wrote: >> In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. 
We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. >> >> Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. >> >> There are common patterns: >> - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. >> >> But there are more differences than one would think: >> - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions >> - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that >> - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) >> >> It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. >> >> ------------- >> >> This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. >> >> Changes per-CPU: >> >> #### aarch64: >> >> Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. >> >> We also add a last-ditch attempt to reserve optimized for movk via over-alignment. 
We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" >> >> Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` >> >> #### riscv: >> >> We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). >> >> #### s390: >> >> We attempt to allocate < 4GB unconditionally. > > Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: > > switch off AIX tests since AIX does not support CDS Bug reporter did not report back. I'm pushing now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16743#issuecomment-1831313687 From dholmes at openjdk.org Wed Nov 29 06:55:07 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 06:55:07 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 08:50:34 GMT, Leo Korinth wrote: > I have been starting to change test cases to use `createTestJavaProcessBuilder` instead of `createLimitedTestJavaProcessBuilder` because we severely limit our testing when we use `createLimitedTestJavaProcessBuilder`. Before that change there were no way to add `@require` lines for `-X` options. Unfortunately I made a bug when I introduced that functionality. Sure but I am trying to understand that previous change. I don't speak "stream" so can't figure out what exactly you have done. What I expected you to do was combine the various flags coming in from jtreg arguments and env vars, that would affect the VM under test, then see if that set of args contains the `-X` one the `@requires` refers to. But the added complexity there if actually checking a particular flag value is that you need to know how to combine multiple occurrences of the same arg, the same way that the launcher and/or VM will. 
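As a rough illustration of the check described above (not the actual implementation, which is written in Java), the flags from each source can be concatenated and then searched for the option that `@requires` names. The helper names `combine_flags` and `has_option` are hypothetical, and this sketch ignores the option's value entirely rather than modelling how the launcher or VM merges repeated occurrences of the same argument.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Flags = std::vector<std::string>;

// Hypothetical helper: concatenate flags from several sources (for example
// jtreg arguments first, then environment variables), preserving order.
Flags combine_flags(const std::vector<Flags>& sources) {
  Flags all;
  for (const Flags& source : sources) {
    all.insert(all.end(), source.begin(), source.end());
  }
  return all;
}

// Hypothetical helper: true if some flag is exactly `option`, or `option`
// followed by ':'. So "-Xlog" matches "-Xlog:gc=debug" but not "-Xloggc:gc.log".
bool has_option(const Flags& flags, const std::string& option) {
  for (const std::string& flag : flags) {
    // Skip flags that do not start with the option text.
    if (flag.compare(0, option.size(), option) != 0) continue;
    if (flag.size() == option.size() || flag[option.size()] == ':') return true;
  }
  return false;
}
```

Because only presence is tested, several `-Xlog` occurrences in the combined list behave the same as one; deciding which value wins would need the merging rules mentioned above.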
------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1831316147 From dholmes at openjdk.org Wed Nov 29 07:03:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 07:03:14 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v9] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 06:38:51 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. 
>> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Updates look good. I think that is all from me. Thanks for your patience on the test issues. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1754568705 From stefank at openjdk.org Wed Nov 29 07:08:12 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 29 Nov 2023 07:08:12 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v9] In-Reply-To: References: Message-ID: <70fVhYrgQ11EFtEUlRqStowU0G2IgnbOl2bzUUzWsV0=.4d2abb7c-c1b3-4f53-9b5c-7b4840c98d22@github.com> On Wed, 29 Nov 2023 07:00:06 GMT, David Holmes wrote: > Updates look good. I think that is all from me. Thanks for your patience on the test issues. Thanks for the in-depth review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1831327248 From epeter at openjdk.org Wed Nov 29 07:19:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Nov 2023 07:19:04 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 17:57:12 GMT, Tom Rodriguez wrote: >> Hmm. So is it that I just have to have a lock on `allocate_bci_to_data`, or do I have to protect `pdata`? 
Because `pdata` is returned from `Deoptimization::query_update_method_data`, and so I would have to make the lock take a wider scope than that function, right? > > The lock needs to be held while accessing the contents of pdata which is very complicated in this particular usage pattern. Maybe the caller needs to refetch the pointer under a lock instead of passing it out of this method. I see. And how about other usages of `ProfileData` from the `mdo`, like these (there is a few variants of them around): ProfileData* data = mdo->first_data(); ProfileData* data = mdo->bci_to_data(bci); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408845046 From dholmes at openjdk.org Wed Nov 29 07:23:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 07:23:14 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: <95bH6xTusjMnTYydXn0_eaeDLEZoV3QPIl62H4YZObI=.c548f385-8ca5-471a-ba21-01c8d4c3e6af@github.com> On Tue, 28 Nov 2023 14:31:44 GMT, Thomas Stuefe wrote: > Note that back in the day we carried a lot of Solaris-specific coding hiding behind abstractions that only ever were implemented on Solaris. We do this still today for Windows (e.g. the os::vm_allocation_granularity() <-> os::vm_page_size() duality only exists because of Windows). That seemed to have been no problem. Why is it a problem when we do the same for AIX? Two wrongs don't make a right. The legacy we had with Solaris and how the original three OS ports were developed was just something we've had to deal with. And when macOS/BSD came along we started to make inroads to cleaning up a lot of the mess (adding os_posix for example). An abstraction that only needs a non-empty implementation on one platform is not an abstraction it is a platform-specific requirement. 
Our goal is to try and minimise platform-specific code in shared code - hence the ifdefs here looked excessive, but converting to a not-really-an-abstraction API isn't better IMO. It's a balancing act for sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1831345382 From epeter at openjdk.org Wed Nov 29 07:26:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Nov 2023 07:26:04 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: <0mKShAVB-sftch99WqIcX1px5avX5-yejLzdRUEIEg8=.868a1053-bbb3-494a-941a-65c577714a38@github.com> On Wed, 29 Nov 2023 07:16:27 GMT, Emanuel Peter wrote: >> The lock needs to be held while accessing the contents of pdata which is very complicated in this particular usage pattern. Maybe the caller needs to refetch the pointer under a lock instead of passing it out of this method. > > I see. > And how about other usages of `ProfileData` from the `mdo`, like these (there is a few variants of them around): > > ProfileData* data = mdo->first_data(); > ProfileData* data = mdo->bci_to_data(bci); And there are 2 uses of `query_update_method_data`. One does not use the return `pdata`. The other uses it and in some cases updates it. Do you think it is safe to just re-fetch it, or would that potentially cut some connection between the two that should not be cut? The alternative is just to already get the lock before calling `query_update_method_data`. 
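One way to picture the "re-fetch under a lock" option is a store that never hands out a raw record pointer: each operation looks the record up and uses it inside the same critical section. This is a minimal sketch with hypothetical names (`ProfileStore`, `Record`, `record_trap`, `repack`), using `std::mutex` rather than HotSpot's own locking primitives; it is not how `MethodData` is implemented.

```cpp
#include <cassert>
#include <map>
#include <mutex>

// Hypothetical stand-in for a profile record in the extra data section.
struct Record {
  int bci;
  int trap_count;
};

// Hypothetical stand-in for the owner of the records; not HotSpot's MethodData.
class ProfileStore {
 public:
  // Look up (or allocate) the record for `bci` and update it while holding
  // the lock, so no pointer to the record escapes the critical section.
  int record_trap(int bci) {
    std::lock_guard<std::mutex> guard(_lock);
    Record& record = find_or_allocate(bci);
    return ++record.trap_count;
  }

  // Simulated repacking: drops records that never saw a trap. It takes the
  // same lock, so it cannot run while another thread is using a record.
  void repack() {
    std::lock_guard<std::mutex> guard(_lock);
    for (auto it = _extra.begin(); it != _extra.end();) {
      if (it->second.trap_count == 0) {
        it = _extra.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Readers also take the lock and copy the value out.
  int trap_count(int bci) {
    std::lock_guard<std::mutex> guard(_lock);
    auto it = _extra.find(bci);
    return it == _extra.end() ? 0 : it->second.trap_count;
  }

 private:
  // Caller must hold _lock.
  Record& find_or_allocate(int bci) {
    auto it = _extra.find(bci);
    if (it == _extra.end()) {
      it = _extra.emplace(bci, Record{bci, 0}).first;
    }
    return it->second;
  }

  std::mutex _lock;
  std::map<int, Record> _extra;
};
```

The cost of this shape is that every access pays for the lock, which is the trade-off against taking the lock once in the caller before `query_update_method_data`.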
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408851190 From never at openjdk.org Wed Nov 29 07:33:04 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 29 Nov 2023 07:33:04 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: <0mKShAVB-sftch99WqIcX1px5avX5-yejLzdRUEIEg8=.868a1053-bbb3-494a-941a-65c577714a38@github.com> References: <0mKShAVB-sftch99WqIcX1px5avX5-yejLzdRUEIEg8=.868a1053-bbb3-494a-941a-65c577714a38@github.com> Message-ID: On Wed, 29 Nov 2023 07:23:26 GMT, Emanuel Peter wrote: >> I see. >> And how about other usages of `ProfileData` from the `mdo`, like these (there is a few variants of them around): >> >> ProfileData* data = mdo->first_data(); >> ProfileData* data = mdo->bci_to_data(bci); > > And there are 2 uses of `query_update_method_data`. One does not use the return `pdata`. The other uses it and in some cases updates it. Do you think it is safe to just re-fetch it, or would that potentially cut some connection between the two that should not be cut? > The alternative is just to already get the lock before calling `query_update_method_data`. I think that anything that can return data from the extra data section is a potential danger. bci_to_data calls bci_to_extra_data at the end so it seems potentially unsafe which seems like a huge problem since that's used all over the place. Whether the callers are actually getting or expecting record from extra data is unclear. I would suspect that most places where it's used there should already be a preallocated record. The concurrent repacking really makes it hard to ensure the accesses are safe. I think the API would need to make a stronger split between preallocated records and records which might come from the extra data section. I'm honestly not sure how to make this truly safe. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408858242 From stuefe at openjdk.org Wed Nov 29 07:38:17 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 Nov 2023 07:38:17 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: <95bH6xTusjMnTYydXn0_eaeDLEZoV3QPIl62H4YZObI=.c548f385-8ca5-471a-ba21-01c8d4c3e6af@github.com> References: <95bH6xTusjMnTYydXn0_eaeDLEZoV3QPIl62H4YZObI=.c548f385-8ca5-471a-ba21-01c8d4c3e6af@github.com> Message-ID: <_MywSq2HJE3tGf9LubgxQzSzYAaKkz6EvKKl_41lm6w=.fd905028-da4d-48a4-9522-00340702ea6d@github.com> On Wed, 29 Nov 2023 07:20:32 GMT, David Holmes wrote: > > Note that back in the day we carried a lot of Solaris-specific coding hiding behind abstractions that only ever were implemented on Solaris. We do this still today for Windows (e.g. the os::vm_allocation_granularity() <-> os::vm_page_size() duality only exists because of Windows). That seemed to have been no problem. Why is it a problem when we do the same for AIX? > > Two wrongs don't make a right. The legacy we had with Solaris and how the original three OS ports were developed was just something we've had to deal with. And when macOS/BSD came along we started to make inroads to cleaning up a lot of the mess (adding os_posix for example). An abstraction that only needs a non-empty implementation on one platform is not an abstraction it is a platform-specific requirement. Our goal is to try and minimise platform-specific code in shared code - hence the ifdefs here looked excessive, but converting to a not-really-an-abstraction API isn't better IMO. It's a balancing act for sure. I understand that. That is why I looked for patterns in other platforms. In this case, doing things ad hoc on demand (e.g. allocating when printing the first line of a stack trace) is, to my eyes, inferior to a clean setup at a definite point (e.g. a hook for platforms to run code right before error handling).
------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1831362410 From epeter at openjdk.org Wed Nov 29 07:41:04 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 29 Nov 2023 07:41:04 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: <0mKShAVB-sftch99WqIcX1px5avX5-yejLzdRUEIEg8=.868a1053-bbb3-494a-941a-65c577714a38@github.com> Message-ID: On Wed, 29 Nov 2023 07:30:48 GMT, Tom Rodriguez wrote: >> And there are 2 uses of `query_update_method_data`. One does not use the return `pdata`. The other uses it and in some cases updates it. Do you think it is safe to just re-fetch it, or would that potentially cut some connection between the two that should not be cut? >> The alternative is just to already get the lock before calling `query_update_method_data`. > > I think that anything that can return data from the extra data section is a potential danger. bci_to_data calls bci_to_extra_data at the end so it seems potentially unsafe which seems like a huge problem since that's used all over the place. Whether the callers are actually getting or expecting record from extra data is unclear. I would suspect that most places where it's used there should already be a preallocated record. The concurrent repacking really makes it hard to ensure the accesses are safe. I think the API would need to make a stronger split between preallocated records and records which might come from the extra data section. I'm honestly not sure how to make this truly safe. Ok. So what you are saying is that this issue is much bigger than `query_update_method_data` and `allocate_bci_to_data`. Sounds like I need to study this much more deeply. Maybe we need to refactor the whole way we access the records? Maybe any access to the records needs to be guarded with a lock, just to be safe? If there are concurrent updates, which are guarded by a lock, then should not all reads also be guarded?
In the end maybe we just need to make all accesses mutually exclusive, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1408866129 From stuefe at openjdk.org Wed Nov 29 07:58:04 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 Nov 2023 07:58:04 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks In-Reply-To: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Message-ID: On Tue, 28 Nov 2023 20:24:17 GMT, Aleksey Shipilev wrote: > Profiling heap dumping code reveals another simple issue: `mask_dormant_archived_object` on dumping hotpath takes quite a bit of time. We can reflow it for better inlineability, throwing out the non-essential parts into cold method. There is also no reason to peek into java mirror with (default) keep-alive, if we only use the result for null-check. > > Example improvements on Mac M1: > > > % for I in `seq 1 5`; do build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done > > # Before > Heap dump file created [1897307608 bytes in 1.584 secs] > Heap dump file created [1897308278 bytes in 1.439 secs] > Heap dump file created [1897308508 bytes in 1.460 secs] > Heap dump file created [1897308505 bytes in 1.423 secs] > Heap dump file created [1897308554 bytes in 1.414 secs] > > # After > Heap dump file created [1897307648 bytes in 1.509 secs] > Heap dump file created [1897308498 bytes in 1.281 secs] > Heap dump file created [1897308554 bytes in 1.282 secs] > Heap dump file created [1897308512 bytes in 1.263 secs] > Heap dump file created [1897308554 bytes in 1.270 secs] > > > ...which is about +12% faster heap dump. 
> > I also eyeballed the generated code and saw `mask_dormant_archived_object` fully inlined at least on x86_64. Okay. I feel "debug" log level is kind of high since we may encounter a lot of these; for such unbound micro-prints I usually use "trace". Up to you. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16863#pullrequestreview-1754639466 From stuefe at openjdk.org Wed Nov 29 08:02:37 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 Nov 2023 08:02:37 GMT Subject: RFR: JDK-8320368: Per-CPU optimization of Klass range reservation [v11] In-Reply-To: References: Message-ID: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared coding is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well documented version. A bit of code duplication is a good price for code clarity. 
> > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - merge - switch off AIX tests since AIX does not support CDS - Fix test for riscv - Adapt test; exclude on windows - Correctly name test flag - Feedback Felix - remove stray newline - fix macos - fix mistake - feedback andrew - ... 
and 7 more: https://git.openjdk.org/jdk/compare/78b6c2b4...5dbd4bdb ------------- Changes: https://git.openjdk.org/jdk/pull/16743/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16743&range=10 Stats: 651 lines in 15 files changed: 545 ins; 63 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/16743.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16743/head:pull/16743 PR: https://git.openjdk.org/jdk/pull/16743 From mli at openjdk.org Wed Nov 29 08:28:06 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Nov 2023 08:28:06 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: <39l6CMVwgGXR1YNi_1yyMX1qGGW2ASgzF7sjV4HWhDU=.7e309db7-af01-4cd0-9176-714f61e25391@github.com> On Tue, 28 Nov 2023 13:46:40 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 8288: >> >>> 8286: __ float16_to_float($dst$$FloatRegister, $src$$Register, $tmp$$Register); >>> 8287: %} >>> 8288: ins_pipe(fp_f2i); >> >> Seems we should use `ins_pipe(pipe_slow)` here as this emits multiple instructions. > > In fact, I'm not quite sure. > I see in the ad file: > > pipe_class pipe_slow() > %{ > instruction_count(10); > > > and, all instruct's with `pipe_slow` are related to cmpxchg, which indeed involve lots of instructions in common case. > But for `float16_to_float`, in normal case, there is at most 5 instructions; only the rare case `NaN` involves more instructions. > > Please let me know how do you think about it. Some more information: 1. `fcvt_w_s_safe` is quite similar to float16_to_float in cost, it's labeled in `fp_f2i`. 2. `float_compare` is too, but labeled with `pipe_class_default` 3. while instruct's with multiple instructions in riscv_v.ad are labeled with `pipe_slow`. I'm not sure what should be chosen here, seems `fcvt_w_s_safe` is more similar to our situation, how do you think about it? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1408915704 From shade at openjdk.org Wed Nov 29 08:30:04 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 08:30:04 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks In-Reply-To: References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Message-ID: On Wed, 29 Nov 2023 07:55:18 GMT, Thomas Stuefe wrote: > I feel "debug" log level is kind of high since we may encounter a lot of these; for such unbound micro-prints I usually use "trace". Up to you. I agree "trace" would fit better here. Let's see if @iklam has an opinion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16863#issuecomment-1831431816 From sjohanss at openjdk.org Wed Nov 29 08:36:25 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 29 Nov 2023 08:36:25 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Tue, 28 Nov 2023 02:22:45 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: > > - Fix namespace issues (2) > > Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> > - Fix namespace issues > > Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> From a testing point of view I think this is looking good now: I re-ran the failing test and some other jstat tests as well, and they all pass. Please address the comments from Albert and we can hopefully finish this before RDP1.
src/hotspot/share/runtime/cpuTimeCounters.cpp line 91: > 89: } while (old_value != fetched_value); > 90: get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value); > 91: } Why do we have to do this publish dance? Couldn't the closure that updates the diff instead just update the counter? From what I can tell we never have multiple closures active at the same time, so there should be no race there? ------------- PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1754686415 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1408914724 From sjohanss at openjdk.org Wed Nov 29 08:36:28 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 29 Nov 2023 08:36:28 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 13:46:54 GMT, Albert Mingkun Yang wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup and address comments > > src/hotspot/share/runtime/cpuTimeCounters.cpp line 119: > >> 117: if (CPUTimeGroups::is_gc_counter(_name)) { >> 118: instance->inc_gc_total_cpu_time(net_cpu_time); >> 119: } > > I feel much of this is on the wrong abstraction level; `CPUTimeCounters::update_counter(_name, _total);` should be sufficient here. (The caller handles diff calculation and inc gc-counter if needed.) We could add a new closure just used by GC that's a sub-class of `ThreadTotalCPUTimeClosure` and just adds this to the constructor: instance->inc_gc_total_cpu_time(net_cpu_time); That way we could get rid of `CPUTimeGroups::is_gc_counter()` as well since all those counters should use the "GC closure" or we can keep it and assert that no GC closure uses the wrong closure. What do you think about that Albert, would that address your concerns?
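The subclass idea can be sketched roughly as follows. All names here (`Counters`, `CpuTimeClosure`, `GcCpuTimeClosure`, `publish`) are hypothetical stand-ins rather than the actual HotSpot classes, and the sketch uses an explicit `publish()` call with a virtual hook instead of doing the work in a constructor or destructor, purely to keep the example simple.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the counters: one per-group total and one
// aggregate total for all GC-related groups.
class Counters {
 public:
  void add_to_group(int64_t delta) { _group_total += delta; }
  void add_to_gc_total(int64_t delta) { _gc_total += delta; }
  int64_t group_total() const { return _group_total; }
  int64_t gc_total() const { return _gc_total; }

 private:
  int64_t _group_total = 0;
  int64_t _gc_total = 0;
};

// Base closure: sums CPU time over the visited threads and publishes the
// difference from the previously sampled total.
class CpuTimeClosure {
 public:
  CpuTimeClosure(Counters& counters, int64_t previous_total)
      : _counters(counters), _previous_total(previous_total) {}
  virtual ~CpuTimeClosure() = default;

  void do_thread(int64_t thread_cpu_time) { _total += thread_cpu_time; }

  // Publish the net CPU time since the previous sample and return it.
  int64_t publish() {
    int64_t net = _total - _previous_total;
    _counters.add_to_group(net);
    on_publish(net);  // hook for subclasses; a no-op in the base class
    return net;
  }

 protected:
  virtual void on_publish(int64_t /*net*/) {}
  Counters& _counters;

 private:
  int64_t _previous_total;
  int64_t _total = 0;
};

// GC-only closure: additionally feeds the aggregate GC counter, so the
// base class needs no "is this a GC counter?" check.
class GcCpuTimeClosure : public CpuTimeClosure {
 public:
  using CpuTimeClosure::CpuTimeClosure;

 protected:
  void on_publish(int64_t net) override { _counters.add_to_gc_total(net); }
};
```

With this split, choosing the closure type at the call site decides whether the aggregate is updated, which is the property the `is_gc_counter()` check provides today.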
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1408923331 From jvernee at openjdk.org Wed Nov 29 08:58:06 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 29 Nov 2023 08:58:06 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded In-Reply-To: References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> Message-ID: On Tue, 28 Nov 2023 18:58:30 GMT, Jorn Vernee wrote: >> See JBS issue. >> >> Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. >> >> Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) > > I'm looking into the crash on mac > @JornVernee is there some "new" usage of this method such that it needs guarding? This function is now exposed (indirectly) through the new `MemorySegment::fill` API. There are some rare other uses of `Unsafe.setMemory` in the JDK, and some seem to be operating on user-supplied buffers, so this issue might have existed for longer already. For instance in `GaloisCounterMode`, but it's hard to tell for me where the ByteBuffer that this is operating on comes from. I went through the original review thread starting here: https://mail.openjdk.org/pipermail/hotspot-dev/2019-February/037058.html but I don't see a reference to `setMemory`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16848#issuecomment-1831476713 From stefank at openjdk.org Wed Nov 29 09:04:11 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 29 Nov 2023 09:04:11 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v7] In-Reply-To: <2u7QlAUR0MpiqurUcsrLKdiPsgDPv0jq8HrpAWfu4Mk=.31a59a07-b518-474b-a3d6-410292ede3c4@github.com> References: <54eLI7PoGn3jHcWzniPASmXB0ZUsmxqwe3JRhkyU4bM=.f6ad0469-727c-4f4b-9dd7-334dd7233a9a@github.com> <2u7QlAUR0MpiqurUcsrLKdiPsgDPv0jq8HrpAWfu4Mk=.31a59a07-b518-474b-a3d6-410292ede3c4@github.com> Message-ID: On Mon, 27 Nov 2023 19:33:09 GMT, Serguei Spitsyn wrote: > It will be completely safe to run mach5 tiers 1-4, tier5-svc and 6. In this particular case, the tier6 can be not necessary. It has some important -Xcomp/interp_only_mode related testing. I've run the suggested testing and it passes (except a few unrelated issues) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1831485088 From fyang at openjdk.org Wed Nov 29 09:24:07 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 29 Nov 2023 09:24:07 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: <39l6CMVwgGXR1YNi_1yyMX1qGGW2ASgzF7sjV4HWhDU=.7e309db7-af01-4cd0-9176-714f61e25391@github.com> References: <39l6CMVwgGXR1YNi_1yyMX1qGGW2ASgzF7sjV4HWhDU=.7e309db7-af01-4cd0-9176-714f61e25391@github.com> Message-ID: On Wed, 29 Nov 2023 08:25:12 GMT, Hamlin Li wrote: >> In fact, I'm not quite sure. >> I see in the ad file: >> >> pipe_class pipe_slow() >> %{ >> instruction_count(10); >> >> >> and, all instruct's with `pipe_slow` are related to cmpxchg, which indeed involve lots of instructions in common case. >> But for `float16_to_float`, in normal case, there is at most 5 instructions; only the rare case `NaN` involves more instructions. >> >> Please let me know how do you think about it. > > Some more information: > 1. 
`fcvt_w_s_safe` is quite similar to float16_to_float in cost, it's labeled in `fp_f2i`. > 2. `float_compare` is too, but labeled with `pipe_class_default` > 3. while instruct's with multiple instructions in riscv_v.ad are labeled with `pipe_slow`. > > I'm not sure what should be chosen here, seems `fcvt_w_s_safe` is more similar to our situation, how do you think about it? I think we should use `pipe_slow` here, and also for `fcvt_w_s_safe` and `float_compare`, since these are multiple-instruction cases. `fp_f2i` and `pipe_class_default` are supposed to be used for the single-instruction cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1408987810 From lkorinth at openjdk.org Wed Nov 29 09:24:08 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 29 Nov 2023 09:24:08 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 23:54:44 GMT, Dean Long wrote: > What about -Xlog:gc=debug,safepoint=trace and other -XX flags that take a comma-separated list? `-XX` flags are handled by JTREG itself (not VMProps); I `filter` them out and ignore them. And the value of `-Xlog` is ignored. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1831515660 From fyang at openjdk.org Wed Nov 29 09:28:06 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 29 Nov 2023 09:28:06 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 15:31:20 GMT, Hamlin Li wrote: >> src/hotspot/os_cpu/linux_riscv/riscv_hwprobe.cpp line 52: >> >>> 50: #define RISCV_HWPROBE_EXT_ZBB (1 << 4) >>> 51: #define RISCV_HWPROBE_EXT_ZBS (1 << 5) >>> 52: #define RISCV_HWPROBE_EXT_ZFH (1 << 27) >> >> Will this change in future? Seems it's still not there in the kernel source yet [1].
>> >> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/include/uapi/asm/hwprobe.h?h=v6.7-rc3 > > The latest message I got is that it will be pushed into the kernel soon; we can wait for it to land in the kernel if you'd like. Maybe wait a while for it to land? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1408992931 From fyang at openjdk.org Wed Nov 29 09:34:08 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 29 Nov 2023 09:34:08 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 15:13:25 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1704: >> >>> 1702: // check whether it's a NaN. >>> 1703: mv(t0, 0x7c00); >>> 1704: andr(tmp, src, t0); >> >> I see from the exponent encoding of float16 at [1] that it could be a negative/positive infinity as well when the exponent is 0b11111. It depends on whether the significand is zero or not. So is this check for NaN sufficient? >> >> [1] https://en.wikipedia.org/wiki/Half-precision_floating-point_format > > Good catch! > > Your observation is right and wrong. :) > We could have a patch like the one below, but it will sacrifice the performance of the normal case (non-NaN, non-Inf), as it adds one extra instruction on the critical path. > Maybe a solution is to add some comments here, to state that NaN and Inf are processed in the slow path, and that the slow path is necessary for NaN but not for Inf. > What do you think? 
> > > $ git diff src/ > diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp > index 1b6140242b8..7413767395f 100644 > --- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp > +++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp > @@ -1700,10 +1700,11 @@ void C2_MacroAssembler::float16_to_float(FloatRegister dst, Register src, Regist > auto stub = C2CodeStub::make(dst, src, tmp, 20, float16_to_float_nan_path); > > // check whether it's a NaN. > - mv(t0, 0x7c00); > + mv(t0, 0x7fff); > andr(tmp, src, t0); > + mv(t0, 0x7c00); > // jump to stub processing NaN case > - beq(t0, tmp, stub->entry()); > + bgt(tmp, t0, stub->entry()); > > // non-NaN cases, just use built-in instructions. > fmv_h_x(dst, src); Ah, I haven't checked the slow-path yet. Could it handle negative/positive infinity cases too? If that is true, then we might give it a new name instead of `float16_to_float_nan_path` which might indicates a path only for NAN-handling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1409000423 From lkorinth at openjdk.org Wed Nov 29 09:44:09 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 29 Nov 2023 09:44:09 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 06:52:33 GMT, David Holmes wrote: > > I have been starting to change test cases to use `createTestJavaProcessBuilder` instead of `createLimitedTestJavaProcessBuilder` because we severely limit our testing when we use `createLimitedTestJavaProcessBuilder`. Before that change there were no way to add `@require` lines for `-X` options. Unfortunately I made a bug when I introduced that functionality. > > Sure but I am trying to understand that previous change. I don't speak "stream" so can't figure out what exactly you have done. 
What I expected you to do was combine the various flags coming in from jtreg arguments and env vars, that would affect the VM under test, then see if that set of args contains the `-X` one the `@requires` refers to. But the added complexity there if actually checking a particular flag value is that you need to know how to combine multiple occurrences of the same arg, the same way that the launcher and/or VM will. JTREG does the exclusion of test cases when we tag them with `@require` lines. I have no way to know what the `@require` line is. But I have the power to create an (additional) key->value mapping of properties that JTREG will use. As JTREG does not care to do this for `-X` flags and only for `-XX` flags I have to do it myself. So I mapped all `-X` flags to `vm.opt.x.`. The problem is that when you collect using `Collectors.toMap` it will throw an exception if multiple keys are the same --- and that happens with multiple `-Xlog`. This pull request fixes this by adding a third argument to `Collectors.toMap`, a merge strategy. I choose to use the second value so that `-Xms2g -Xms4g` will create `vm.opt.x.Xms4g` for example. However, how should we merge the values of `-Xlog` where the second value is not only what matters? One way is to concatenate the strings, but that is also not the truth. I chose to instead add a dummy value so that you can do a check for the existence of `-Xlog`, but not for its contents. 
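A minimal sketch of the merge strategy described above, not the actual VMProps code: the class name `XFlagProps`, the method `toProps`, and the regular expression used to split a flag into name and value are all assumptions made for illustration. It shows the three-argument `Collectors.toMap`, with "later value wins" as the merge function and the sentinel substituted for any `-Xlog` value.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class XFlagProps {
    // Sentinel value quoted from the pull request description.
    static final String SENTINEL = "NONEMPTY_TEST_SENTINEL";

    // Assumed flag shape for this sketch: "-X", a lowercase name, then an optional value,
    // e.g. "-Xms4g" gives name "Xms" and value "4g". "-XX:..." flags do not match.
    private static final Pattern X_FLAG = Pattern.compile("-(X[a-z]+)(.*)");

    static Map<String, String> toProps(List<String> vmArgs) {
        return vmArgs.stream()
            .map(X_FLAG::matcher)
            .filter(Matcher::matches)
            .collect(Collectors.toMap(
                m -> "vm.opt.x." + m.group(1),
                // Only the presence of -Xlog is recorded, never its contents.
                m -> m.group(1).equals("Xlog") ? SENTINEL : m.group(2),
                // Merge function: without it, a repeated key makes toMap throw
                // IllegalStateException. Here the later occurrence wins.
                (earlier, later) -> later));
    }
}
```

With the two-argument `Collectors.toMap`, passing `-Xlog` twice fails on the duplicate key; with the merge function, `-Xms3g -Xms4g` yields the single entry `vm.opt.x.Xms` with value `4g`.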
------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1831547587 From tschatzl at openjdk.org Wed Nov 29 10:06:27 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 29 Nov 2023 10:06:27 GMT Subject: RFR: 8318706: Implement JEP 423: Region Pinning for G1 [v10] In-Reply-To: <1-i3-5OmZbuCNUlpfv31Kr3eiBXEd4Si8F5gsbPHuBQ=.1d97dcac-4662-4482-842c-ce86315ba61a@github.com> References: <1-i3-5OmZbuCNUlpfv31Kr3eiBXEd4Si8F5gsbPHuBQ=.1d97dcac-4662-4482-842c-ce86315ba61a@github.com> Message-ID: On Fri, 3 Nov 2023 14:14:50 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> typos > > Marked as reviewed by ayang (Reviewer). Thanks @albertnetymk @kstefanj @walulyai for your reviews! Given that the JEP is now targeted, I will integrate. This has been a fairly long journey until today... :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16342#issuecomment-1831581034 From tschatzl at openjdk.org Wed Nov 29 10:06:30 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 29 Nov 2023 10:06:30 GMT Subject: Integrated: 8318706: Implement JEP 423: Region Pinning for G1 In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 09:56:57 GMT, Thomas Schatzl wrote: > The JEP covers the idea very well, so I'm only covering some implementation details here: > > * regions get a "pin count" (reference count). As long as it is non-zero, we conservatively never reclaim that region even if there is no reference in there. JNI code might have references to it. > > * the JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways: > > * when evacuating from a pinned region, we evacuate everything live but the typeArrays to get more empty regions to clean up later. > > * when formatting dead space within pinned regions we use filler objects. 
Pinned regions may be referenced by JNI code only, so we can't overwrite contents of any dead typeArray either. These dead but referenced typeArrays luckily have the same header size of our filler objects, so we can use their headers for our fillers. The problem is that previously there has been that restriction that filler objects are half a region size at most, so we can end up with the need for placing a filler object header inside a typeArray. The code could be clever and handle this situation by splitting the to be filled area so that this can't happen, but the solution taken here is allowing filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. gc code never touches them anyway). > > * G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and automatically skipped. However assuming that the pinning is of short length, we put them into the candidates when we can. > > * there is the problem that if an applications pins a region for a long time g1 will skip evacuating that region over and over. that may lead to issues with the current policy in marking regions (only exit mixed phase when there are no marking candidates) and just waste of processing time (when the candidate stays in the retained candidates) > > The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens. > > I.e. pinned marking candidates are immediately moved to retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random. > > * G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like: > > `GC(6) P... This pull request has now been integrated. 
Changeset: 38cfb220 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/38cfb220ddadbb401cc15f313aadb8234f626210 Stats: 1823 lines in 59 files changed: 1150 ins; 435 del; 238 mod 8318706: Implement JEP 423: Region Pinning for G1 Reviewed-by: ayang, iwalulya, sjohanss ------------- PR: https://git.openjdk.org/jdk/pull/16342 From mdoerr at openjdk.org Wed Nov 29 10:53:21 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 29 Nov 2023 10:53:21 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 14:46:28 GMT, Matthias Baesken wrote: >> VMError::report outputs information about loaded shared libraries; but this info could be outdated on AIX because the libraries cache is currently not refreshed before printing. >> This is similar to [JDK-8318587](https://bugs.openjdk.org/browse/JDK-8318587) . >> The refresh in VMError::report could be omitted in some situations where the refresh call is considered problematic. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > use new method also in print_vm_info I think the main problem is that it's impossible to make a good design without knowing all requirements. We always find a new issue and need to make a small AIX specific enhancement for this, then for that, ... After having a more complete picture, we should be able to improve it and do some refactoring. https://bugs.openjdk.org/browse/JDK-8320890 is an example. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1831658812 From mli at openjdk.org Wed Nov 29 11:15:23 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Nov 2023 11:15:23 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v3] In-Reply-To: References: Message-ID: > Hi, > Can you review the patch to add ConvHF2F intrinsic to JDK for riscv? > Thanks! 
> > (By latest kernel patch, `#define RISCV_HWPROBE_EXT_ZFH (1 << 27)` > https://lore.kernel.org/lkml/20231114141256.126749-11-cleger at rivosinc.com/) > > ## Test > ### Functionality > #### hotspot tests > test/hotspot/jtreg/compiler/intrinsics/ > test/hotspot/jtreg/compiler/c2/irTests > > #### jdk tests > test/jdk/java/lang/Float/Binary16Conversion*.java > > ### Performance > tested on licheepi. > > #### with UseZfh enabled & stub out-of-band
>
> Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
> Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10   3493.376 ± 18.631  ns/op
> Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     19.819 ±  0.193  ns/op
>
> #### with UseZfh enabled only > (i.e. enable the intrinsic)
>
> Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
> Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10   4659.796 ± 13.262  ns/op
> Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     22.957 ±  0.098  ns/op
>
> #### with UseZfh disabled > (i.e. disable the intrinsic)
>
> Benchmark                                     (size)  Mode  Cnt      Score    Error  Units
> Fp16ConversionBenchmark.float16ToFloat          2048  avgt   10  22930.591 ± 72.595  ns/op
> Fp16ConversionBenchmark.float16ToFloatMemory    2048  avgt   10     25.970 ±  0.063  ns/op

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: Fix pipeline cost in ad; Add comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16802/files - new: https://git.openjdk.org/jdk/pull/16802/files/db50b68a..2872b076 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16802&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16802&range=01-02 Stats: 16 lines in 2 files changed: 10 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16802.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16802/head:pull/16802 PR: https://git.openjdk.org/jdk/pull/16802 From mli at openjdk.org Wed Nov 29 11:15:25 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Nov 2023 11:15:25 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: Message-ID: <4a-y9Ivwf_50BwGs_-7nhrHbi4UptM43ZZw212A991g=.f21a8bd3-c963-4a2e-9a98-92bcfb16173f@github.com> On Wed, 29 Nov 2023 09:30:31 GMT, Fei Yang wrote: >> Good catch! >> >> Your observation is right and wrong. :) >> We could have a patch like the one below, but it will sacrifice the performance of the normal case (non-NaN, non-Inf), as it adds one extra instruction on the critical path. >> Maybe a solution is to add some comments here, to state that NaN and Inf are processed in the slow path, and that the slow path is necessary for NaN but not for Inf. >> What do you think? >> >> >> $ git diff src/ >> diff --git a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp >> index 1b6140242b8..7413767395f 100644 >> --- a/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp >> @@ -1700,10 +1700,11 @@ void C2_MacroAssembler::float16_to_float(FloatRegister dst, Register src, Regist >> auto stub = C2CodeStub::make(dst, src, tmp, 20, float16_to_float_nan_path); >> >> // check whether it's a NaN. 
>> - mv(t0, 0x7c00); >> + mv(t0, 0x7fff); >> andr(tmp, src, t0); >> + mv(t0, 0x7c00); >> // jump to stub processing NaN case >> - beq(t0, tmp, stub->entry()); >> + bgt(tmp, t0, stub->entry()); >> >> // non-NaN cases, just use built-in instructions. >> fmv_h_x(dst, src); > > Ah, I haven't checked the slow-path yet. Could it handle negative/positive infinity cases too? If that is true, then we might give it a new name instead of `float16_to_float_nan_path` which might indicates a path only for NAN-handling. Yes, it can handle inf cases. Done, also add some comments to state the background and reason. >> The latest message I got is that it will be pushed into kernel soon, we can wait for it landing in kernel if you'd like to. > > Maybe wait a while for it to land? Sure, let's wait for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1409129753 PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1409129513 From mli at openjdk.org Wed Nov 29 11:15:26 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Nov 2023 11:15:26 GMT Subject: RFR: 8318227: RISC-V: C2 ConvHF2F [v2] In-Reply-To: References: <39l6CMVwgGXR1YNi_1yyMX1qGGW2ASgzF7sjV4HWhDU=.7e309db7-af01-4cd0-9176-714f61e25391@github.com> Message-ID: On Wed, 29 Nov 2023 09:21:37 GMT, Fei Yang wrote: >> Some more information: >> 1. `fcvt_w_s_safe` is quite similar to float16_to_float in cost, it's labeled in `fp_f2i`. >> 2. `float_compare` is too, but labeled with `pipe_class_default` >> 3. while instruct's with multiple instructions in riscv_v.ad are labeled with `pipe_slow`. >> >> I'm not sure what should be chosen here, seems `fcvt_w_s_safe` is more similar to our situation, how do you think about it? > > I think we should `pipe_slow` here and for `fcvt_w_s_safe` and `float_compare` the multiple instruction cases. `fp_f2i` and `pipe_class_default` are supposed to be used for the single instruction cases. 
OK, seems we have some old code to clean for `fp_f2i` and `pipe_class_default`? If it's needed, I can do that in a separate pr later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16802#discussion_r1409129254 From jbachorik at openjdk.org Wed Nov 29 11:49:31 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Wed, 29 Nov 2023 11:49:31 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11] In-Reply-To: References: Message-ID: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture stacktrace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup at what time all contained `Method` instances. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. 
> > This means that we need to make sure that if a `Method` instance is being deallocate the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we can not just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > ~Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.~ > > Therefore, we need to perform `jmethodID` lookup for each method in an old class version that is getting purged, and null out the pointer of that `jmethodID` to break the link from `jmethodID` to the method instance that is about to get deallocated. > > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: Restrict cleanup to obsolete methods only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16662/files - new: https://git.openjdk.org/jdk/pull/16662/files/81e31dae..aae367fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16662&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16662.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16662/head:pull/16662 PR: https://git.openjdk.org/jdk/pull/16662 From jbachorik at openjdk.org Wed Nov 29 11:49:32 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Wed, 29 Nov 2023 11:49:32 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: Message-ID: On Tue, 
28 Nov 2023 21:30:16 GMT, Coleen Phillimore wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary assert > > src/hotspot/share/oops/instanceKlass.cpp line 4236: > >> 4234: if (method != nullptr) { >> 4235: method->clear_jmethod_id(); >> 4236: } > > This loops through the methods in the InstanceKlass that was a previous version klass, and clears the jmethodIDs for all the methods. Will it clear the jmethodIDs for the EMCP methods also, and should it? > The jmethodID for EMCP methods are replaced with a the new version, so the Method* in this list won't find a matching jmethodID. Maybe this can be restricted to obsolete methods? Restricting to obsolete methods sounds like a good idea. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1409164812 From stuefe at openjdk.org Wed Nov 29 11:52:16 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 Nov 2023 11:52:16 GMT Subject: RFR: JDK-8320383: refresh libraries cache on AIX in VMError::report [v3] In-Reply-To: References: Message-ID: <09-hGDbkCKfxKwdX9AOyBU7V4l8hbi69gokM1tLCjUs=.44de759e-d961-4197-9973-bdebd0c48e03@github.com> On Wed, 29 Nov 2023 10:50:51 GMT, Martin Doerr wrote: > I think the main problem is that it's impossible to make a good design without knowing all requirements. We always find a new issue and need to make a small AIX specific enhancement for this, then for that, ... After having a more complete picture, we should be able to improve it and do some refactoring. https://bugs.openjdk.org/browse/JDK-8320890 is an example. Thank you, Martin. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16730#issuecomment-1831745555 From gcao at openjdk.org Wed Nov 29 12:04:14 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 29 Nov 2023 12:04:14 GMT Subject: RFR: 8320397: RISC-V: Avoid passing t0 as temp register to MacroAssembler:: cmpxchg_obj_header/cmpxchgptr Message-ID: MacroAssembler::cmpxchg/cmpxchgptr/cmpxchg_obj_header are non-trivial on the linux-riscv64 platform. Passing t0 (aka x5) as a temporary register to these functions can be error-prone: as a reserved scratch register, t0 is implicitly clobbered by various assembler functions. @robehn can you help review this PR? This issue tracks avoiding passing t0 as a temporary register in the following cases: 1. avoid passing t0 as temp register to MacroAssembler::cmpxchg/cmpxchgptr/cmpxchg_obj_header. 2. avoid passing t0 as temp register to the x_load_barrier and x_load_barrier_slow_path functions in x_riscv.ad 3. avoid passing t0 as temp register to the z_store_barrier and z_color functions in z_riscv.ad Note that I didn't touch MacroAssembler::cmpxchg because it seems to me that this function is designed so that it allows t0 to be used as the result register. As the result register will be set on exit, there should be no risk when using t0 for receiving the result. 
### Testing: - [x] Run tier1-3 tests with qemu 8.1.50 (release) - [x] Run tier1 tests with SiFive unmatched (release) ------------- Commit messages: - RISC-V: Avoid passing t0 as temp register to MacroAssembler::cmpxchg/cmpxchgptr Changes: https://git.openjdk.org/jdk/pull/16880/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16880&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320397 Stats: 41 lines in 5 files changed: 0 ins; 4 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/16880.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16880/head:pull/16880 PR: https://git.openjdk.org/jdk/pull/16880 From duke at openjdk.org Wed Nov 29 12:08:11 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 29 Nov 2023 12:08:11 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Fri, 24 Nov 2023 18:54:37 GMT, Hamlin Li wrote: >> Hey, I saw you already change the code in this patch to not use the specific registers, do you still face the JVM starting issue? > > I think the reason might be: with specific register, you can add effect as `USE_KILL ary, USE_KILL cnt`, but without specific register, currently you have to way to do so. > But, in current patch, it does modify the ary and cnt in the intrinsic, so I wonder if the current (lastest) patch is safe enough in all situation. > > It maybe be helpful to add 2 new register when matching the instrinsic in ad file, and I guess the register allocator will merge different use of temp register together? > But I still think it's not necessary to specify the register when matching arrays_hashcode in ad file. Hi @Hamlin-Li, My apologies for some delay with an answer. Without specifying the "concrete" registers for _ary/cnt/result_, as e.g. 
as follows: (1) [ iRegP/iRegI ] instruct arrays_hashcode(iRegP ary, iRegI cnt, iRegI result, immI basic_type, iRegLNoSp tmp1, iRegINoSp tmp2, iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr) or (2) [ iRegPNoSp/iRegINoSp ] instruct arrays_hashcode(iRegPNoSp ary, iRegINoSp cnt, iRegINoSp result, immI basic_type, iRegLNoSp tmp1, iRegINoSp tmp2, iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr) it's impossible to use '_USE_KILL_' hint in '_effect_' directive because AD compilation fails: Building target 'images' in configuration 'linux-riscv64-server-release' /jdk/src/hotspot/cpu/riscv/gc/z/z_riscv.ad(10306) Syntax Error: :In arrays_hashcode only bound registers can be killed: iRegP ary or Building target 'images' in configuration 'linux-riscv64-server-release' /jdk/src/hotspot/cpu/riscv/gc/z/z_riscv.ad(10306) Syntax Error: :In arrays_hashcode only bound registers can be killed: iRegPNoSp ary IMHO, the usage of '_USE_KILL_' for _ary_/_cnt_ looks reasonable since we actually do use/modify them both in the assembler body, and avoiding any hint or usage other hint ('_USE_'?) may be wrong here even if that does not cause a failure in pre-integration tests. It is interesting that many intrinsics' definitions in AD files similarly use "concrete" registers in X86/RISC-V archs, possibly due to the mentioned "_only bound registers can be killed_" ADLC requirement. 
I'd like to mention that applying '_USE_KILL_' to 'result' causes the error in runtime: --------------- T H R E A D --------------- Current thread (0x0000003fc0189130): JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=54524, stack(0x0000003f7bc00000,0x0000003f7be00000) (2048K)] Current CompileTask: C2:828 117 4 java.lang.String::hashCode (60 bytes) Stack: [0x0000003f7bc00000,0x0000003f7be00000], sp=0x0000003f7bdfbba0, free space=2030k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x5e1ad2] Node_Backward_Iterator::next()+0x106 V [libjvm.so+0x5e4040] PhaseCFG::global_code_motion()+0x242 V [libjvm.so+0x5e5266] PhaseCFG::do_global_code_motion()+0x38 V [libjvm.so+0x455142] Compile::Code_Gen()+0x198 V [libjvm.so+0x458168] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xce4 V [libjvm.so+0x3a239a] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x278 V [libjvm.so+0x45d11e] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x804 V [libjvm.so+0x45f9e4] CompileBroker::compiler_thread_loop()+0x4f0 V [libjvm.so+0x6768b4] JavaThread::thread_main_inner() [clone .part.0]+0x98 V [libjvm.so+0xb1d3ea] Thread::call_run()+0x8c V [libjvm.so+0x97253c] thread_native_entry(Thread*)+0xd0 C [libc.so.6+0x6a51c] C [libc.so.6+0xb7e3e] siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000028 and similarly I do not see in any intrinsics around the usage of '_effect_' hint for '_result_', again both for X86/RISC-V archs. 
I suggest the definition which looks "in sync" with X86 version and similar RISC-V intrinsics: instruct arrays_hashcode(iRegP_R11 ary, iRegI_R12 cnt, iRegI_R10 result, immI basic_type, iRegLNoSp tmp1, iRegINoSp tmp2, iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr) %{ match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type))); effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, USE_KILL ary, USE_KILL cnt, USE basic_type, KILL cr); As an alternative we can use: instruct arrays_hashcode(iRegP ary, iRegI cnt, iRegI_R10 result, immI basic_type, iRegLNoSp tmp1, iRegINoSp tmp2, iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr) %{ match(Set result (VectorizedHashCode (Binary ary cnt) (Binary result basic_type))); effect(TEMP tmp1, TEMP tmp2, TEMP tmp3, TEMP tmp4, USE basic_type, KILL cr); or as third possibilitiy even introduce additional temp registers and copy ary/cnt into them at the beginning. But I personally do not like these 2 variants. Ideally we need to understand why other intrinsics (a) use "concrete" registers and (b) do not use any hint for 'result' regs, but that look to me as a separate not-so-small activity. :-( What do you think? Could C2 regalloc experts give us more details for usage of concrete regs in intrinsics' definitions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1409186277 From dholmes at openjdk.org Wed Nov 29 12:32:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 29 Nov 2023 12:32:06 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with muliple -Xlog crashes JTREG test cases. This is because `Collector.toMap` is not given a merge strategy. > > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. 
> > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake. > > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > > Running tier1 This seems a rather fragile mechanism. In practice I expect there are only a handful of -X flags tests really care about - and some of them already handled (e.g. -Xint, -Xmixed, -Xcomp are exposed by the vm.mode value). Merging requires you to know how the launcher and VM would process things and it is different for different things. So what you have now acts as "last one wins" with a special case for -Xlog such that only its presence can be detected not its value (not a loss I think). It seems like this fixes the bug your original code had, but I can't review it as I don't understand the actual code involved. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1831804103 From eosterlund at openjdk.org Wed Nov 29 12:40:09 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 29 Nov 2023 12:40:09 GMT Subject: RFR: 8310644: Make panama memory segment close use async handshakes [v3] In-Reply-To: References: Message-ID: <0KmqmP-3IThcICrsSDxm1BDVPWKarvfoqgAUNnxdA2E=.0891e938-e432-440b-871e-250dd8de02d9@github.com> On Thu, 23 Nov 2023 16:32:28 GMT, Jorn Vernee wrote: >> Erik Österlund has updated the pull request incrementally with one additional commit since the last revision: >> >> Comments from Jorn > > LGTM Thanks for the reviews @JornVernee @pchilano and @mcimadamore! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16792#issuecomment-1831817744 From eosterlund at openjdk.org Wed Nov 29 12:43:20 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 29 Nov 2023 12:43:20 GMT Subject: Integrated: 8310644: Make panama memory segment close use async handshakes In-Reply-To: References: Message-ID: On Thu, 23 Nov 2023 11:14:29 GMT, Erik Österlund wrote: > The current logic for closing memory in Panama is susceptible to livelock if we have a closing thread that wants to close the memory in a loop that keeps failing, and a bunch of accessing threads that want to perform accesses as long as the memory is alive. They can both create impediments for the other. > > By using asynchronous handshakes to install an exception onto threads that are in @Scoped memory accesses, we can have close always succeed, and the accessing threads bail out. The idea is that we perform a synchronous handshake first to find threads that are in scoped methods. They might however be in the middle of throwing an exception or something wild like that, where an exception can't be delivered. We install an async handshake that will roll us forward to the first place where we can indeed install exceptions, then we reevaluate if we still need to do that, or if we have unwound out from the scoped method. If we are still inside of it, we ensure an exception is installed so we don't continue executing bytecodes that might access the memory that we have freed. > > Tested tier 1-5 as well as running test/jdk/java/foreign/TestHandshake.java hundreds of times, which tests this API pretty well. This pull request has now been integrated. 
Changeset: 15946532 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/159465324fc45325d0df438991032ebca9229ca2 Stats: 222 lines in 8 files changed: 76 ins; 63 del; 83 mod 8310644: Make panama memory segment close use async handshakes Reviewed-by: jvernee, mcimadamore, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/16792 From stuefe at openjdk.org Wed Nov 29 13:19:35 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 29 Nov 2023 13:19:35 GMT Subject: Integrated: JDK-8320368: Per-CPU optimization of Klass range reservation In-Reply-To: References: Message-ID: <1ZRwoHVlKGa9kOnjxBAPrryHPEMwnoEZyVcfg95ht_4=.58fbf569-5e39-48e3-bebb-d225dad23efb@github.com> On Mon, 20 Nov 2023 16:38:17 GMT, Thomas Stuefe wrote: > In `Metaspace::reserve_address_space_for_compressed_classes`, we reserve space for the future Klass range. We place the Klass range somewhere that allows us to use "good" narrow Klass decoding later when initializing the encoding scheme. > > Narrow Klass decoding is inherently CPU-specific, so doing this in shared code is awkward. It leads to many ifdefs, vague code comments that are difficult to explain, and missed optimizations. > > There are common patterns: > - all platforms benefit from unscaled encoding so trying to reserve <4GB for CDS=off is worthwhile. > > But there are more differences than one would think: > - some platforms (s390, riscv) benefit from reservation < 4GB even with CDS=on since a 32-bit immediate requires fewer instructions > - some platforms (aarch64) don't benefit from zero-based encoding, so no need to try that > - some platforms benefit from optimizing the base for 16-bit moves (PPC, s390, aarch64) or for other immediate formats (riscv) > > It would be much better to have this section per CPU so that every CPU can implement its perfect, well-documented version. A bit of code duplication is a good price for code clarity.
> > ------------- > > This patch splits out `Metaspace::reserve_address_space_for_compressed_classes` into five variants, one per CPU (moving the code to CompressedKlassPointers); it also splits out `CompressedKlassPointers::initialize` into two variants, one for aarch64, one for all other platforms. > > Changes per-CPU: > > #### aarch64: > > Don't attempt to reserve for zero-based encoding; since lsl is not faster than movk. We reserve for movk mode right away if reserve for unscaled fails or if CDS=on. > > We also add a last-ditch attempt to reserve optimized for movk via over-alignment. We only do this on aarch64 to prevent errors like this one JDK-8318119: "Invalid narrow Klass base on aarch64 post 8312018" > > Since we don't want zero-based encoding, we need an aarch64-specific version for `CompressedKlassPointers::initialize()` > > #### riscv: > > We attempt to reserve at a "good" base that has only bits set either in [12..32), [32, 44) or in [44, 64). > > #### s390: > > We attempt to allocate < 4GB unconditionally. This pull request has now been integrated. 
Changeset: 033cced6 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/033cced6e11bbe7862d9cdd279264b3098d294ba Stats: 651 lines in 15 files changed: 545 ins; 63 del; 43 mod 8320368: Per-CPU optimization of Klass range reservation Reviewed-by: rkennke, rehn ------------- PR: https://git.openjdk.org/jdk/pull/16743 From coleenp at openjdk.org Wed Nov 29 13:32:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 29 Nov 2023 13:32:12 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: Message-ID: <5w8UvO58ZzevFI_UFvTBwsMOWaH3miHkDRH_d_l1_2E=.cd37f447-aa65-4125-87f9-2360561236cb@github.com> On Wed, 29 Nov 2023 11:45:53 GMT, Jaroslav Bachorik wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 4236: >> >>> 4234: if (method != nullptr) { >>> 4235: method->clear_jmethod_id(); >>> 4236: } >> >> This loops through the methods in the InstanceKlass that was a previous version klass, and clears the jmethodIDs for all the methods. Will it clear the jmethodIDs for the EMCP methods also, and should it? >> The jmethodID for EMCP methods are replaced with a the new version, so the Method* in this list won't find a matching jmethodID. Maybe this can be restricted to obsolete methods? > > Restricting to obsolete methods sounds like a good idea. Can you confirm my observation above, that EMCP jmethodIDs are replaced? I haven't looked at this code in a while. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1409283757 From lkorinth at openjdk.org Wed Nov 29 13:57:06 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 29 Nov 2023 13:57:06 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with muliple -Xlog crashes JTREG test cases. This is because `Collector.toMap` is not given a merge strategy. 
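
The missing merge strategy mentioned here, and a last-one-wins merge with an `-Xlog` sentinel of the kind this thread discusses, can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the jtreg or JDK test-library code: the class `XFlagMergeSketch`, its methods, and the rule used to split a flag into name and value are all hypothetical. Only `Collectors.toMap` and its documented behaviour on duplicate keys are taken from the JDK.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: class, method and constant names are hypothetical.
final class XFlagMergeSketch {
    static final String XLOG_SENTINEL = "NONEMPTY_TEST_SENTINEL";

    // Assumed splitting rule for flags that start with "-X":
    // "-Xms3g" -> ("Xms", "3g"), "-Xlog:gc*" -> ("Xlog", ":gc*").
    static Map.Entry<String, String> split(String flag) {
        int i = 1; // skip the leading '-'
        while (i < flag.length() && Character.isLetter(flag.charAt(i))) {
            i++;
        }
        return new SimpleImmutableEntry<>(flag.substring(1, i), flag.substring(i));
    }

    // Two-argument toMap: a repeated key makes collect() throw IllegalStateException.
    static Map<String, String> withoutMerge(List<String> flags) {
        return flags.stream()
                .map(XFlagMergeSketch::split)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    // Three-argument toMap: on a repeated key keep the later value.
    // Every -Xlog value is replaced by the sentinel, so only its presence is visible.
    static Map<String, String> lastOneWins(List<String> flags) {
        return flags.stream()
                .map(XFlagMergeSketch::split)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        e -> e.getKey().equals("Xlog") ? XLOG_SENTINEL : e.getValue(),
                        (earlier, later) -> later));
    }
}
```

With this sketch, `lastOneWins(List.of("-Xms3g", "-Xms4g"))` maps `Xms` to `4g`, while `withoutMerge` on two `-Xlog` flags fails because no merge function was supplied.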
> > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. > > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake. > > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > ``` > > Running tier1 I think this feature will mostly be used to filter out un-allowed flag combinations, and I guess the user will seldom if ever be interested in the actual values (just the keys). Do you have an idea for something that is less fragile? I find it a bit ugly to special case `-Xlog` as was done by me and if you prefer we could set the value to the last `-Xlog`, but that is ugly as well. I wanted this to be similar as the built in `-XX` and from my understanding that code just uses the latest value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1831939865 From aw at openjdk.org Wed Nov 29 14:29:15 2023 From: aw at openjdk.org (Andreas Woess) Date: Wed, 29 Nov 2023 14:29:15 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:58:04 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. 
>> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix thinko src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4441: > 4439: bfi(tmp1, zr, 8, 5); // Clear exception-control bits (8-12) > 4440: eor(tmp2, tmp1, tmp2); > 4441: cbz(tmp2, OK); // Only reset FPCR if it's wrong should we maybe do the same in [generate_call_stub](https://github.com/openjdk/jdk/pull/16637/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R266), too (likely faster)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16851#discussion_r1409364879 From aph at openjdk.org Wed Nov 29 14:52:14 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 Nov 2023 14:52:14 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Wed, 29 Nov 2023 14:26:30 GMT, Andreas Woess wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix thinko > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4441: > >> 4439: bfi(tmp1, zr, 8, 5); // Clear exception-control bits (8-12) >> 4440: eor(tmp2, tmp1, tmp2); >> 4441: cbz(tmp2, OK); // Only reset FPCR if it's wrong > > should we maybe do the same in [generate_call_stub](https://github.com/openjdk/jdk/pull/16637/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R266), too (likely faster)? Sure, it would be. I didn't think it was worthwhile, but we have the code now. 
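
For readers less familiar with AArch64 assembly, the bit arithmetic in the quoted `bfi`/`eor`/`cbz` lines can be modelled in a few lines of Java. This is only a sketch of the arithmetic, not HotSpot code: the names are hypothetical, and it assumes that `tmp2` still holds an unmodified copy of the register value when `eor` runs, which the quoted excerpt does not show.

```java
// Illustrative bit-level model only; not HotSpot code. Names are hypothetical.
final class FpcrCheckSketch {
    // Five bits starting at bit 8, i.e. bits 8..12, matching the quoted comment.
    static final long EXCEPTION_CONTROL_BITS = 0x1FL << 8;

    // Models bfi(tmp1, zr, 8, 5): write zeros into the 5-bit field at bit 8.
    static long clearExceptionControlBits(long value) {
        return value & ~EXCEPTION_CONTROL_BITS;
    }

    // Models eor + cbz, assuming the second operand is the unmodified value:
    // a zero XOR means clearing changed nothing, so the register write can be skipped.
    static boolean needsReset(long original) {
        return (clearExceptionControlBits(original) ^ original) != 0;
    }
}
```

Under these assumptions a value with none of bits 8..12 set yields `needsReset == false`, which corresponds to the branch to `OK` in the quoted code.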
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16851#discussion_r1409398354 From eosterlund at openjdk.org Wed Nov 29 14:55:07 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 29 Nov 2023 14:55:07 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: <0mKShAVB-sftch99WqIcX1px5avX5-yejLzdRUEIEg8=.868a1053-bbb3-494a-941a-65c577714a38@github.com> Message-ID: On Wed, 29 Nov 2023 07:38:50 GMT, Emanuel Peter wrote: >> I think that anything that can return data from the extra data section is a potential danger. bci_to_data calls bci_to_extra_data at the end so it seems potentially unsafe which seems like a huge problem since that's used all over the place. Whether the callers are actually getting or expecting a record from extra data is unclear. I would suspect that most places where it's used there should already be a preallocated record. The concurrent repacking really makes it hard to ensure the accesses are safe. I think the API would need to make a stronger split between preallocated records and records which might come from the extra data section. I'm honestly not sure how to make this truly safe. > > Ok. So this issue is much bigger than `query_update_method_data` and `allocate_bci_to_data`, is what you are saying. Sounds like I need to study this much deeper. Maybe we need to refactor the whole way we access the records? Maybe any access to the records needs to be guarded with a lock, just to be safe? > If there are concurrent updates, which are guarded by a lock, then should not all reads also be guarded? In the end maybe we just need to make all accesses mutually exclusive, right? Yeah, reading data from the extra data section requires holding the lock.
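
The rule stated in this exchange, that readers must take the same lock as the code that repacks the records, can be illustrated with a generic Java sketch. It is not a model of HotSpot's `MethodData` layout: the class, the `{bci, count}` record shape, and the convention that a zero count marks a droppable record are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Generic illustration, not HotSpot code. All names are hypothetical.
final class GuardedRecordsSketch {
    private final Object lock = new Object();
    // Each record is {bci, count}; a count of 0 marks a record that repacking may drop.
    private final List<int[]> records = new ArrayList<>();

    void put(int bci, int count) {
        synchronized (lock) {
            records.add(new int[] {bci, count});
        }
    }

    // Writer: compacts the list, which moves the surviving records.
    void repack() {
        synchronized (lock) {
            records.removeIf(r -> r[1] == 0);
        }
    }

    // Reader: takes the same lock and copies the value out before releasing it,
    // so it never observes a record while repack() is moving it.
    int countFor(int bci) {
        synchronized (lock) {
            for (int[] r : records) {
                if (r[0] == bci) {
                    return r[1];
                }
            }
            return -1; // no record for this bci
        }
    }
}
```

The design point is that the reader returns a copied value rather than a reference into the list, so nothing obtained under the lock is used after the lock is released.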
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16840#discussion_r1409402346 From jbachorik at openjdk.org Wed Nov 29 15:00:13 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Wed, 29 Nov 2023 15:00:13 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: <5w8UvO58ZzevFI_UFvTBwsMOWaH3miHkDRH_d_l1_2E=.cd37f447-aa65-4125-87f9-2360561236cb@github.com> References: <5w8UvO58ZzevFI_UFvTBwsMOWaH3miHkDRH_d_l1_2E=.cd37f447-aa65-4125-87f9-2360561236cb@github.com> Message-ID: On Wed, 29 Nov 2023 13:29:18 GMT, Coleen Phillimore wrote: >> Restricting to obsolete methods sounds like a good idea. > > Can you confirm my observation above, that EMCP jmethodIDs are replaced? I haven't looked at this code in a while. Thanks. I am going to take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1409410538 From sgibbons at openjdk.org Wed Nov 29 15:01:32 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 Nov 2023 15:01:32 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v2] In-Reply-To: References: Message-ID: > Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>
> Benchmark                              Score      Latest     Latest/Score
> StringIndexOf.advancedWithMediumSub    343.573    317.934    0.925375393x
> StringIndexOf.advancedWithShortSub1    1039.081   1053.96    1.014319384x
> StringIndexOf.advancedWithShortSub2    55.828     110.541    1.980027943x
> StringIndexOf.constantPattern          9.361      11.906     1.271872663x
> StringIndexOf.searchCharLongSuccess    4.216      4.218      1.000474383x
> StringIndexOf.searchCharMediumSuccess  3.133      3.216      1.02649218x
> StringIndexOf.searchCharShortSuccess   3.76       3.761      1.000265957x
> StringIndexOf.success                  9.186      9.713      1.057369911x
> StringIndexOf.successBig               14.341     46.343     3.231504079x
> StringIndexOfChar.latin1_AVX2_String   6220.918   12154.52   1.953814533x
> StringIndexOfChar.latin1_AVX2_char     5503.556   5540.044   1.006629895x
> StringIndexOfChar.latin1_SSE4_String   6978.854   6818.689   0.977049957x
> StringIndexOfChar.latin1_SSE4_char     5657.499   5474.624   0.967675646x
> StringIndexOfChar.latin1_Short_String  7132.541   6863.359   0.962260014x
> StringIndexOfChar.latin1_Short_char    16013.389  16162.437  1.009307711x
> StringIndexOfChar.latin1_mixed_String  7386.123   14771.622  1.999915517x
> StringIndexOfChar.latin1_mixed_char    9901.671   9782.245   0.987938803x

Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Only use optimization when EnableX86ECoreOpts is true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16753/files - new: https://git.openjdk.org/jdk/pull/16753/files/60d762b9..e614b86f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16753&range=00-01 Stats: 6 lines in 3 files changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16753.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16753/head:pull/16753 PR: https://git.openjdk.org/jdk/pull/16753 From jbachorik at openjdk.org Wed Nov 29 15:18:15 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Wed, 29 Nov 2023 15:18:15 GMT Subject: RFR: 
8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: <5w8UvO58ZzevFI_UFvTBwsMOWaH3miHkDRH_d_l1_2E=.cd37f447-aa65-4125-87f9-2360561236cb@github.com> Message-ID: On Wed, 29 Nov 2023 14:57:34 GMT, Jaroslav Bachorik wrote: >> Can you confirm my observation above, that EMCP jmethodIDs are replaced? I haven't looked at this code in a while. Thanks. > > I am going to take a look. Ok, I found it. The reason for the jmethodID not being cleaned out is this assignment of a new jmethodID to obsolete methods - https://github.com/openjdk/jdk/blob/a2c5f1fc914ef5c28d044b75598f895cf6097138/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L3887 Since this is not done for EMCP (or more generic, non-obsolete) methods restricting the cleanup process to obsolete methods still sounds like the right thing to do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1409436435 From ayang at openjdk.org Wed Nov 29 15:27:52 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 29 Nov 2023 15:27:52 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Wed, 29 Nov 2023 08:24:22 GMT, Stefan Johansson wrote: >> Jonathan Joo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix namespace issues (2) >> >> Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> >> - Fix namespace issues >> >> Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> > > src/hotspot/share/runtime/cpuTimeCounters.cpp line 91: > >> 89: } while (old_value != fetched_value); >> 90: get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value); >> 91: } > > Why do we have to do this publish dance? 
Couldn't the closure that update the diff instead just update the counter? From what I can tell we never have multiple closures active at the same time so should be no race there? This two-step update does seem unnecessary, IMO. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1409452326 From ayang at openjdk.org Wed Nov 29 15:27:55 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 29 Nov 2023 15:27:55 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 08:31:37 GMT, Stefan Johansson wrote: >> src/hotspot/share/runtime/cpuTimeCounters.cpp line 119: >> >>> 117: if (CPUTimeGroups::is_gc_counter(_name)) { >>> 118: instance->inc_gc_total_cpu_time(net_cpu_time); >>> 119: } >> >> I feel much of this is on the wrong abstraction level; `CPUTimeCounters::update_counter(_name, _total);` should be sufficient here. (The ~caller~ callee handles diff calculation and inc gc-counter if needed.) > > We could add a new closure just used by GC that 's a sub-class of `ThreadTotalCPUTimeClosure` and just adds this to the constructor: > > instance->inc_gc_total_cpu_time(net_cpu_time); > > > That way we could get rid of `CPUTimeGroups::is_gc_counter()` as well since all those counters should use the "GC closure" or we can keep it and assert that no GC closure uses the wrong closure. > > What do you think about that Albert, would that address your concerns? (I just realized that I made a typo in my previous msg; should be *callee* instead.) That is what I have in mind. 
    void CPUTimeCounters::update_counter(name, total) {
      auto counter = get_counter(name);
      auto old_v = counter->get_value();
      auto diff = total - old_v;
      counter->inc(diff);
      if (counter->is_gc_counter()) {
        // Also account the diff in the GC total counter.
        get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(diff);
      }
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1409450168 From sjohanss at openjdk.org Wed Nov 29 16:02:18 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 29 Nov 2023 16:02:18 GMT Subject: RFR: 8320916: jdk/jfr/event/gc/stacktrace/TestParallelMarkSweepAllocationPendingStackTrace.java failed with "OutOfMemoryError: GC overhead limit exceeded" In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 00:35:47 GMT, Albert Mingkun Yang wrote: > Simple fix to reduce live set so that after the triggered full-gc, there is still some memory left. > > Test: ~2/10 failure before the fix and no failure observed for 100 iterations. Looks good. ------------- Marked as reviewed by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16870#pullrequestreview-1755634055 From jiangli at openjdk.org Wed Nov 29 16:02:19 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 29 Nov 2023 16:02:19 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 28 Nov 2023 21:28:31 GMT, Jiangli Zhou wrote: >> This fix looks good to me, so approved now. >> I assume it is for 22. Is it correct? > >> This fix looks good to me, so approved now. I assume it is for 22. Is it correct? > > Thanks for the careful review, @sspitsyn! The fix is for 22. We probably should also consider back-porting to JDK 11 to prevent any potential changes in the area accidentally reintroducing the bug. The https://bugs.openjdk.org/browse/JDK-8312174 change has been back-ported to 11, which resolved the crashes by luck.
> > I'll request backport after this fix is integrated. > @jianglizhou Thank you for filing the sub-task. You have already seen some crashes. Even though you do not have a standalone test case, it is still valuable if you describe a test scenario (at least, surfacely) which helped to observe the problem. Could you, add it to the sub-task report, please? Hi @sspitsyn I'll comment on https://bugs.openjdk.org/browse/JDK-8320614 later, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1832193011 From jiangli at openjdk.org Wed Nov 29 16:05:10 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 29 Nov 2023 16:05:10 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v2] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Thu, 16 Nov 2023 21:58:17 GMT, Man Cao wrote: >> Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: >> >> Don't try to setup_jvmti_thread_state for obj allocation sampling if the current thread is attaching from native and is allocating the thread oop. That's to make sure we don't create a 'partial' JvmtiThreadState. > > Thanks. The latest change to `JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample()` looks OK to me. Skipping a few allocations for JVMTI allocation sampler is better than resulting in a problematic `JvmtiThreadState` instance. > > My main question is if we can now change > `if (state == nullptr || state->get_thread_oop() != thread_oop) ` to `if (state == nullptr)` in `JvmtiThreadState::state_for_while_locked()`. I suspect we would never run into a case of `state != nullptr && state->get_thread_oop() != thread_oop` with the latest change, even with virtual threads. 
This is backed up by testing with https://github.com/openjdk/jdk/commit/00ace66c36243671a0fb1b673b3f9845460c6d22 not triggering any failure. > > If we run into such as a case, it could still be problematic as `JvmtiThreadState::state_for_while_locked()` would allocate a new `JvmtiThreadState` instance pointing to the same JavaThread, and it does not delete the existing instance. > > Could anyone with deep knowledge on JvmtiThreadState and virtual threads provide some feedback on this change and https://bugs.openjdk.org/browse/JDK-8319935? @AlanBateman, do you know who would be the best reviewer for this? @caoman @dholmes-ora Thank you for the reviews and discussions in this thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1832198014 From jiangli at openjdk.org Wed Nov 29 16:09:20 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 29 Nov 2023 16:09:20 GMT Subject: Integrated: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread In-Reply-To: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Mon, 13 Nov 2023 21:52:22 GMT, Jiangli Zhou wrote: > Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. This pull request has now been integrated. 
Changeset: da7bcfcf Author: Jiangli Zhou URL: https://git.openjdk.org/jdk/commit/da7bcfcf6e45486a0427e0ceaba74d52acbd722f Stats: 28 lines in 2 files changed: 22 ins; 4 del; 2 mod 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread Reviewed-by: manc, dholmes, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/16642 From never at openjdk.org Wed Nov 29 16:12:07 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 29 Nov 2023 16:12:07 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:58:04 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. >> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix thinko x86 also has a restore call in DowncallLinker::StubGenerator::generate so you might consider adding there as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1832210732 From never at openjdk.org Wed Nov 29 16:17:08 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 29 Nov 2023 16:17:08 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:58:04 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. >> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix thinko Regarding the flag name you could introduce a better name and alias to it RestoreMXCSROnJNICalls. Reusing the same name it definitely confusing. At a minimum you need to update the documentation for RestoreMXCSROnJNICalls to indicate that it also does something on aarch64 or generalize the language to something like `Restore floating point control word when returning from JNI calls`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1832221013 From eastigeevich at openjdk.org Wed Nov 29 16:21:16 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 29 Nov 2023 16:21:16 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 Message-ID: As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. 
This patch sets these flags accordingly for the V2 architecture. ------------- Commit messages: - 8321025: Enable Neoverse N1 optimizations for Neoverse V2 Changes: https://git.openjdk.org/jdk/pull/16887/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16887&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321025 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16887.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16887/head:pull/16887 PR: https://git.openjdk.org/jdk/pull/16887 From eastigeevich at openjdk.org Wed Nov 29 16:24:05 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 29 Nov 2023 16:24:05 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 16:16:12 GMT, Evgeny Astigeevich wrote: > As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for the V2 architecture. @nick-arm, could you please have a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16887#issuecomment-1832246614 From aph at openjdk.org Wed Nov 29 16:39:05 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 29 Nov 2023 16:39:05 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: <6X5-aU23gA-kOpUJ_9DSD6rhX2SUKz3s-hLIXUsV_oE=.09144ed5-aafe-4387-b75d-5f8ad8f07350@github.com> On Wed, 29 Nov 2023 16:14:40 GMT, Tom Rodriguez wrote: > Regarding the flag name you could introduce a better name and alias to it RestoreMXCSROnJNICalls. Ah, I did not know that flag aliases were possible. I'll have a look at something generic. > Reusing the same name it definitely confusing. 
At a minimum you need to update the documentation for RestoreMXCSROnJNICalls to indicate that it also does something on aarch64 or generalize the language to something like `Restore floating point control word when returning from JNI calls`. I'll go digging. > x86 also has a restore call in DowncallLinker::StubGenerator::generate so you might consider adding there as well. I think it's there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1832295989 PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1832297313 From ngasson at openjdk.org Wed Nov 29 16:46:06 2023 From: ngasson at openjdk.org (Nick Gasson) Date: Wed, 29 Nov 2023 16:46:06 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: <8WlMFHvjI5vSE2k7QRt1cDHgC1r0p7EA665yBp4gvuI=.6781ae52-bd79-4f58-8f9b-aea3b6f0c5f3@github.com> On Wed, 29 Nov 2023 16:16:12 GMT, Evgeny Astigeevich wrote: > As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for the V2 architecture. Marked as reviewed by ngasson (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16887#pullrequestreview-1755756172 From dcubed at openjdk.org Wed Nov 29 16:57:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Nov 2023 16:57:13 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v9] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 06:38:51 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. 
The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thumbs up. Thanks for making the additional changes. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1755779119 From iklam at openjdk.org Wed Nov 29 17:13:10 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 29 Nov 2023 17:13:10 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks In-Reply-To: References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Message-ID: On Wed, 29 Nov 2023 08:27:11 GMT, Aleksey Shipilev wrote: > > I feel "debug" log level is kind of high since we may encounter a lot of these; for such unbound micro-prints I usually use "trace". Up to you. > > I agree "trace" would fit better here. Let's see if @iklam has an opinion. I think `trace` is good. BTW, I rarely use `-Xlog:cds+heap` as it usually too verbose :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16863#issuecomment-1832356872 From stefank at openjdk.org Wed Nov 29 17:13:18 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 29 Nov 2023 17:13:18 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v8] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 18:38:07 GMT, Daniel D. Daugherty wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > I'll re-review again once the last set of comments are addressed. Thanks @dcubed-ojdk! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1832357089 From rkennke at openjdk.org Wed Nov 29 17:16:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 29 Nov 2023 17:16:10 GMT Subject: RFR: 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds In-Reply-To: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> References: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> Message-ID: On Tue, 28 Nov 2023 12:40:41 GMT, Aleksey Shipilev wrote: > Flag cleanup. Current barrier verification code is opt-in, and it is selected for a few tests. For extra safety, we want to have it enabled by default in debug builds. This also simplifies test configurations. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` > - [x] Linux AArch64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` Makes sense and looks good, thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16849#pullrequestreview-1755819547 From coleenp at openjdk.org Wed Nov 29 17:18:15 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 29 Nov 2023 17:18:15 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v10] In-Reply-To: References: <5w8UvO58ZzevFI_UFvTBwsMOWaH3miHkDRH_d_l1_2E=.cd37f447-aa65-4125-87f9-2360561236cb@github.com> Message-ID: On Wed, 29 Nov 2023 15:14:59 GMT, Jaroslav Bachorik wrote: >> I am going to take a look. > > Ok, I found it. 
> The reason for the jmethodID not being cleaned out is this assignment of a new jmethodID to obsolete methods - https://github.com/openjdk/jdk/blob/a2c5f1fc914ef5c28d044b75598f895cf6097138/src/hotspot/share/prims/jvmtiRedefineClasses.cpp#L3887 > > Since this is not done for EMCP (or more generic, non-obsolete) methods restricting the cleanup process to obsolete methods still sounds like the right thing to do. Excellent thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16662#discussion_r1409626840 From shade at openjdk.org Wed Nov 29 17:29:05 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 17:29:05 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 16:16:12 GMT, Evgeny Astigeevich wrote: > As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for the V2 architecture. Okay in principle, but I have a question, there is another block below: // Neoverse V1 if (_cpu == CPU_ARM && model_is(0xd40)) { if (FLAG_IS_DEFAULT(UseCryptoPmullForCRC32)) { FLAG_SET_DEFAULT(UseCryptoPmullForCRC32, true); } } Should it be enabled for V2 as well? ------------- PR Review: https://git.openjdk.org/jdk/pull/16887#pullrequestreview-1755841831 From jbachorik at openjdk.org Wed Nov 29 17:30:14 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Wed, 29 Nov 2023 17:30:14 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 11:49:31 GMT, Jaroslav Bachorik wrote: >> Please, review this fix for a corner case handling of `jmethodID` values. >> >> The issue is related to the interplay between `jmethodID` values and method redefinitions. 
Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. >> Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. >> >> If we happen to capture a stack trace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. >> However, sooner or later the 'previous' class version becomes eligible for cleanup, at which time all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. >> This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded. >> >> This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions.
>> >> ~Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.~ >> >> Therefore, we need to perform `jmethodID` lookup for each method in an old class version that is getting purged, and null out the pointer of that `jmethodID` to break the link from `jmethodID` to the method instance that is about to get deallocated. >> >> _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ > > Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: > > Restrict cleanup to obsolete methods only Thanks everyone involved in reviewing this PR! You were awesome and helped me drive the PR to better place than it started! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1832386439 From shade at openjdk.org Wed Nov 29 17:32:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 17:32:19 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks [v2] In-Reply-To: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Message-ID: > Profiling heap dumping code reveals another simple issue: `mask_dormant_archived_object` on dumping hotpath takes quite a bit of time. We can reflow it for better inlineability, throwing out the non-essential parts into cold method. There is also no reason to peek into java mirror with (default) keep-alive, if we only use the result for null-check. 
> > Example improvements on Mac M1: > > > % for I in `seq 1 5`; do build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done > > # Before > Heap dump file created [1897307608 bytes in 1.584 secs] > Heap dump file created [1897308278 bytes in 1.439 secs] > Heap dump file created [1897308508 bytes in 1.460 secs] > Heap dump file created [1897308505 bytes in 1.423 secs] > Heap dump file created [1897308554 bytes in 1.414 secs] > > # After > Heap dump file created [1897307648 bytes in 1.509 secs] > Heap dump file created [1897308498 bytes in 1.281 secs] > Heap dump file created [1897308554 bytes in 1.282 secs] > Heap dump file created [1897308512 bytes in 1.263 secs] > Heap dump file created [1897308554 bytes in 1.270 secs] > > > ...which is about +12% faster heap dump. > > I also eyeballed the generated code and saw `mask_dormant_archived_object` fully inlined at least on x86_64. Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Switch logging: debug -> trace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16863/files - new: https://git.openjdk.org/jdk/pull/16863/files/c2f2b3e8..e6745195 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16863&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16863&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/16863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16863/head:pull/16863 PR: https://git.openjdk.org/jdk/pull/16863 From shade at openjdk.org Wed Nov 29 17:32:21 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 17:32:21 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks [v2] In-Reply-To: References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> 
Message-ID: On Wed, 29 Nov 2023 17:10:22 GMT, Ioi Lam wrote: > I think `trace` is good. BTW, I rarely use `-Xlog:cds+heap` as it is usually too verbose :-) Hah! Okay, I switched to `trace` in a new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16863#issuecomment-1832389969 From jbachorik at openjdk.org Wed Nov 29 17:33:20 2023 From: jbachorik at openjdk.org (Jaroslav Bachorik) Date: Wed, 29 Nov 2023 17:33:20 GMT Subject: Integrated: 8313816: Accessing jmethodID might lead to spurious crashes In-Reply-To: References: Message-ID: On Tue, 14 Nov 2023 17:56:09 GMT, Jaroslav Bachorik wrote: > Please, review this fix for a corner case handling of `jmethodID` values. > > The issue is related to the interplay between `jmethodID` values and method redefinitions. Each `jmethodID` value is effectively a pointer to a `Method` instance. Once that method gets redefined, the `jmethodID` is updated to point to the last `Method` version. > Unless the method is still on stack/running, in which case the original `jmethodID` will be redirected to the latest `Method` version and at the same time the 'previous' `Method` version will receive a new `jmethodID` pointing to that previous version. > > If we happen to capture a stack trace via `GetStackTrace` or `GetAllStackTraces` JVMTI calls while this previous `Method` version is still on stack we will have the corresponding frame identified by a `jmethodID` pointing to that version. > However, sooner or later the 'previous' class version becomes eligible for cleanup, at which time all contained `Method` instances are deallocated. The cleanup process will not perform the `jmethodID` pointer maintenance and we will end up with pointers to deallocated memory. > This is caused by the fact that the `jmethodID` lifecycle is bound to the `ClassLoaderData` instance and all relevant `jmethodID`s will get batch-updated when the class loader is being released and all its classes are getting unloaded.
> > This means that we need to make sure that if a `Method` instance is being deallocated, the associated `jmethodID` (if any) must not point to the deallocated instance once we are finished. Unfortunately, we cannot just update the `jmethodID` values in bulk when purging an old class version - the per `InstanceKlass` jmethodID cache is present only for the main class version and contains `jmethodID` values for both the old and current method versions. > > ~Therefore we need to perform `jmethodID` lookup when we are about to deallocate a `Method` instance and clean up the pointer only if that `jmethodID` is pointing to the `Method` instance which is being deallocated.~ > > Therefore, we need to perform `jmethodID` lookup for each method in an old class version that is getting purged, and null out the pointer of that `jmethodID` to break the link from `jmethodID` to the method instance that is about to get deallocated. > > _(For anyone interested, a much lengthier writeup is available in [my blog](https://jbachorik.github.io/posts/mysterious-jmethodid))_ This pull request has now been integrated.
Changeset: cdd1a6e8 Author: Jaroslav Bachorik URL: https://git.openjdk.org/jdk/commit/cdd1a6e851bcaf4a25d4a405b8ee0b0d5b83a4a9 Stats: 206 lines in 8 files changed: 206 ins; 0 del; 0 mod 8313816: Accessing jmethodID might lead to spurious crashes Reviewed-by: coleenp ------------- PR: https://git.openjdk.org/jdk/pull/16662 From eastigeevich at openjdk.org Wed Nov 29 17:47:04 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 29 Nov 2023 17:47:04 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: <-_8kbpd5aGwBQ07LvWn-6G3g6Jh3qX-Y0ZlPldmsauM=.deda3a19-6357-4402-893a-066466eccfdf@github.com> On Wed, 29 Nov 2023 17:26:01 GMT, Aleksey Shipilev wrote:
> Okay in principle, but I have a question, there is another block below:
>
> ```
> // Neoverse V1
> if (_cpu == CPU_ARM && model_is(0xd40)) {
>   if (FLAG_IS_DEFAULT(UseCryptoPmullForCRC32)) {
>     FLAG_SET_DEFAULT(UseCryptoPmullForCRC32, true);
>   }
> }
> ```
>
> Should it be enabled for V2 as well?
Good catch. I'll check whether V2 has the same or better `pmull` as V1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16887#issuecomment-1832412262 From mli at openjdk.org Wed Nov 29 17:53:09 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Nov 2023 17:53:09 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: <2ciWlr2OCjKDqOzUYFVD1nQ3OiGbNTeK3DVpVz6trk8=.e06b29b6-b706-4e9d-8eab-c37d887c29bc@github.com> On Wed, 29 Nov 2023 12:05:23 GMT, Yuri Gaevsky wrote: >> I think the reason might be: with a specific register, you can add an effect such as `USE_KILL ary, USE_KILL cnt`, but without a specific register, currently you have no way to do so.
>> But in the current patch, it does modify ary and cnt in the intrinsic, so I wonder if the current (latest) patch is safe enough in all situations.
>>
>> It may be helpful to add 2 new registers when matching the intrinsic in the ad file, and I guess the register allocator will merge different uses of temp registers together?
>> But I still think it's not necessary to specify the register when matching arrays_hashcode in the ad file.
>
> Hi @Hamlin-Li,
>
> My apologies for the delay in answering.
>
> Without specifying the "concrete" registers for _ary/cnt/result_, e.g. as follows:
> (1) [ iRegP/iRegI ]
>
> instruct arrays_hashcode(iRegP ary, iRegI cnt, iRegI result, immI basic_type,
>                          iRegLNoSp tmp1, iRegINoSp tmp2,
>                          iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr)
>
> or
> (2) [ iRegPNoSp/iRegINoSp ]
>
> instruct arrays_hashcode(iRegPNoSp ary, iRegINoSp cnt, iRegINoSp result, immI basic_type,
>                          iRegLNoSp tmp1, iRegINoSp tmp2,
>                          iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr)
>
> it's impossible to use the '_USE_KILL_' hint in the '_effect_' directive because AD compilation fails:
>
> Building target 'images' in configuration 'linux-riscv64-server-release'
> /jdk/src/hotspot/cpu/riscv/gc/z/z_riscv.ad(10306) Syntax Error: :In arrays_hashcode only bound registers can be killed: iRegP ary
>
> or
>
> Building target 'images' in configuration 'linux-riscv64-server-release'
> /jdk/src/hotspot/cpu/riscv/gc/z/z_riscv.ad(10306) Syntax Error: :In arrays_hashcode only bound registers can be killed: iRegPNoSp ary
>
>
> IMHO, the usage of '_USE_KILL_' for _ary_/_cnt_ looks reasonable since
> we actually do use/modify them both in the assembler body, and avoiding any
> hint, or using another hint ('_USE_'?), may be wrong here even if that does
> not cause a failure in pre-integration tests.
It is interesting that many > intrinsics' definitions in AD files similarly use "concrete" registers in > X86/RISC-V archs, possibly due to the mentioned "_only bound registers can be > killed_" ADLC requirement. > > I'd like to mention that applying '_USE_KILL_' to 'result' causes the error > in runtime: > > --------------- T H R E A D --------------- > > Current thread (0x0000003fc0189130): JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=54524, stack(0x0000003f7bc00000,0x0000003f7be00000) (2048K)] > > > Current CompileTask: > C2:828 117 4 java.lang.String::hashCode (60 bytes) > > Stack: [0x0000003f7bc00000,0x0000003f7be00000], sp=0x0000003f7bdfbba0, free space=2030k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x5e1ad2] Node_Backward_Iterator::next()+0x106 > V [libjvm.so+0x5e4040] PhaseCFG::global_code_motion()+0x242 > V [libjvm.so+0x5e5266] PhaseCFG::do_global_code_motion()+0x38 > V [libjvm.so+0x455142] Compile::Code_Gen()+0x198 > V [libjvm.so+0x458168] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0xce... Thanks for looking into the details. Based on the information we got till now, yes, we need the concrete registers, otherwise, we need to pass in more tmp register, seems former one is better, let's go with it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1409665119 From shade at openjdk.org Wed Nov 29 17:58:06 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 17:58:06 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 16:16:12 GMT, Evgeny Astigeevich wrote: > As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for the V2 architecture. 
Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16887#pullrequestreview-1755892029 From shade at openjdk.org Wed Nov 29 17:58:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 17:58:09 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: <-_8kbpd5aGwBQ07LvWn-6G3g6Jh3qX-Y0ZlPldmsauM=.deda3a19-6357-4402-893a-066466eccfdf@github.com> References: <-_8kbpd5aGwBQ07LvWn-6G3g6Jh3qX-Y0ZlPldmsauM=.deda3a19-6357-4402-893a-066466eccfdf@github.com> Message-ID: <1SRCHh28tB8Q5X2o7SWeW41LO4IqbiYpzyNKcMgOEvY=.02953e71-2114-4c39-90cc-1060437e4d0d@github.com> On Wed, 29 Nov 2023 17:44:38 GMT, Evgeny Astigeevich wrote: > Good catch. I'll check whether V2 has the same or better `pmull` as V1. Although, it would not be "enabling N1 optos for V2", it would be "enabling V1 optos for V2" :) Your call if you just want to make that change separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16887#issuecomment-1832430356 From mli at openjdk.org Wed Nov 29 18:00:11 2023 From: mli at openjdk.org (Hamlin Li) Date: Wed, 29 Nov 2023 18:00:11 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> Message-ID: On Fri, 24 Nov 2023 18:56:21 GMT, Hamlin Li wrote: >>> BTW, can you add some comments about what java method or bytecode this intrinsic is for? >> Done. > > Hmm, addition of TEMP_DEF result makes the benchmark results even worse than without intrinsic (I haven't looked at the generated assembler though). > > This seems a bit confusing to me. If we use a concrete register for `result`, then there is no need to label it as `TEMP_DEF` either. So, in summary, let's go with the concrete registers solution if no one has another opinion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1409673888 From never at openjdk.org Wed Nov 29 18:32:07 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 29 Nov 2023 18:32:07 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:58:04 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. >> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix thinko Yes it's there. Sorry I guess I was scanning too quickly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1832478828 From dcubed at openjdk.org Wed Nov 29 18:32:38 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Nov 2023 18:32:38 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v9] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 14:37:01 GMT, Axel Boldt-Christmas wrote: >> LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated. So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. >> >> The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. 
>> The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation are still valid, and will not fail to any other reason than the cooperative race to help transition the mark word during deflation. >> >> This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 > - Fix copy paste typo. > - Update src/hotspot/share/opto/library_call.cpp > > Co-authored-by: Tobias Hartmann > - Add retry CAS comment > - Use is_neutral over is_unlocked > - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 > - Use more familiar CAS variable names and pattern > - Move is_lock_owned closer to its only use > - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 > - Simplify test. > - ... and 1 more: https://git.openjdk.org/jdk/compare/87ee59dd...b4061417 I've re-reviewed the changes from v03 -> v08. Thumbs up with what you currently have. There's still one open query in `FastHashCode` about whether the VMThread can get in there during JVM/TI tagging... It may be best to file a follow up bug for chasing down that detail and possibly addressing it... Or you could address it with one more change in this PR... ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16603#pullrequestreview-1755946248 From dcubed at openjdk.org Wed Nov 29 18:32:39 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Wed, 29 Nov 2023 18:32:39 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v10] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Fri, 17 Nov 2023 07:19:51 GMT, Axel Boldt-Christmas wrote: >> Do we inflate when the VMThread is doing JVM/TI tagging? > > LM_LEGACY and LM_MONITOR will. LM_LIGHTWEIGHT technically may. If deflation finishes between reading the mark word in FastHashCode and reading the mark word in `inflate`. It seems like a rare enough case that it does not need to be handled separately. The following change would avoid inflation all together. > > // An async deflation can race after the inflate() call and before we > // can update the ObjectMonitor's header with the hash value below. > + if (LockingMode == LM_LIGHTWEIGHT) { > + assert(mark.has_monitor(), "must be"); > + monitor = mark.monitor(); > + } else { > - monitor = inflate(current, obj, inflate_cause_hash_code); > + monitor = inflate(current, obj, inflate_cause_hash_code); > + } > // Load ObjectMonitor's header/dmw field and see if it has a hash. > > > Maybe I should change it to this. Given that there has been confusion here. > My ideal solution would be to separate the implementations for the different locking modes all together, all of these functions are littered with if (LockingMode == X). So what did you decide to do here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1409704775 From sgibbons at openjdk.org Wed Nov 29 18:37:08 2023 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 29 Nov 2023 18:37:08 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v2] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 15:01:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. 
This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark                                   Score     Latest  Latest/Score
>> StringIndexOf.advancedWithMediumSub       343.573    317.934  0.925375393x
>> StringIndexOf.advancedWithShortSub1      1039.081    1053.96  1.014319384x
>> StringIndexOf.advancedWithShortSub2        55.828    110.541  1.980027943x
>> StringIndexOf.constantPattern               9.361     11.906  1.271872663x
>> StringIndexOf.searchCharLongSuccess         4.216      4.218  1.000474383x
>> StringIndexOf.searchCharMediumSuccess       3.133      3.216  1.02649218x
>> StringIndexOf.searchCharShortSuccess         3.76      3.761  1.000265957x
>> StringIndexOf.success                       9.186      9.713  1.057369911x
>> StringIndexOf.successBig                   14.341     46.343  3.231504079x
>> StringIndexOfChar.latin1_AVX2_String     6220.918   12154.52  1.953814533x
>> StringIndexOfChar.latin1_AVX2_char       5503.556   5540.044  1.006629895x
>> StringIndexOfChar.latin1_SSE4_String     6978.854   6818.689  0.977049957x
>> StringIndexOfChar.latin1_SSE4_char       5657.499   5474.624  0.967675646x
>> StringIndexOfChar.latin1_Short_String    7132.541   6863.359  0.962260014x
>> StringIndexOfChar.latin1_Short_char     16013.389  16162.437  1.009307711x
>> StringIndexOfChar.latin1_mixed_String    7386.123  14771.622  1.999915517x
>> StringIndexOfChar.latin1_mixed_char      9901.671   9782.245  0.987938803
>
> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>
> Only use optimization when EnableX86ECoreOpts is true
Latest numbers vs. top-of-tree JDK.
![JBS IndexOf](https://github.com/openjdk/jdk/assets/6704669/6ed74968-d333-4c70-8fe2-f94a56aaeca9) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16753#issuecomment-1832486450 From sspitsyn at openjdk.org Wed Nov 29 18:42:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 Nov 2023 18:42:18 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v3] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Tue, 28 Nov 2023 22:40:10 GMT, Serguei Spitsyn wrote: >> @jianglizhou Thank you for filing the sub-task. You have already seen some crashes. Even though you do not have a standalone test case, it is still valuable if you describe a test scenario (at least, surfacely) which helped to observe the problem. Could you, add it to the sub-task report, please? > >> Thanks for the careful review, @sspitsyn! The fix is for 22. We probably should also consider back-porting to JDK 11 to prevent any potential changes in the area accidentally reintroducing the bug. The https://bugs.openjdk.org/browse/JDK-8312174 change has been back-ported to 11, which resolved the crashes by luck. >> I'll request backport after this fix is integrated. > > Nice. I've targeted it to 22. I agree it is better to have it back-ported. Its back-port is not going to be clean though. > Hi @sspitsyn I'll comment on https://bugs.openjdk.org/browse/JDK-8320614 later, thanks. @jianglizhou Okay, thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1832493311 From jvernee at openjdk.org Wed Nov 29 19:13:36 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 29 Nov 2023 19:13:36 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded [v2] In-Reply-To: References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> Message-ID: On Wed, 29 Nov 2023 19:10:42 GMT, Jorn Vernee wrote: >> See JBS issue. >> >> Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. >> >> Testing: local `InternalErrorTest`, Tier 1-5 (ongoing) > > Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: > > - add handling for missing instruction > - Print out instruction src/hotspot/cpu/x86/assembler_x86.cpp line 973: > 971: return ip; > 972: default: > 973: fatal("not handled: 0x0F%2X", 0xFF & *(ip-1)); I've improved the error reporting a bit here to print out the problematic instruction. I was only able to reproduce the crash through GHA. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16848#discussion_r1409753452 From jvernee at openjdk.org Wed Nov 29 19:13:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 29 Nov 2023 19:13:34 GMT Subject: RFR: 8320886: Unsafe_SetMemory0 is not guarded [v2] In-Reply-To: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> References: <5kRdxpEyFZLzxlyHpdHju1w9qLbm4OA6UkVZMr17nt0=.339b7543-574c-4a06-84e9-2ffb9d9a345a@github.com> Message-ID: > See JBS issue. > > Guard the memory access done in Unsafe_SetMemory0 to prevent a SIGBUS error from crashing the VM when a truncated memory mapped file is accessed. 
>
> Testing: local `InternalErrorTest`, Tier 1-5 (ongoing)

Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision:

 - add handling for missing instruction
 - Print out instruction

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16848/files
  - new: https://git.openjdk.org/jdk/pull/16848/files/e05c2b93..e9a5247e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16848&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16848&range=00-01

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16848.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16848/head:pull/16848

PR: https://git.openjdk.org/jdk/pull/16848

From duke at openjdk.org Wed Nov 29 19:25:21 2023
From: duke at openjdk.org (Yuri Gaevsky)
Date: Wed, 29 Nov 2023 19:25:21 GMT
Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v7]
In-Reply-To:
References:
Message-ID:

> Hello All,
>
> Please review these changes to support _vectorizedHashCode intrinsic on
> RISC-V platform. The patch adds the "scalar" code for the intrinsic without
> usage of any RVV instruction but provides manual unrolling of the appropriate
> loop. The code with usage of RVV instruction could be added as follow-up of
> the patch or independently.
>
> Thanks,
> -Yuri Gaevsky
>
> P.S. My OCA has been accepted recently (ygaevsky).
>
> ### Correctness checks
>
> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux.
>
> ### Performance results (the numbers for non-ints are similar)
>
> #### StarFive JH7110 board:
>
>
> ArraysHashCode:               without intrinsic     with intrinsic
> -------------------------------------------------------------------------------
> Benchmark  (size)  Mode  Cnt       Score     Error       Score     Error  Units
> -------------------------------------------------------------------------------
> multiints       0  avgt   30       2.658 ±   0.001       2.661 ±   0.004  ns/op
> multiints       1  avgt   30       4.881 ±   0.011       4.892 ±   0.015  ns/op
> multiints       2  avgt   30      16.109 ±   0.041      10.451 ±   0.075  ns/op
> multiints       3  avgt   30      14.873 ±   0.068      11.753 ±   0.024  ns/op
> multiints       4  avgt   30      17.283 ±   0.078      13.176 ±   0.044  ns/op
> multiints       5  avgt   30      19.691 ±   0.136      14.723 ±   0.046  ns/op
> multiints       6  avgt   30      21.727 ±   0.166      15.463 ±   0.124  ns/op
> multiints       7  avgt   30      23.790 ±   0.126      18.298 ±   0.059  ns/op
> multiints       8  avgt   30      23.527 ±   0.116      18.267 ±   0.046  ns/op
> multiints       9  avgt   30      27.981 ±   0.303      20.453 ±   0.069  ns/op
> multiints      10  avgt   30      26.947 ±   0.215      20.541 ±   0.051  ns/op
> multiints      50  avgt   30      95.373 ±   0.588      69.238 ±   0.208  ns/op
> multiints     100  avgt   30     177.109 ±   0.525     137.852 ±   0.417  ns/op
> multiints     200  avgt   30     341.074 ±   1.363     296.832 ±   0.725  ns/op
> multiints     500  avgt   30     847.993 ±   1.713     752.415 ±   1.918  ns/op
> multiints    1000  avgt   30    1610.199 ±   5.424    1426.112 ±   3.407  ns/op
> multiints   10000  avgt   30   16234.260 ±  26.789   14447.936 ±  26.345  ns/op
> multiints  100000  avgt   30  170726.025 ± 184.003  152587.649 ± 381.964  ns/op
> -------------------------------------------------------------------------------
>
> #### T-Head RVB-ICE board:
>
>
> ArraysHashCode: ...

Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision:

  Use concrete registers for input parameters.
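The "manual unrolling" of the hash loop mentioned in the description can be illustrated with a minimal Java sketch. This is not the RISC-V code from the patch: the class name `UnrolledHashSketch`, the method `hashCodeUnrolled`, and the unroll factor of 4 are assumptions chosen for illustration. It relies only on the documented `Arrays.hashCode(int[])` recurrence `h = 31 * h + a[i]`, which can be expanded over several elements because `int` arithmetic wraps consistently.

```java
import java.util.Arrays;

// Hypothetical illustration only; not code from the patch under review.
class UnrolledHashSketch {
    // Powers of 31 used to fold four elements into the hash per iteration.
    private static final int P1 = 31;
    private static final int P2 = P1 * 31;
    private static final int P3 = P2 * 31;
    private static final int P4 = P3 * 31;

    static int hashCodeUnrolled(int[] a) {
        if (a == null) {
            return 0; // same convention as Arrays.hashCode(int[])
        }
        int h = 1;
        int i = 0;
        // Main loop: four applications of h = 31 * h + a[i], expanded algebraically.
        for (; i + 3 < a.length; i += 4) {
            h = P4 * h + P3 * a[i] + P2 * a[i + 1] + P1 * a[i + 2] + a[i + 3];
        }
        // Tail loop: handle the remaining 0..3 elements one at a time.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        // The unrolled form is expected to agree with the library implementation.
        System.out.println(hashCodeUnrolled(data) == Arrays.hashCode(data));
    }
}
```

The expanded form removes the serial dependency between consecutive multiplications, which is presumably where the speedup for larger sizes in the table comes from; the actual unroll factor and register usage in the patch may differ.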
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16629/files - new: https://git.openjdk.org/jdk/pull/16629/files/af940acd..23db372c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16629&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16629.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16629/head:pull/16629 PR: https://git.openjdk.org/jdk/pull/16629 From duke at openjdk.org Wed Nov 29 19:25:23 2023 From: duke at openjdk.org (Yuri Gaevsky) Date: Wed, 29 Nov 2023 19:25:23 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v2] In-Reply-To: <2ciWlr2OCjKDqOzUYFVD1nQ3OiGbNTeK3DVpVz6trk8=.e06b29b6-b706-4e9d-8eab-c37d887c29bc@github.com> References: <9i5yHmpRi3-XqL5lw0-0IexhCDr2FOi5nT4dgY7cWao=.ab8a1d6e-c9fc-4108-820b-374ce7815463@github.com> <2k3er3hBU-G3xMungfG5nlSXyrYIioBhSsF9EclRIKE=.87785de1-efb3-4ff0-a9a1-802a7eb768f4@github.com> <2ciWlr2OCjKDqOzUYFVD1nQ3OiGbNTeK3DVpVz6trk8=.e06b29b6-b706-4e9d-8eab-c37d887c29bc@github.com> Message-ID: On Wed, 29 Nov 2023 17:50:00 GMT, Hamlin Li wrote: >> Hi @Hamlin-Li, >> >> My apologies for some delay with an answer. >> >> Without specifying the "concrete" registers for _ary/cnt/result_, as e.g. 
as follows: >> (1) [ iRegP/iRegI ] >> >> instruct arrays_hashcode(iRegP ary, iRegI cnt, iRegI result, immI basic_type, >> iRegLNoSp tmp1, iRegINoSp tmp2, >> iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr) >> >> or >> (2) [ iRegPNoSp/iRegINoSp ] >> >> instruct arrays_hashcode(iRegPNoSp ary, iRegINoSp cnt, iRegINoSp result, immI basic_type, >> iRegLNoSp tmp1, iRegINoSp tmp2, >> iRegINoSp tmp3, iRegLNoSp tmp4, rFlagsReg cr) >> >> it's impossible to use '_USE_KILL_' hint in '_effect_' directive because AD compilation fails: >> >> Building target 'images' in configuration 'linux-riscv64-server-release' >> /jdk/src/hotspot/cpu/riscv/gc/z/z_riscv.ad(10306) Syntax Error: :In arrays_hashcode only bound registers can be killed: iRegP ary >> >> or >> >> Building target 'images' in configuration 'linux-riscv64-server-release' >> /jdk/src/hotspot/cpu/riscv/gc/z/z_riscv.ad(10306) Syntax Error: :In arrays_hashcode only bound registers can be killed: iRegPNoSp ary >> >> >> IMHO, the usage of '_USE_KILL_' for _ary_/_cnt_ looks reasonable since >> we actually do use/modify them both in the assembler body, and avoiding any >> hint or usage other hint ('_USE_'?) may be wrong here even if that does >> not cause a failure in pre-integration tests. It is interesting that many >> intrinsics' definitions in AD files similarly use "concrete" registers in >> X86/RISC-V archs, possibly due to the mentioned "_only bound registers can be >> killed_" ADLC requirement. 
>> >> I'd like to mention that applying '_USE_KILL_' to 'result' causes the error >> in runtime: >> >> --------------- T H R E A D --------------- >> >> Current thread (0x0000003fc0189130): JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=54524, stack(0x0000003f7bc00000,0x0000003f7be00000) (2048K)] >> >> >> Current CompileTask: >> C2:828 117 4 java.lang.String::hashCode (60 bytes) >> >> Stack: [0x0000003f7bc00000,0x0000003f7be00000], sp=0x0000003f7bdfbba0, free space=2030k >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.so+0x5e1ad2] Node_Backward_Iterator::next()+0x106 >> V [libjvm.so+0x5e4040] PhaseCFG::global_code_motion()+0x242 >> V [libjvm.so+0x5e5266] PhaseCFG::do_global_code_motion()+0x38 >> V [libjvm.so+0x455142] Compile::Cod... > > Thanks for looking into the details. > Based on the information we got till now, yes, we need the concrete registers, otherwise, we need to pass in more tmp register, seems former one is better, let's go with it. Thanks @Hamlin-Li, the patch has been updated to use concrete registers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1409766224 From cslucas at openjdk.org Wed Nov 29 19:30:24 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 29 Nov 2023 19:30:24 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v3] In-Reply-To: References: Message-ID: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. > > Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. > > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. 
I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. > > # Testing status > > ## tier1 > > | | Win | Mac | Linux | > |----------|---------|---------|---------| > | ARM64 | | | | > | ARM32 | | | | > | x86 | | | | > | x64 | | | | > | PPC64 | | | | > | S390x | | | | > | RiscV | | | | Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Some inst_mark fixes; Catch up with master. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16484/files - new: https://git.openjdk.org/jdk/pull/16484/files/b56c98de..89a6dff3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16484&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16484&range=01-02 Stats: 11 lines in 3 files changed: 7 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16484/head:pull/16484 PR: https://git.openjdk.org/jdk/pull/16484 From matsaave at openjdk.org Wed Nov 29 20:01:22 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 29 Nov 2023 20:01:22 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v3] In-Reply-To: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: > ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. > > This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. 
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Corrections and gtest fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16769/files - new: https://git.openjdk.org/jdk/pull/16769/files/273d82df..0b82817d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=01-02 Stats: 32 lines in 3 files changed: 6 ins; 3 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/16769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16769/head:pull/16769 PR: https://git.openjdk.org/jdk/pull/16769 From shade at openjdk.org Wed Nov 29 20:18:23 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 20:18:23 GMT Subject: RFR: 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds In-Reply-To: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> References: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> Message-ID: On Tue, 28 Nov 2023 12:40:41 GMT, Aleksey Shipilev wrote: > Flag cleanup. Current barrier verification code is opt-in, and it is selected for a few tests. For extra safety, we want to have it enabled by default in debug builds. This also simplifies test configurations. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` > - [x] Linux AArch64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` Thanks all! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16849#issuecomment-1832634200 From shade at openjdk.org Wed Nov 29 20:18:23 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 29 Nov 2023 20:18:23 GMT Subject: Integrated: 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds In-Reply-To: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> References: <4V4ijEJlqyoqZ7UjiX3613qsBPw5R4k9yv9lv1eqcaw=.aee775ba-a393-4f2c-9978-8aac011317f9@github.com> Message-ID: On Tue, 28 Nov 2023 12:40:41 GMT, Aleksey Shipilev wrote: > Flag cleanup. Current barrier verification code is opt-in, and it is selected for a few tests. For extra safety, we want to have it enabled by default in debug builds. This also simplifies test configurations. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux x86_64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` > - [x] Linux AArch64 server fastdebug, `tier{1,2,3,4}` with `-XX:+UseShenandoahGC` This pull request has now been integrated. Changeset: c8643176 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/c86431767e6802317dc2be6221a5d0990b976ddc Stats: 39 lines in 4 files changed: 0 ins; 37 del; 2 mod 8320888: Shenandoah: Enable ShenandoahVerifyOptoBarriers in debug builds Reviewed-by: wkemper, kdnilsen, rkennke ------------- PR: https://git.openjdk.org/jdk/pull/16849 From dlong at openjdk.org Wed Nov 29 21:03:06 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 Nov 2023 21:03:06 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: <8XZM7KNv5br4rhLuWAs4SZkNmpEcFC7EpVnIQKJO-yk=.cdf9237f-f918-42bc-9b54-8528ecbb4727@github.com> On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with muliple -Xlog crashes JTREG test cases. 
This is because `Collector.toMap` is not given a merge strategy. > > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. > > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake. > > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > ``` > > Running tier1 I'm not sure this is the right approach for -X flags. `@requires vm.opt.x.Xms` should probably be `@requires vm.opt.InitialHeapSize` instead. How many other -X flags are tests using that have a -XX flag equivalent? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1832692981 From dlong at openjdk.org Wed Nov 29 21:11:06 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 29 Nov 2023 21:11:06 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with muliple -Xlog crashes JTREG test cases. This is because `Collector.toMap` is not given a merge strategy. > > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. > > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. 
This dummy value should make it hard to try to `@require` on specific values by mistake. > > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > ``` > > Running tier1 It's better to get the value from the VM after it has processed the flags, rather than trying to look at pre-processed flag values. The value given on the command-line isn't always the same as the final value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1832702019 From dcubed at openjdk.org Wed Nov 29 21:13:07 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Nov 2023 21:13:07 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: References: Message-ID: <3on61co7vkFJCag8MLNu5ZbcLR-9Whwku0wulthCLGk=.5abdf272-4723-45d4-85d2-207286177c99@github.com> On Fri, 17 Nov 2023 20:29:11 GMT, Serguei Spitsyn wrote: >> This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. >> However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. >> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. >> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. 
>>
>> Additionally, the fix includes the following related changes:
>>  - Use the double-checked condition idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end`, which is on a performance-critical path and looks like this:
>>
>>     JvmtiThreadState* state = thread->jvmti_thread_state();
>>     if (state != nullptr && state->is_pending_interp_only_mode()) {
>>       MutexLocker mu(JvmtiThreadState_lock);
>>       state = thread->jvmti_thread_state();
>>       if (state != nullptr && state->is_pending_interp_only_mode()) {
>>         JvmtiEventController::enter_interp_only_mode();
>>       }
>>     }
>>
>>
>>  - Add an extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted.
>>  - Minor: Added a `ThreadsListHandle` to `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads, which is racy for the JVMTI `SetEventNotificationMode` implementation.
>>
>> Performance measurements:
>>  - Without this fix the test provided by the bug submitter gives execution numbers:
>>    - no ClassLoad events enabled: 3251 ms
>>    - ClassLoad events are enabled: 40534 ms
>>
>>  - With the fix:
>>    - no ClassLoad events enabled: 3270 ms
>>    - ClassLoad events are enabled: 3385 ms
>>
>> Testing:
>>  - Ran mach5 tiers 1-6, no regressions are noticed
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   review: add comment for new ThreadsListHandle use

I'm going to resurrect the failing guarantee() code and part of the stack trace that was removed and yack a bit about this code path.

Here's the location of the failing guarantee():

    void Handshake::execute(HandshakeClosure* hs_cl, ThreadsListHandle* tlh, JavaThread* target) {
    . . .
guarantee(target != nullptr, "must be"); if (tlh == nullptr) { guarantee(Thread::is_JavaThread_protected_by_TLH(target), "missing ThreadsListHandle in calling context."); and here's part of the stack trace that got us here: V [libjvm.so+0x117937d] JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState*)+0x45d (jvmtiEventController.cpp:402) V [libjvm.so+0x1179520] JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState*) [clone .part.0]+0x190 (jvmtiEventController.cpp:632) V [libjvm.so+0x117a1e1] JvmtiEventControllerPrivate::thread_started(JavaThread*)+0x351 (jvmtiEventController.cpp:1174) V [libjvm.so+0x117e608] JvmtiExport::get_jvmti_thread_state(JavaThread*)+0x98 (jvmtiExport.cpp:424) V [libjvm.so+0x118a86c] JvmtiExport::post_field_access(JavaThread*, Method*, unsigned char*, Klass*, Handle, _jfieldID*)+0x6c (jvmtiExport.cpp:2214) This must have been a stack trace from a build with some optimizations enabled because when I look at last night's code base, I see 8 frames from the JvmtiExport::get_jvmti_thread_state() call to Handshake::execute() with three params: src/hotspot/share/prims/jvmtiExport.cpp: JvmtiExport::get_jvmti_thread_state(JavaThread *thread) { assert(thread == JavaThread::current(), "must be current thread"); if (thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr) { JvmtiEventController::thread_started(thread); } The above code asserts that the `thread` parameter is the current thread so we know that the calling thread is operating on itself. 
src/hotspot/share/prims/jvmtiEventController.cpp JvmtiEventControllerPrivate::thread_started(JavaThread *thread) { assert(thread == Thread::current(), "must be current thread"); // if we have any thread filtered events globally enabled, create/update the thread state if (is_any_thread_filtered_event_enabled_globally()) { // intentionally racy JvmtiThreadState::state_for(thread); The above code asserts that the `thread` parameter is the current thread so we know that the calling thread is operating on itself. Note that we're calling the single parameter version of `JvmtiThreadState::state_for()` here and in that case the `thread_handle` parameter is `Handle thread_handle = Handle()`. src/hotspot/share/prims/jvmtiThreadState.inline.hpp inline JvmtiThreadState* JvmtiThreadState::state_for(JavaThread *thread, Handle thread_handle) { // In a case of unmounted virtual thread the thread can be null. JvmtiThreadState* state = thread_handle == nullptr ? thread->jvmti_thread_state() : java_lang_Thread::jvmti_thread_state(thread_handle()); if (state == nullptr) { MutexLocker mu(JvmtiThreadState_lock); // check again with the lock held state = state_for_while_locked(thread, thread_handle()); JvmtiEventController::recompute_thread_filtered(state); The above code grabs the JvmtiThreadState from the `thread` parameter and passes that to the `JvmtiEventController::recompute_thread_filtered()` call. We know that `thread` parameter is the current thread. src/hotspot/share/prims/jvmtiEventController.cpp void JvmtiEventControllerPrivate::recompute_thread_filtered(JvmtiThreadState *state) { if (is_any_thread_filtered_event_enabled_globally()) { JvmtiEventControllerPrivate::recompute_thread_enabled(state); The above code is just a filter wrapper for calling `JvmtiEventControllerPrivate::recompute_thread_enabled(`. 
src/hotspot/share/prims/jvmtiEventController.cpp jlong JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState *state) { // compute interp_only mode bool should_be_interp = (any_env_enabled & INTERP_EVENT_BITS) != 0 || has_frame_pops; bool is_now_interp = state->is_interp_only_mode(); if (should_be_interp != is_now_interp) { if (should_be_interp) { enter_interp_only_mode(state); The above code determines that the current thread needs to be in interpreted mode so it calls `enter_interp_only_mode(state)`. src/hotspot/share/prims/jvmtiEventController.cpp void JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState *state) { JavaThread *target = state->get_thread(); Thread *current = Thread::current(); if (target->is_handshake_safe_for(current)) { hs.do_thread(target); } else { assert(state->get_thread() != nullptr, "sanity check"); Handshake::execute(&hs, target); We know from our previous code analysis that the `JvmtiThreadState *state` we were passed was fetched from the current thread. See `JvmtiThreadState::state_for` above. So that `target` thread and the `current` should be the same thread. Why does this check return false: if (target->is_handshake_safe_for(current)) { which allows us to travel down this call: `Handshake::execute(&hs, target)` src/hotspot/share/runtime/handshake.cpp void Handshake::execute(HandshakeClosure* hs_cl, JavaThread* target) { // tlh == nullptr means we rely on a ThreadsListHandle somewhere // in the caller's context (and we sanity check for that). Handshake::execute(hs_cl, nullptr, target); } The two parameter version of `Handshake::execute()` is just a wrapper that passed a nullptr for the ThreadsListHandle to the three parameter version of `Handshake::execute()`. And that's how we get to the failing guarantee()... 
src/hotspot/share/runtime/handshake.cpp

    void Handshake::execute(HandshakeClosure* hs_cl, ThreadsListHandle* tlh, JavaThread* target) {
      JavaThread* self = JavaThread::current();
      HandshakeOperation op(hs_cl, target, Thread::current());

      jlong start_time_ns = os::javaTimeNanos();

      guarantee(target != nullptr, "must be");
      if (tlh == nullptr) {
        guarantee(Thread::is_JavaThread_protected_by_TLH(target),
                  "missing ThreadsListHandle in calling context.");
        target->handshake_state()->add_operation(&op);

One possible fix for the guarantee is this version:

    guarantee(self == target || Thread::is_JavaThread_protected_by_TLH(target),
              "missing ThreadsListHandle in calling context.");

However, that ignores why this code in JvmtiEventControllerPrivate::enter_interp_only_mode returned false:

    if (target->is_handshake_safe_for(current)) {

when we have these local variable values:

    JavaThread *target = state->get_thread();
    Thread *current = Thread::current();

src/hotspot/share/runtime/javaThread.hpp

    // A JavaThread can always safely operate on it self and other threads
    // can do it safely if they are the active handshaker.
    bool is_handshake_safe_for(Thread* th) const {
      return _handshake.active_handshaker() == th || this == th;
    }

It seems to me that this portion of the logic: `this == th` should have returned `true` and not `false`. What am I missing here?
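To make the reasoning above concrete, here is a toy Java model of the quoted predicate and branch. It is only a sketch: `ToyThread`, `activeHandshaker` and `choosePath` are hypothetical names, not HotSpot APIs, and the model says nothing about why the real VM took the handshake path. It only shows what the quoted logic implies: if `target` and `current` really were the same thread, the direct path would be chosen.

```java
// Toy model only: all names here are hypothetical stand-ins, not HotSpot code.
class HandshakeSafetySketch {
    static final class ToyThread {
        final String name;
        ToyThread activeHandshaker; // stands in for _handshake.active_handshaker()

        ToyThread(String name) {
            this.name = name;
        }

        // Mirrors the quoted predicate: active_handshaker() == th || this == th
        boolean isHandshakeSafeFor(ToyThread th) {
            return activeHandshaker == th || this == th;
        }
    }

    // Mirrors the quoted branch in enter_interp_only_mode:
    // "direct" models hs.do_thread(target), "handshake" models Handshake::execute(&hs, target).
    static String choosePath(ToyThread target, ToyThread current) {
        return target.isHandshakeSafeFor(current) ? "direct" : "handshake";
    }

    public static void main(String[] args) {
        ToyThread a = new ToyThread("A");
        ToyThread b = new ToyThread("B");

        System.out.println(choosePath(a, a)); // same thread: direct
        System.out.println(choosePath(a, b)); // different thread, not handshaker: handshake
        a.activeHandshaker = b;
        System.out.println(choosePath(a, b)); // b is the active handshaker: direct
    }
}
```

Under this model, reaching the handshake path implies that `target` was neither the current thread nor being handshaked by it, which is consistent with the question raised above about what `state->get_thread()` actually returned.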
------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1832699046 From ccheung at openjdk.org Wed Nov 29 22:00:07 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 29 Nov 2023 22:00:07 GMT Subject: RFR: 8320935: Move CDS config initialization code to cdsConfig.cpp In-Reply-To: References: Message-ID: <7tqgQeAidnvr6kp8hkHZ4QPCV_pFbVvWbafTiWzEEbg=.0e728f7b-0c77-4012-bc3d-6cec099b9e68@github.com> On Tue, 28 Nov 2023 23:24:53 GMT, Ioi Lam wrote: > This is a simple clean up that moves the code for initializing the CDS config states from arguments.cpp to cdsConfig.cpp > > I renamed a few functions, but otherwise the code is unchanged. > > - `get_default_shared_archive_path()` -> `default_archive_path()` > - `GetSharedArchivePath()` -> `static_archive_path()` > - `GetSharedDynamicArchivePath()` -> `dynamic_archive_path()` > > There's also less `#if INCLUDE_CDS` since the entire cdsConfig.cpp file is compiled only if CDS is enabled. Code migration from arguments cpp to cdsConfig.cpp looks good. Found a minor simplification regarding the include statements. src/hotspot/share/cds/cdsConfig.cpp line 34: > 32: #include "logging/log.hpp" > 33: #include "runtime/arguments.hpp" > 34: #include "runtime/java.hpp" I was able to build with your patch without including `java.hpp`. The #include java.hpp could also be removed from arguments.cpp. ------------- PR Review: https://git.openjdk.org/jdk/pull/16868#pullrequestreview-1756281866 PR Review Comment: https://git.openjdk.org/jdk/pull/16868#discussion_r1409910394 From cslucas at openjdk.org Wed Nov 29 22:40:35 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 29 Nov 2023 22:40:35 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v4] In-Reply-To: References: Message-ID: > # Description > > Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. 
> > Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. > > # Help Needed for Testing > > I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. > > # Testing status > > ## tier1 > > | | Win | Mac | Linux | > |----------|---------|---------|---------| > | ARM64 | | | | > | ARM32 | | | | > | x86 | | | | > | x64 | | | | > | PPC64 | | | | > | S390x | | | | > | RiscV | | | | Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - Fix merge - Catch up with master branch. Merge remote-tracking branch 'origin/master' into reuse-macroasm - Some inst_mark fixes; Catch up with master. - Catch up with changes on master - Reuse same C2_MacroAssembler object to emit instructions. ------------- Changes: https://git.openjdk.org/jdk/pull/16484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16484&range=03 Stats: 2433 lines in 60 files changed: 106 ins; 433 del; 1894 mod Patch: https://git.openjdk.org/jdk/pull/16484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16484/head:pull/16484 PR: https://git.openjdk.org/jdk/pull/16484 From dcubed at openjdk.org Wed Nov 29 23:09:16 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 29 Nov 2023 23:09:16 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 22 Nov 2023 22:40:20 GMT, Jiangli Zhou wrote: >> Please review JvmtiThreadState::state_for_while_locked change to handle the state->get_thread_oop() null case. 
Please see https://bugs.openjdk.org/browse/JDK-8319935 for details. > > Jiangli Zhou has updated the pull request incrementally with one additional commit since the last revision: > > Address Serguei Spitsyn's comments/suggestions: > - Remove the redundant thread->is_Java_thread() check from JvmtiSampledObjectAllocEventCollector::object_alloc_is_safe_to_sample(). > - Change the assert in JvmtiThreadState::state_for_while_locked to avoid #ifdef ASSERT. A belated thumbs up. Sorry I didn't get back to this review before the fix was integrated. I found just a nit comment that could be more clear. src/hotspot/share/prims/jvmtiExport.cpp line 3143: > 3141: > 3142: // If the current thread is attaching from native and its Java thread object > 3143: // is being allocated, things are not ready for allocation sampling. nit - typo: s/is being allocated/has not been allocated/ ------------- PR Review: https://git.openjdk.org/jdk/pull/16642#pullrequestreview-1756369261 PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1409970009 From sspitsyn at openjdk.org Wed Nov 29 23:13:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 29 Nov 2023 23:13:05 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v2] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 10:30:59 GMT, Serguei Spitsyn wrote: >> This is an update for a performance/scalability enhancement. >> >> The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. 
>> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: make new fields volatile, use Atomic for access/update PING... Need one more reviewer for this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16688#issuecomment-1832845586 From jjoo at openjdk.org Thu Nov 30 00:06:18 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 00:06:18 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: <4CkUYwnpxAAaKyV86ybEHbzNJw-7yzaU4X2CJsmD4MQ=.4d5cc389-6009-4e01-9a79-5044fe011bfe@github.com> On Thu, 23 Nov 2023 12:43:00 GMT, Albert Mingkun Yang wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Cleanup and address comments > > src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 2433: > >> 2431: } >> 2432: WorkerThreads* worker_threads = workers(); >> 2433: if (worker_threads != nullptr) { > > When will this be null? I guess it shouldn't be, since it is set when we call initialize() and also never unset. I'll remove this conditional. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410005989 From dholmes at openjdk.org Thu Nov 30 00:30:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 Nov 2023 00:30:04 GMT Subject: RFR: 8320750: Allow a testcase to run with muliple -Xlog In-Reply-To: References: Message-ID: <4Xm-yjzIjda7m9ts-5ONbz88TCskReNI5-81B_jbRI8=.07b3c008-7e8f-4df2-934b-392acdabd10d@github.com> On Mon, 27 Nov 2023 13:32:52 GMT, Leo Korinth wrote: > Running a testcase with muliple -Xlog crashes JTREG test cases. This is because `Collector.toMap` is not given a merge strategy. > > When the same argument is passed multiple times, I have added a merge strategy to use the latter value. This is similar to how it is implemented for `vm.opt.*` in JTREG. > > If the flag tested is `-Xlog`, replace the value part with a dummy value "NONEMPTY_TEST_SENTINEL". This is because in the case of multiple `-Xlog` all values are used, and JTREG does not give a satisfactory way to represent them. This dummy value should make it hard to try to `@require` on specific values by mistake. > > Tested with: > > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINEL" > @requires vm.opt.x.Xlog == "NONEMPTY_TEST_SENTINELXXX" > @requires vm.opt.x.Xms == "3g" > > and > > JAVA_OPTIONS=-Xms3g -Xms4g > JAVA_OPTIONS=-Xms4g -Xms3g > JAVA_OPTIONS=-Xlog:gc* -Xlog:gc* > ``` > > Running tier1 The VM processes -XX flags such that "last value wins" - though in part that is due to convention in that the nature of our flags tend to be absolute rather than relative (e.g. imagine a flag `-XX:IncreaseDefaultFoo=3G` that simply does `foo_size += 3 * G` - there we would not have a last-flag-wins situation). The -X and other flags are a mixed bunch and there are no generally applicable rules. 
I struggle to see the actual benefit of `@requires vm.opt.x.Foo` as a general mechanism because I don't think there are many flags that would reasonably be applied to test runs that individual tests would care that much about (things like Xcomp are already handled directly). We do not expect that any test can be run with any set of incoming flags - we only deal with specific sets of flags to control specific areas of functionality. E.g. if someone complains that a test times out because they passed `-Xint` the solution would be "well don't do that" - we don't have the resources to try and make every test bullet-proof. The -Xlog handling for example, I can't see any case where a test should care about any incoming logging flags - even if the test itself performs logging, it should not care about other logging settings - and we would fix the test if there was a specific problem. Even for the heap flags I'm struggling to see the usefulness. If a test has specific heap size requirements then the test should set them - and last setting wins, so should be no issue there (I suspect many tests are lazy though and assume defaults so for example a test may set -Xmx without -Xms and so an externally supplied -Xms may conflict with it - but in that case I'd question why anyone was forcing -Xms externally like that). Taking a specific example, to me `test/hotspot/jtreg/gc/arguments/TestG1HeapSizeFlags.java` that you changed is a clear candidate for vm.flagless and not using CreateTestJVM because this test is only checking the effects of it setting specific GC flags - we don't (or shouldn't) care about running such a simple test with a range of VM flags because the test itself is not interesting enough to warrant it i.e. there is nothing about that test to make us think it contains something unique such that only that test will provide test coverage in a specific area in relation to externally set flags.
I mean that is what the whole push to `@driver` and `vm.flagless` has been about - there is no value in running a whole bunch of tests with other flags, given what the tests are actually testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16824#issuecomment-1832911919 From pchilanomate at openjdk.org Thu Nov 30 00:43:05 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 00:43:05 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v2] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 10:30:59 GMT, Serguei Spitsyn wrote: >> This is an update for a performance/scalability enhancement. >> >> The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. >> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: make new fields volatile, use Atomic for access/update Hi Serguei, One comment below. Thanks src/hotspot/share/prims/jvmtiThreadState.cpp line 427: > 425: oop vt = JNIHandles::resolve_external_guard(vthread); > 426: > 427: if (!sync_protocol_enabled()) { Isn't the check racy? So we could see that there is no disabler, but before setting anything a disabler can run and succeed. Then we would never notice it and will just return after setting these fields. 
I think you need to check this after setting _VTMS_transition_count and the vthread transition bit. ------------- PR Review: https://git.openjdk.org/jdk/pull/16688#pullrequestreview-1756428869 PR Review Comment: https://git.openjdk.org/jdk/pull/16688#discussion_r1410012212 From pchilanomate at openjdk.org Thu Nov 30 00:44:18 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 00:44:18 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v2] In-Reply-To: References: Message-ID: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> > Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. > Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. > > The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. 
I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. > > I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. > > Thanks, > Patricio > > [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - add is_aligned assert in stackChunkOopDesc::bit_index_for - remove round up on java_calling_convention ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16837/files - new: https://git.openjdk.org/jdk/pull/16837/files/f117421e..42478a45 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16837&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16837&range=00-01 Stats: 67 lines in 16 files changed: 29 ins; 10 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/16837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16837/head:pull/16837 PR: https://git.openjdk.org/jdk/pull/16837 From amenkov at openjdk.org Thu Nov 30 00:47:05 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 30 Nov 2023 00:47:05 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v2] In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 10:30:59 GMT, Serguei Spitsyn wrote: >> This is an update for a performance/scalability enhancement. >> >> The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. 
>> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: make new fields volatile, use Atomic for access/update Marked as reviewed by amenkov (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16688#pullrequestreview-1756451315 From pchilanomate at openjdk.org Thu Nov 30 00:49:03 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 00:49:03 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 03:21:07 GMT, Dean Long wrote: >Also, I would prefer removing the round up in `java_calling_convention`, because the current patch fixes a subtle bug with an even subtler work-around, but let's see what other reviews think. > I removed the round up in java_calling_convention and do_type_calling_convention. The simpler approach wasn't going to work anyways. I think the callee can use the argument area to store data of a different type than the one passed as argument. So the last stack slot might not contain a narrow oop initially but could later on. 
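To make the rounding under discussion concrete, here is a minimal sketch, assuming 4-byte stack slots and 8-byte words, so that one word spans two slots. The helpers `align_up_slots` and `spare_slots` are hypothetical and are not HotSpot functions. The sketch only shows how rounding the argument area up to an even number of slots can leave one slot that no argument uses, which is the kind of slot the narrow oop ended up in.

```cpp
// Assumption for illustration: a stack slot is 4 bytes and a word is 8 bytes,
// so one word covers two stack slots.
constexpr int kSlotsPerWord = 2;

// Hypothetical helper: round a slot count up to a whole number of words.
constexpr int align_up_slots(int slots) {
  return (slots + kSlotsPerWord - 1) / kSlotsPerWord * kSlotsPerWord;
}

// Hypothetical helper: slots inside the rounded-up area that no argument uses.
constexpr int spare_slots(int used_slots) {
  return align_up_slots(used_slots) - used_slots;
}
```

With three one-slot arguments, `align_up_slots(3)` gives a four-slot area and `spare_slots(3)` reports one unused slot; with an even count nothing is left over.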
------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1832927446 From pchilanomate at openjdk.org Thu Nov 30 00:53:14 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 00:53:14 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v2] In-Reply-To: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> References: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> Message-ID: On Thu, 30 Nov 2023 00:44:18 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. >> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. >> >> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. 
I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. >> >> I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. >> >> Thanks, >> Patricio >> >> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add is_aligned assert in stackChunkOopDesc::bit_index_for > - remove round up on java_calling_convention I tested the last version in mach5 loom-tiers[1-5] and with the failing test. I'll keep running more rounds though since issues with this code are highly intermittent. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1832930670 From pchilanomate at openjdk.org Thu Nov 30 00:53:12 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 00:53:12 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 03:21:07 GMT, Dean Long wrote: >I don't really like the use of address for the pointer that might be either oop or narrowOop, because the bitmap can't really support an arbitrary char* address. I think it would be better to cleanup methods that take intptr_t* and instead use template like bit_index_for. > The thing is that we would need to check before calling `clear_bitmap_bits()` if `UseCompressedOops` is set to use oop or narrowOop, which makes the code uglier in my opinion. This is a private method anyways and is not meant to be used outside this two callers in thaw code. 
But I added asserts in `clear_bitmap_bits()` to verify `start` and `end` have the expected alignment. I also added an is_aligned() assert in `stackChunkOopDesc::bit_index_for()` just for extra caution. While adding this last assert I actually realized I had a bug in my original experiment branch because I forgot to align_down `end` when `UseCompressedOops` is not set, so thanks for pointing this out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1832929769 From sspitsyn at openjdk.org Thu Nov 30 01:45:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 01:45:05 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v3] In-Reply-To: <3on61co7vkFJCag8MLNu5ZbcLR-9Whwku0wulthCLGk=.5abdf272-4723-45d4-85d2-207286177c99@github.com> References: <3on61co7vkFJCag8MLNu5ZbcLR-9Whwku0wulthCLGk=.5abdf272-4723-45d4-85d2-207286177c99@github.com> Message-ID: On Wed, 29 Nov 2023 21:05:31 GMT, Daniel D. Daugherty wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: add comment for new ThreadsListHandle use > > I'm going to resurrect the failing guarantee() code and part of the stack trace that was removed > and yack a bit about this code path. > > Here's the location of the failing guarantee(): > > void Handshake::execute(HandshakeClosure* hs_cl, ThreadsListHandle* tlh, JavaThread* target) { > . . .
> guarantee(target != nullptr, "must be"); > if (tlh == nullptr) { > guarantee(Thread::is_JavaThread_protected_by_TLH(target), > "missing ThreadsListHandle in calling context."); > > > and here's part of the stack trace that got us here: > > V [libjvm.so+0x117937d] JvmtiEventControllerPrivate::enter_interp_only_mode(JvmtiThreadState*)+0x45d (jvmtiEventController.cpp:402) > V [libjvm.so+0x1179520] JvmtiEventControllerPrivate::recompute_thread_enabled(JvmtiThreadState*) [clone .part.0]+0x190 (jvmtiEventController.cpp:632) > V [libjvm.so+0x117a1e1] JvmtiEventControllerPrivate::thread_started(JavaThread*)+0x351 (jvmtiEventController.cpp:1174) > V [libjvm.so+0x117e608] JvmtiExport::get_jvmti_thread_state(JavaThread*)+0x98 (jvmtiExport.cpp:424) > V [libjvm.so+0x118a86c] JvmtiExport::post_field_access(JavaThread*, Method*, unsigned char*, Klass*, Handle, _jfieldID*)+0x6c (jvmtiExport.cpp:2214) > > > This must have been a stack trace from a build with some optimizations enabled because > when I look at last night's code base, I see 8 frames from the JvmtiExport::get_jvmti_thread_state() > call to Handshake::execute() with three params: > > > src/hotspot/share/prims/jvmtiExport.cpp: > JvmtiExport::get_jvmti_thread_state(JavaThread *thread) { > assert(thread == JavaThread::current(), "must be current thread"); > if (thread->is_vthread_mounted() && thread->jvmti_thread_state() == nullptr) { > JvmtiEventController::thread_started(thread); > } > > The above code asserts that the `thread` parameter is the current thread so > we know that the calling thread is operating on itself. 
> > > src/hotspot/share/prims/jvmtiEventController.cpp > JvmtiEventControllerPrivate::thread_started(JavaThread *thread) { > assert(thread == Thread::current(), "must be current thread"); > > // if we have any thread filtered events globally enabled, create/update the thread state > if (is_any_thread_filtered_event_enabled_globally()) { // intentionally racy > JvmtiThreadState::state_for(thread); > > The above code asserts that the `thread` parameter is the current thread so > we know that the calling thread is operating on itself. Note that we're calling > the single parameter vers... @dcubed-ojdk Thank you for the analysis. I agree with it. It is why I've removed this stack trace and posted my statement: > I've removed a part of this comment with stack traces as my traps were not fully correct, need to double check everything. > This issue is not well reproducible but I'm still trying to reproduce it again. Initially, I added a trap to catch the issue with the JVMTI SetEventNotificationMode related code path and mistakenly thought that I've caught it. But it was a different code path that you just described. So. I've removed this part of my comment as it was wrong. Then I tried to reproduce the code path I wanted to catch but had no luck to reproduce it. Now, I'm going to remove this line of code from my PR we are discussing. I'm suggesting to investigate the issue with guarantee separately from this PR. Is it okay with you? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1832970099 From jjoo at openjdk.org Thu Nov 30 01:48:47 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 01:48:47 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v49] In-Reply-To: References: Message-ID: <5zh0f9THaoeVoSuLRa8Poxm9ex9jnLpBcyK5JoEIEHQ=.c00b21e1-69ab-424b-a708-be2962b3f4ed@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Change APIs to be all-static, address other comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/abb90258..91100e81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=47-48 Stats: 80 lines in 6 files changed: 21 ins; 21 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Nov 30 01:48:47 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 01:48:47 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v47] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 15:23:55 GMT, Albert Mingkun Yang wrote: >> We could add a new closure just used by GC that 's a sub-class of `ThreadTotalCPUTimeClosure` and just adds this to the constructor: >> >> instance->inc_gc_total_cpu_time(net_cpu_time); >> >> >> That way we could get rid of `CPUTimeGroups::is_gc_counter()` as well since all those counters should use the "GC closure" or we can keep it and assert that no GC closure uses the wrong closure. >> >> What do you think about that Albert, would that address your concerns? 
> > (I just realized that I made a typo in my previous msg; should be *callee* instead.) That is what I have in mind. > > > void CPUTimeCounters::update_counter(name, total) { > auto counter = get_counter(name); > auto old_v = counter->get_value(); > auto diff = total - old_v; > counter->inc(diff); > if (counter->is_gc_counter()) { > counter->inc(diff); > } > } I'm not sure I understood correctly, could you let me know if this latest commit addresses your comment in the way you were intending? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410059322 From jjoo at openjdk.org Thu Nov 30 01:52:46 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 01:52:46 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v50] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: remove whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/91100e81..d11d6960 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=48-49 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Nov 30 01:59:58 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 01:59:58 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v51] In-Reply-To: References: Message-ID: <-NwH6KcjpOF6QwMu8IJvYFjEPrjgQ_CjSzS-kGUGBrI=.9623e3ce-7687-4ccc-86cc-415fdde20096@github.com> > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request 
with a new target base due to a merge or a rebase. The pull request now contains 56 commits: - Merge branch 'openjdk:master' into master - remove whitespace - Change APIs to be all-static, address other comments - Fix namespace issues (2) Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> - Fix namespace issues Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> - Cleanup and address comments - Update memory tracking type for CPUTimeCounters - Fix assertion logic - Address comments and refactor TTTC class for simplification - Fix whitespace - ... and 46 more: https://git.openjdk.org/jdk/compare/940f67c1...63adf78c ------------- Changes: https://git.openjdk.org/jdk/pull/15082/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=50 Stats: 437 lines in 19 files changed: 432 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Nov 30 02:04:41 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 02:04:41 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v52] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: fix whitespace again ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/63adf78c..e6726ab6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=50-51 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 
From sspitsyn at openjdk.org Thu Nov 30 02:08:39 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 02:08:39 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: > This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. > However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. > It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. > It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. > > Additionally, the fix includes the following related changes: > - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: > > JvmtiThreadState* state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > MutexLocker mu(JvmtiThreadState_lock); > state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > JvmtiEventController::enter_interp_only_mode(); > } > } > > > - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. > - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation. 
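The double-checking idiom quoted above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: `ThreadStateSketch`, `state_lock` and `mount_end` are hypothetical stand-ins, `std::mutex` and `std::atomic` replace HotSpot's `MutexLocker` and its own atomics, and clearing the pending flag inside the locked region is an assumption about what entering interp-only mode does.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the per-thread JVMTI state.
struct ThreadStateSketch {
  std::atomic<bool> pending{false};  // "pending interp-only mode" flag
  int entered = 0;                   // how often the slow path ran; guarded by state_lock
};

std::mutex state_lock;  // plays the role of JvmtiThreadState_lock in this sketch

// Cheap unlocked check first; take the lock and re-check only if it looked set.
void mount_end(ThreadStateSketch& s) {
  if (s.pending.load(std::memory_order_acquire)) {
    std::lock_guard<std::mutex> guard(state_lock);
    // Re-check under the lock: another thread may have handled it already.
    if (s.pending.load(std::memory_order_relaxed)) {
      s.pending.store(false, std::memory_order_relaxed);  // assumed side effect
      ++s.entered;
    }
  }
}
```

The point of the idiom is that the common case, where nothing is pending, never touches the lock, which matters on a performance-critical path.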
> > Performance mesurements: > - Without this fix the test provided by the bug submitter gives execution numbers: > - no ClassLoad events enabled: 3251 ms > - ClassLoad events are enabled: 40534 ms > > - With the fix: > - no ClassLoad events enabled: 3270 ms > - ClassLoad events are enabled: 3385 ms > > Testing: > - Ran mach5 tiers 1-6, no regressions are noticed Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: remove newly added ThreadsListHandle from enter_interp_only_mode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16686/files - new: https://git.openjdk.org/jdk/pull/16686/files/de36957a..e3d30c86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16686.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16686/head:pull/16686 PR: https://git.openjdk.org/jdk/pull/16686 From jjoo at openjdk.org Thu Nov 30 02:42:18 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 02:42:18 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Wed, 29 Nov 2023 15:24:52 GMT, Albert Mingkun Yang wrote: >> src/hotspot/share/runtime/cpuTimeCounters.cpp line 91: >> >>> 89: } while (old_value != fetched_value); >>> 90: get_counter(CPUTimeGroups::CPUTimeType::gc_total)->inc(fetched_value); >>> 91: } >> >> Why do we have to do this publish dance? Couldn't the closure that update the diff instead just update the counter? From what I can tell we never have multiple closures active at the same time so should be no race there? > > This two-step update does seem unnecessary, IMO. 
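For context, the two-step update under discussion can be sketched as follows. This is a minimal sketch using `std::atomic` rather than HotSpot's own atomics, and it drains the pending value with `exchange` instead of the compare-and-swap loop quoted above. `GcCpuTimeSketch`, `add_diff` and `publish` are hypothetical names, not the PR's API.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the shared state; not the PR's CPUTimeCounters.
struct GcCpuTimeSketch {
  std::atomic<int64_t> pending_diff{0};  // may be updated by many threads
  int64_t published_total = 0;           // assumed to be written by one publisher only

  // Step 1: any thread adds its CPU-time delta atomically.
  void add_diff(int64_t diff) {
    pending_diff.fetch_add(diff, std::memory_order_relaxed);
  }

  // Step 2: the publisher drains whatever has accumulated and folds it
  // into the total. A delta added after the exchange stays pending.
  void publish() {
    int64_t drained = pending_diff.exchange(0, std::memory_order_relaxed);
    published_total += drained;
  }
};
```

In this shape only `pending_diff` needs atomic access; whether the extra step is needed at all depends on whether several closures can really be active at once, which is the open question in this thread.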
I agree that in the case that multiple closures are not active at the same time, we wouldn't have to implement it in this way. However, isn't it possible to have multiple closures active simultaneously, e.g. vm thread, concurrent mark thread, concurrent refine thread? I think to account for the races there, we can only update the `_gc_total_cpu_time_diff` member variable atomically during these closure destructions, and then publish the actual updated `gc_total` counter at manually specified times via `publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` as part of the thread closure, I believe it would be difficult to prevent races when accessing the underlying counter from the various gc-related threads. Or maybe there is another strategy that I'm not seeing? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410089070 From dholmes at openjdk.org Thu Nov 30 02:45:13 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 Nov 2023 02:45:13 GMT Subject: RFR: 8314029: Add file name parameter to Compiler.perfmap [v5] In-Reply-To: <81dXSHvLQMGj3s1BcBs8fmJUEoJpaU-5wBRSIjnztMM=.d53f8a2f-8353-49ec-8a9b-695b32f03d20@github.com> References: <81dXSHvLQMGj3s1BcBs8fmJUEoJpaU-5wBRSIjnztMM=.d53f8a2f-8353-49ec-8a9b-695b32f03d20@github.com> Message-ID: On Tue, 28 Nov 2023 23:25:27 GMT, Yi-Fan Tsai wrote: >> `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: `/tmp/perf-%d.map`. This change adds an optional argument for specifying a file name. >> >> `jcmd PID help Compiler.perfmap` shows the following usage. >> >> >> Compiler.perfmap >> Write map file for Linux perf tool. >> >> Impact: Low >> >> Syntax : Compiler.perfmap [] >> >> Arguments: >> filename : [optional] Name of the map file (STRING, no default value) >> >> >> The man page of jcmd will be updated in a separate PR. 
> > Yi-Fan Tsai has updated the pull request incrementally with one additional commit since the last revision: > > Apply man changes @yftsai, @plummercj my apologies on the CSR situation. Since Sep 22 I have been reminded that changing the command-line of our diagnostic tools is treated as a "diagnostic" option per the CSR FAQ and does not require a CSR request. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15871#issuecomment-1833017497 From jjoo at openjdk.org Thu Nov 30 02:45:45 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 02:45:45 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v53] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Add missing include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/e6726ab6..7e4cdcd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=52 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=51-52 Stats: 2 lines in 2 files changed: 1 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From duke at openjdk.org Thu Nov 30 03:03:31 2023 From: duke at openjdk.org (SUN Guoyun) Date: Thu, 30 Nov 2023 03:03:31 GMT Subject: RFR: 8301997: Move method resolution information out of the cpCache [v13] In-Reply-To: References: Message-ID: On Wed, 15 Nov 2023 16:44:15 GMT, Matias Saavedra Silva wrote: >> The current structure used to store the resolution information for methods, ConstantPoolCacheEntry, is difficult to interpret due to its ambiguous fields f1 and f2.
This structure previously held information for fields, methods, and invokedynamic calls which were all encoded into f1 and f2. Currently this structure only handles method entries, but it remains obtuse and inconsistent with recent changes. >> >> This enhancement introduces a new data structure that stores the necessary resolution data in an intuitive and extensible manner. These resolved entries are stored in an array inside the constant pool cache in a very similar manner to invokedynamic entries in JDK-8301995. >> >> Instances of ConstantPoolCache entry related to field resolution have been replaced with the new ResolvedMethodEntry, and ConstantPoolCacheEntry has been removed entirely. The class ResolvedMethodEntry holds resolution information for all types of invoke calls besides invokedynamic, and thus has fields that may be unused depending on the invoke code. >> >> To streamline the review, please consider these major areas that have been changed: >> 1. ResolvedMethodEntry class >> 2. Rewriter for initialization of the structure >> 3. cpCache for resolution >> 4. InterpreterRuntime, linkResolver, and templateTable >> 5. JVMCI >> 6. SA >> >> Verified with tier 1-9 tests. >> >> This change supports the following platforms: x86, aarch64, RISCV, PPC, S390 > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > S390 port src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2192: > 2190: } > 2191: // Load-acquire the bytecode to match store-release in InterpreterRuntime > 2192: __ membar(MacroAssembler::AnyAny); Perhaps just using `StoreLoad` here is enough, right?
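For context, the store-release/load-acquire pairing that the quoted source comment refers to can be sketched in portable C++ with `std::atomic`. This is only a generic illustration, not HotSpot code: `payload`, `published`, `publish`, `consume` and `run_once` are made-up names, and the sketch makes no claim about which RISC-V `membar` mask is sufficient at line 2192.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins: `payload` plays the role of the resolved entry's
// fields, `published` the role of the bytecode that marks it as resolved.
static int payload = 0;
static std::atomic<int> published{0};

// Writer: fill in the data first, then publish it with a store-release.
void publish(int value) {
  payload = value;
  published.store(1, std::memory_order_release);
}

// Reader: load-acquire the flag; once it is observed as set, the earlier
// plain write to `payload` is guaranteed to be visible.
int consume() {
  while (published.load(std::memory_order_acquire) == 0) {
    std::this_thread::yield();
  }
  return payload;
}

// Runs one writer and one reader and returns what the reader observed.
int run_once(int value) {
  payload = 0;
  published.store(0, std::memory_order_relaxed);
  int seen = -1;
  std::thread reader([&] { seen = consume(); });
  std::thread writer([&] { publish(value); });
  writer.join();
  reader.join();
  assert(seen == value);
  return seen;
}
```

The acquire load only orders the accesses that follow it against that load, which is the property the "Load-acquire the bytecode" comment relies on; how that maps onto a particular fence on a given CPU is a separate, platform-specific question.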
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15455#discussion_r1410100483 From dlong at openjdk.org Thu Nov 30 03:07:03 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 Nov 2023 03:07:03 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v2] In-Reply-To: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> References: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> Message-ID: On Thu, 30 Nov 2023 00:44:18 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. >> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. >> >> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. 
I wasn't convinced it was worth it given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. >> >> I ran the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. >> >> Thanks, >> Patricio >> >> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add is_aligned assert in stackChunkOopDesc::bit_index_for > - remove round up on java_calling_convention src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2175: > 2173: // we need to clear the bits that correspond to arguments as they reside in the caller frame > 2174: // or they will keep objects that are otherwise unreachable alive > 2175: address effective_end = UseCompressedOops ? end : align_down(end, wordSize); Is the align_down for correctness, or just for the benefit of the new assert at line 2179? Since it's not immediately obvious, I think it deserves a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16837#discussion_r1410102529 From dlong at openjdk.org Thu Nov 30 03:31:05 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 Nov 2023 03:31:05 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v2] In-Reply-To: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> References: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> Message-ID: <420maB6CB-flM2EsOOVlpWvOghCavZ02chpeIk9vCKs=.039647b7-4d60-4a52-a8b2-cf93972911f8@github.com> On Thu, 30 Nov 2023 00:44:18 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix.
The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. >> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. >> >> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. I wasn't convinced it was worth it given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. >> >> I ran the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5.
>> >> Thanks, >> Patricio >> >> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add is_aligned assert in stackChunkOopDesc::bit_index_for > - remove round up on java_calling_convention OK, the use of `address` still seems misleading, but I can't think of anything much better, except perhaps `void*`. Do we really need a version of num_stack_arg_slots() that rounds up? I wish we didn't have duplicate code between java_calling_convention() and Fingerprinter, and unnecessarily different calling conventions between platforms, but those issues could be cleaned up in a separate RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1833049930 From dholmes at openjdk.org Thu Nov 30 04:35:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 Nov 2023 04:35:14 GMT Subject: RFR: 8320860: add-opens/add-exports require '=' in JAVA_TOOL_OPTIONS Message-ID: Please review this simple clarification to the JVM TI spec regarding use of `JAVA_TOOL_OPTIONS` in regards to module options and their format. I do not believe this clarification needs a CSR request. Thanks. 
------------- Commit messages: - 8320860: add-opens/add-exports require '=' in JAVA_TOOL_OPTIONS Changes: https://git.openjdk.org/jdk/pull/16896/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16896&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320860 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16896/head:pull/16896 PR: https://git.openjdk.org/jdk/pull/16896 From dholmes at openjdk.org Thu Nov 30 05:46:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 Nov 2023 05:46:04 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Tue, 28 Nov 2023 15:58:04 GMT, Andrew Haley wrote: >> Some buggy libraries corrupt the floating-point control register. Provide something similar to the x86 RestoreMXCSROnJNICalls. >> >> I realize that using the x86ish name "RestoreMXCSROnJNICalls" might be a little controversial, but it is a _global_ flag, not a CPU-specific one. And it's clearly intended for this purpose. It might have been better if that flag had been given a better name twentyish years ago, but we can't change it now. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix thinko I was also going to suggest adding a new flag and creating an alias. The new flag will need a CSR request of course. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1833143152 From dholmes at openjdk.org Thu Nov 30 06:09:24 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 Nov 2023 06:09:24 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 17:27:30 GMT, Jaroslav Bachorik wrote: >> Jaroslav Bachorik has updated the pull request incrementally with one additional commit since the last revision: >> >> Restrict cleanup to obsolete methods only > > Thanks everyone involved in reviewing this PR! You were awesome and helped me drive the PR to a better place than it started! @jbachorik this should not have been integrated yet! You only have one review, not the required two for hotspot changes. Further, your one Reviewer didn't even review the final version of the change! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1833162015 From dholmes at openjdk.org Thu Nov 30 06:27:06 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 30 Nov 2023 06:27:06 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v3] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Wed, 29 Nov 2023 20:01:22 GMT, Matias Saavedra Silva wrote: >> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. >> >> This patch restores the flag along with the resolved references index. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Corrections and gtest fix Thanks for the updates.
One indentation nit to fix before integration please. src/hotspot/share/oops/resolvedMethodEntry.hpp line 99: > 97: _has_interface_klass = false; > 98: _has_table_index = false; > 99: #endif Indent appears to be 1 instead of 2 ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1756718722 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1410205566 From xgong at openjdk.org Thu Nov 30 06:39:43 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 30 Nov 2023 06:39:43 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: > Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes a large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which is available in mainstream Linux distros (e.g. [3] [4]). > > SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) by default call the 1.0 ULP accuracy stubs in libsleef that use FMA instructions for most of the operations, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. > > To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library.
If the library does not exist or dynamically loading it at runtime fails, the math vector ops will fall back to the default scalar version without error. But a warning is printed if a nonexistent library is specified explicitly. > > Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedback from more hotspot experts. > > [1] https://github.com/openjdk/jdk/pull/3638 > [2] https://sleef.org/ > [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ > [4] https://packages.debian.org/bookworm/libsleef3 > [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Rename vmath to sleef in configure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16234/files - new: https://git.openjdk.org/jdk/pull/16234/files/2c3c4a64..c1ce1968 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16234&range=03-04 Stats: 268 lines in 6 files changed: 137 ins; 126 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16234.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16234/head:pull/16234 PR: https://git.openjdk.org/jdk/pull/16234 From ayang at openjdk.org Thu Nov 30 06:48:20 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 30 Nov 2023 06:48:20 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Thu, 30 Nov 2023 02:39:39 GMT, Jonathan Joo wrote: >> This two-step update does seem unnecessary, IMO. > > I agree that in the case that multiple closures are not active at the same time, we wouldn't have to implement it in this way.
However, isn't it possible to have multiple closures active simultaneously, e.g. vm thread, concurrent mark thread, concurrent refine thread? I think to account for the races there, we can only update the `_gc_total_cpu_time_diff` member variable atomically during these closure destructions, and then publish the actual updated `gc_total` counter at manually specified times via `publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` as part of the thread closure, I believe it would be difficult to prevent races when accessing the underlying counter from the various gc-related threads. > > Or maybe there is another strategy that I'm not seeing? Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called by the vm-thread inside a safepoint, so there shouldn't be any other threads running simultaneously, I believe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410221646 From sspitsyn at openjdk.org Thu Nov 30 07:34:23 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 07:34:23 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v3] In-Reply-To: References: Message-ID: > This is an update for a performance/scalability enhancement. > > The `JvmtiVTMSTransitionDisabler` sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. > > Please also see a related performance issue which contributes around 70% of total performance overhead: > [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 > > Testing: > - Ran mach5 tiers 1-6 with no regressions noticed.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: review: addressed a race condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16688/files - new: https://git.openjdk.org/jdk/pull/16688/files/a81218fd..01f6aea7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16688&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16688&range=01-02 Stats: 15 lines in 1 file changed: 6 ins; 9 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16688.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16688/head:pull/16688 PR: https://git.openjdk.org/jdk/pull/16688 From sspitsyn at openjdk.org Thu Nov 30 07:40:12 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 07:40:12 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v9] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 06:38:51 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert from: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors.
That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16783#pullrequestreview-1756808953 From sspitsyn at openjdk.org Thu Nov 30 07:45:06 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 07:45:06 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v2] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 00:14:34 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: make new fields volatile, use Atomic for access/update > > src/hotspot/share/prims/jvmtiThreadState.cpp line 427: > >> 425: oop vt = JNIHandles::resolve_external_guard(vthread); >> 426: >> 427: if (!sync_protocol_enabled()) { > > Isn't the check racy? 
So we could see that there is no disabler, but before setting anything a disabler can run and succeed. Then we would never notice it and will just return after setting these fields. > I think you need to check this after setting _VTMS_transition_count and the vthread transition bit. Thank you for looking at this, Patricio! Nice catch - fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16688#discussion_r1410267479 From xgong at openjdk.org Thu Nov 30 07:54:11 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 30 Nov 2023 07:54:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <3dscjw75an6_n3ko_2LjHdpdj08xsyjhfpStuZrS5-M=.f43592fe-036e-47b5-babc-67f126885839@github.com> Message-ID: On Thu, 23 Nov 2023 14:05:51 GMT, Magnus Ihse Bursie wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments in build system > > make/autoconf/lib-vmath.m4 line 70: > >> 68: if test "x$SYSROOT" = "x" && >> 69: test "x${LIBSLEEF_FOUND}" = "xno"; then >> 70: PKG_CHECK_MODULES([LIBSLEEF], [sleef], [LIBSLEEF_FOUND=yes], [LIBSLEEF_FOUND=no]) > > Suggestion: > > PKG_CHECK_MODULES([SLEEF], [sleef], [LIBSLEEF_FOUND=yes], [LIBSLEEF_FOUND=no]) > > > Otherwise `PKG_CHECK_MODULES` will set the variables LIBSLEEF_CFLAGS and LIBSLEEF_LIBS. Keep using `LIBSLEEF`, as the cflags and libs are named with `LIBSLEEF_` prefix. Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1410275161 From dfenacci at openjdk.org Thu Nov 30 08:05:20 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 30 Nov 2023 08:05:20 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v13] In-Reply-To: References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: On Mon, 27 Nov 2023 19:09:40 GMT, Roger Riggs wrote: >> Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. >> The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. >> >> The changes include: >> >> - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate >> if the input array is modified before the constructor returns. >> The resulting string may contain any combination of characters sampled from the input array. >> >> - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. >> For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. >> If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. >> If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
>> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. >> Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. >> >> - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. >> >> - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. > > Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: > > Use byte off branches in char_array_compress > Verified by manual tests with "-XX:AVX3Threshold=0" > And test in the PR test/hotspot/jtreg/compiler/intrinsics/string/TestStringConstructionIntrinsics.java src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8547: > 8545: // bail out when there is nothing to be done > 8546: testl(tmp5, 0xFFFFFFFF); > 8547: jcc(Assembler::zero, post_alignment); @RogerRiggs I think you changed the wrong line ? 
Suggestion: jccb(Assembler::zero, post_alignment); src/hotspot/cpu/x86/macroAssembler_x86.cpp line 8559: > 8557: evpcmpw(mask1, mask2, tmp1Reg, tmp2Reg, Assembler::le, /*signed*/ false, Assembler::AVX_512bit); > 8558: ktestd(mask1, mask2); > 8559: jccb(Assembler::carryClear, copy_tail); Suggestion: jcc(Assembler::carryClear, copy_tail); (in this case the jump can potentially be long) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1410283240 PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1410284158 From xgong at openjdk.org Thu Nov 30 08:07:15 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 30 Nov 2023 08:07:15 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: <97YS_I-DY-Q5agE6mE-iBkoVxtvL7R4Q3NebjTsXMvI=.dac0dc99-84d7-4cbd-ada6-5190564688a9@github.com> Message-ID: On Wed, 22 Nov 2023 09:05:31 GMT, Andrew Haley wrote: >>> Have you considered the possibility of copying the sleef source to the OpenJDK repository and thereby it becomes part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. >>> >> >> Hi @PaulSandoz ! Thanks for the suggestion! Copying the sleef source sounds good. However, I actually have no idea about how to handle the third-party licence in OpenJDK project. Do you have any idea about this area? Some suggestions/guidance from the JDK team would be very helpful. Thanks! > >> > Have you considered the possibility of copying the sleef source to the OpenJDK repository and thereby it becomes part of the build process? I don't know how straightforward that is technically and IANAL but I think it's worth exploring. >> >> Hi @PaulSandoz ! Thanks for the suggestion! Copying the sleef source sounds good. However, I actually have no idea about how to handle the third-party licence in OpenJDK project. Do you have any idea about this area?
Some suggestions/guidance from the JDK team would be very helpful. Thanks! > > From a legal perspective, we can do this. SLEEF is distributed under Boost Software License Version 1.0., which is a GPL-compatible free software licence. The only issue is whether we want to do so. It would certainly be convenient. Hi @theRealAph , @magicus , The latest commit renamed `vmath` in configure to `sleef`, and the Arm SVE cflags have now been moved to `flags-cflags.m4`. I was thinking about separating the NEON and SVE functions in `libvmath.so` into two source files, and adding the SVE cflags just for the SVE functions as needed. But it seems I would have to write the make commands manually, which I think is not good? But I can have a try if it's necessary. Since the SVE flags are just used for building `libvmath.so` now, moving them to `flags-cflags.m4` does not seem quite right? @magicus , may I have your comments on this part? Thanks in advance! > Don't touch the make system. Instead, try to open the library at runtime with os::dll_open(), after (or inside) VM_Version::initialize(). > > If you're not running on an SVE system, none of the SVE routines will be called, so it doesn't matter, right? Yes, the SVE routines are called only on the SVE system. So I think it doesn't matter. I tested with the image built on a non-SVE machine and ran the Vector API tests on an SVE machine, and it works well. It also works well in the opposite situation. Best Regards, Xiaohong ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833278899 From aboldtch at openjdk.org Thu Nov 30 08:10:08 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 30 Nov 2023 08:10:08 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v10] In-Reply-To: References: Message-ID: > LM_LIGHTWEIGHT only uses the lock bits for its locking. This leaves the hashCode bits free when a monitor is not inflated.
So instead of inflating when installing the hashCode on a fast locked object it can simply use the hashCode bits in the markWord. > > The mark word transitions Unlocked (0b01) <=> Locked (0b00) are done by retrying the CAS if it fails due to non-lock bit changes. > The mark word transitions Monitor (0b10) <=> Locked/Unlocked (0b0X) are the same as before, inflation already handles hash codes. This change does not interact with the mark word if it is in a Monitor (0b10) state, so the strong CAS which is used for deflation is still valid, and will not fail for any other reason than the cooperative race to help transition the mark word during deflation. > > This is dependent on JDK-8319778 simply because JDK-8319797 is dependent on both this and JDK-8319778. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8319773 - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8319778 - Fix copy paste typo. - Update src/hotspot/share/opto/library_call.cpp Co-authored-by: Tobias Hartmann - Add retry CAS comment - Use is_neutral over is_unlocked - Merge remote-tracking branch 'upstream_jdk/pr/16602' into JDK-8319773 - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8319778 - Use more familiar CAS variable names and pattern - ...
and 6 more: https://git.openjdk.org/jdk/compare/10faf438...4508ef5a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16603/files - new: https://git.openjdk.org/jdk/pull/16603/files/b4061417..4508ef5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16603&range=08-09 Stats: 20207 lines in 843 files changed: 14073 ins; 3553 del; 2581 mod Patch: https://git.openjdk.org/jdk/pull/16603.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16603/head:pull/16603 PR: https://git.openjdk.org/jdk/pull/16603 From aboldtch at openjdk.org Thu Nov 30 08:10:10 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 30 Nov 2023 08:10:10 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v10] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Wed, 29 Nov 2023 18:25:56 GMT, Daniel D. Daugherty wrote: >> LM_LEGACY and LM_MONITOR will. LM_LIGHTWEIGHT technically may. If deflation finishes between reading the mark word in FastHashCode and reading the mark word in `inflate`. It seems like a rare enough case that it does not need to be handled separately. The following change would avoid inflation all together. >> >> // An async deflation can race after the inflate() call and before we >> // can update the ObjectMonitor's header with the hash value below. >> + if (LockingMode == LM_LIGHTWEIGHT) { >> + assert(mark.has_monitor(), "must be"); >> + monitor = mark.monitor(); >> + } else { >> - monitor = inflate(current, obj, inflate_cause_hash_code); >> + monitor = inflate(current, obj, inflate_cause_hash_code); >> + } >> // Load ObjectMonitor's header/dmw field and see if it has a hash. >> >> >> Maybe I should change it to this. Given that there has been confusion here. 
>> My ideal solution would be to separate the implementations for the different locking modes all together, all of these functions are littered with if (LockingMode == X). > > So what did you decide to do here? For now I believe the extra code noise from trying to handle this race with deflation is not worth it. It creates some questionable code paths and head scratchers. If we were to add a separate FastHashCode just for LM_LIGHTWEIGHT, it would be worth it, as the while loop body would look quite a bit different and be easier to reason about. But I was looking for input on whether we should handle this case regardless of code complexity. Or maybe take this all the way and create a separate FastHashCode with its own more understandable logic, which does not have to try to fit in with the legacy locking/inflation protocol. Regardless, if we were to just go with it as it is now, there should probably be a comment here along the lines of:
```c++
// With LM_LIGHTWEIGHT FastHashCode may race with deflation here and cause a monitor to be re-inflated.
```
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1410289581 From ihse at openjdk.org Thu Nov 30 09:20:14 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 30 Nov 2023 09:20:14 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies.
To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename vmath to sleef in configure make/autoconf/spec.gmk.in line 892: > 890: PNG_CFLAGS:=@PNG_CFLAGS@ > 891: > 892: ENABLE_LIBSLEEF:=@ENABLE_LIBSLEEF@ You need to merge in the latest changes from mainline. I have just updated spec.gmk.in to always have a space around `:=`. Please do so with your additions as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1410378540 From shade at openjdk.org Thu Nov 30 09:22:20 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 Nov 2023 09:22:20 GMT Subject: RFR: 8321063: AArch64: Zero build fails after JDK-8320368 Message-ID: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Simple: we need a definition for Zero, because we cannot reach the AArch64-specific one in `src/cpu/aarch64`. Additional testing: - [x] MacOS AArch64 Zero build now passes - [ ] Extensive build matrix of Zero and Server builds ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/16897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16897&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321063 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16897/head:pull/16897 PR: https://git.openjdk.org/jdk/pull/16897 From stuefe at openjdk.org Thu Nov 30 09:30:08 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 30 Nov 2023 09:30:08 GMT Subject: RFR: 8321063: AArch64: Zero build fails after JDK-8320368 In-Reply-To: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> References: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Message-ID: On Thu, 30 Nov 2023 09:15:40 GMT, Aleksey Shipilev wrote: > Simple: we need a definition for Zero, because we cannot reach the AArch64-specific one in `src/cpu/aarch64`. > > Additional testing: > - [x] MacOS AArch64 Zero build now passes > - [ ] Extensive build matrix of Zero and Server builds Good + trivial. Oh, you were faster than me. I'll close https://github.com/openjdk/jdk/pull/16898. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16897#pullrequestreview-1757001368 PR Comment: https://git.openjdk.org/jdk/pull/16897#issuecomment-1833390344 From haosun at openjdk.org Thu Nov 30 09:31:26 2023 From: haosun at openjdk.org (Hao Sun) Date: Thu, 30 Nov 2023 09:31:26 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v4] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 22:40:35 GMT, Cesar Soares Lucas wrote: >> # Description >> >> Please review this PR with a patch to re-use the same C2_MacroAssembler object to emit all instructions in the same compilation unit. >> >> Overall, the change is pretty simple. However, due to the renaming of the variable to access C2_MacroAssembler, from `_masm.` to `masm->`, and also some method prototype changes, the patch became quite large. >> >> # Help Needed for Testing >> >> I don't have access to all platforms necessary to test this. I hope some other folks can help with testing on `S390`, `RISC-V` and `PPC`. >> >> # Testing status >> >> ## tier1 >> >> | | Win | Mac | Linux | >> |----------|---------|---------|---------| >> | ARM64 | | | | >> | ARM32 | | | | >> | x86 | | | | >> | x64 | | | | >> | PPC64 | | | | >> | S390x | | | | >> | RiscV | | | | > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix merge > - Catch up with master branch. > > Merge remote-tracking branch 'origin/master' into reuse-macroasm > - Some inst_mark fixes; Catch up with master. > - Catch up with changes on master > - Reuse same C2_MacroAssembler object to emit instructions. 
We should update the copyright year for the following files: src/hotspot/cpu/aarch64/gc/x/x_aarch64.ad src/hotspot/cpu/arm/arm_32.ad src/hotspot/cpu/ppc/gc/x/x_ppc.ad src/hotspot/cpu/riscv/gc/x/x_riscv.ad src/hotspot/cpu/x86/c2_intelJccErratum_x86.hpp src/hotspot/cpu/x86/gc/x/x_x86_64.ad src/hotspot/cpu/x86/x86_32.ad src/hotspot/share/opto/c2_CodeStubs.hpp src/hotspot/share/opto/constantTable.hpp src/hotspot/cpu/aarch64/aarch64.ad line 2829: > 2827: enc_class aarch64_enc_ldrsbw(iRegI dst, memory1 mem) %{ > 2828: Register dst_reg = as_Register($dst$$reg); > 2829: loadStore(masm, &MacroAssembler::ldrsbw, dst_reg, $mem->opcode(), The block of code should be auto-generated by `ad_encode.m4` file. We'd better not edit these lines directly. Instead, we should update the m4 file accordingly. src/hotspot/cpu/aarch64/aarch64_vector.ad line 122: > 120: } > 121: int imm4 = disp / mesize / Matcher::scalable_vector_reg_size(vector_elem_bt); > 122: (masm->*insn)(reg, Assembler::elemType_to_regVariant(vector_elem_bt), pg, Address(base, imm4)); this update is missing in the corresponding `aarch64_vector_ad.m4` file. (that is, there is a mismatch between the ad file and the generated one.) ------------- PR Review: https://git.openjdk.org/jdk/pull/16484#pullrequestreview-1756986982 PR Review Comment: https://git.openjdk.org/jdk/pull/16484#discussion_r1410387066 PR Review Comment: https://git.openjdk.org/jdk/pull/16484#discussion_r1410381293 From sjohanss at openjdk.org Thu Nov 30 09:33:34 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 30 Nov 2023 09:33:34 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Thu, 30 Nov 2023 06:45:02 GMT, Albert Mingkun Yang wrote: >> I agree that in the case that multiple closures are not active at the same time, we wouldn't have to implement it in this way. 
However, isn't it possible to have multiple closures active simultaneously, e.g. vm thread, concurrent mark thread, concurrent refine thread? I think to account for the races there, we can only update the `_gc_total_cpu_time_diff` member variable atomically during these closure destructions, and then publish the actual updated `gc_total` counter at manually specified times via `publish_gc_total_cpu_time()`. If we were to call `publish_gc_total_cpu_time` as part of the thread closure, I believe it would be difficult to prevent races when accessing the underlying counter from the various gc-related threads. >> >> Or maybe there is another strategy that I'm not seeing? > > Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called by the vm-thread inside a safepoint, so there shouldn't be any other threads running simultaneously, I believe. Albert and I just spoke, and we do see the problem that two concurrent threads could be executing the closure at the same time. So if we keep a total counter, we need to synchronize the updates. But while talking we started to question how useful it is to have the gc_total counter. It is just an aggregate of the other gc-counters, but it is out of sync between safepoints. So you will always get a more accurate value by looking at the individual gc-counters. We came to the conclusion that it would probably be easier to drop `gc_total` right now, rather than trying to keep it in sync for all updates to the individual counters, because having them out of sync doesn't feel like a great option. Are we missing anything or do you agree?
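For reference, the accumulate-then-publish scheme being discussed can be sketched roughly as below. This is a standalone illustration built on `std::atomic`, not the code from the PR: `GcTotalCpuTime`, `CpuTimeClosure`, `add_diff`, `publish` and `run_example` are all hypothetical names, and the closure here simply reports the sum it collected rather than a difference against a previously stored value.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the aggregate counter; not the HotSpot class.
class GcTotalCpuTime {
  std::atomic<int64_t> _pending_diff{0};  // deltas reported but not yet published
  std::atomic<int64_t> _published{0};     // value visible to readers
 public:
  // May be called concurrently from any GC-related thread.
  void add_diff(int64_t ns) {
    _pending_diff.fetch_add(ns, std::memory_order_relaxed);
  }
  // Assumed to be called by a single thread at well-defined points.
  void publish() {
    int64_t diff = _pending_diff.exchange(0, std::memory_order_relaxed);
    _published.fetch_add(diff, std::memory_order_relaxed);
  }
  int64_t published() const {
    return _published.load(std::memory_order_relaxed);
  }
};

// Hypothetical closure: sums per-thread CPU time and reports it on destruction.
class CpuTimeClosure {
  GcTotalCpuTime& _total;
  int64_t _sum = 0;
 public:
  explicit CpuTimeClosure(GcTotalCpuTime& total) : _total(total) {}
  ~CpuTimeClosure() { _total.add_diff(_sum); }
  void do_thread(int64_t cpu_ns) { _sum += cpu_ns; }
};

// Returns the published total after one closure has been run and destroyed.
int64_t run_example() {
  GcTotalCpuTime total;
  {
    CpuTimeClosure closure(total);
    closure.do_thread(5);
    closure.do_thread(7);
    total.publish();                 // too early: the closure is still alive
    assert(total.published() == 0);
  }                                  // destructor reports the delta here
  total.publish();                   // now the delta is visible
  return total.published();
}
```

The inner scope is the part that matters: a publish issued while the closure is still alive sees nothing, so the total lags behind the individual counters until the next publish. That lag is the out-of-sync window described above.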
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410394993 From shade at openjdk.org Thu Nov 30 09:37:27 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 Nov 2023 09:37:27 GMT Subject: RFR: 8321063: AArch64: Zero build fails after JDK-8320368 In-Reply-To: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> References: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Message-ID: On Thu, 30 Nov 2023 09:15:40 GMT, Aleksey Shipilev wrote: > Simple: we need a definition for Zero, because we cannot reach the AArch64-specific one in `src/cpu/aarch64`. > > Additional testing: > - [x] MacOS AArch64 Zero build now passes > - [ ] Extensive build matrix of Zero and Server builds > Oh, you were faster than me. I'll close #16898. Sorry! I thought I would get this off your plate, given how easy it is to build all Zero builds for me :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16897#issuecomment-1833402724 From ihse at openjdk.org Thu Nov 30 09:37:35 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 30 Nov 2023 09:37:35 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. 
To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename vmath to sleef in configure This version looks much better, thank you! I guess cflags/SVE_CFLAGS is an okay-ish solution. I'm still not 100% happy though, but it might be due to my limited understanding. Let me write down a few numbered statements and then you can tell me if I'm right or wrong. 1. The aarch64 supports two different SIMD instruction set additions, Neon and SVE. 2. A specific instance of an aarch64 CPU can implement Neon, or SVE, or none of them, but not both. 3. 
SVE is superior to Neon, and is far more common these days. 4. We would like to ship a single version of libvmath.so, that supports SVE if it happens to be run on a CPU with SVE. 5. The same version will just use the fallback code that "works" but has lower performance if run on a CPU without SVE (regardless of whether it has Neon or not). 6. If libvmath.so is built without SVE support, and is then run on a CPU with SVE, it will "work", but not utilize the SVE functionality, and so will have degraded performance compared to what we want. 7. To be able to build libvmath.so with SVE support, we need to be able to compile a simple test program using `#include <arm_sve.h>` and `-march=armv8-a+sve`. If this fails, we cannot build libvmath.so with SVE support. 8. The ability to build with SVE support should only be dependent on the gcc compiler and sysroot header files, and not the SIMD instruction set of the build machine CPU. If all these are correct, then I think the problem is that we just silently ignore it if building with SVE fails. Instead, it should cause configure to fail. If, for some reason, we must support build environments that cannot build for SVE, then we need to have a configure flag that allows us to require the presence of SVE building ability, like --enable-sve-support, which will be "auto" by default and thus adapt to the platform, but can be set to on, which will cause configure to fail if the platform does not have SVE compilation abilities. We cannot just silently drop expected functionality depending on the build machine, or at the very least, we must have a way to prevent that from happening.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833403493 From stuefe at openjdk.org Thu Nov 30 09:41:49 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 30 Nov 2023 09:41:49 GMT Subject: RFR: 8321063: AArch64: Zero build fails after JDK-8320368 In-Reply-To: References: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Message-ID: On Thu, 30 Nov 2023 09:34:33 GMT, Aleksey Shipilev wrote: > > Oh, you were faster than me. I'll close #16898. > > Sorry! I thought I would get this off your plate, given how easy it is to build all Zero builds for me :) No problem at all :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16897#issuecomment-1833409683 From stefank at openjdk.org Thu Nov 30 09:49:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 30 Nov 2023 09:49:38 GMT Subject: RFR: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object [v9] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 06:38:51 GMT, Stefan Karlsson wrote: >> In the rewrites made for: >> [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` >> >> I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. >> >> The provided tests provoke this assert form: >> * the JNI thread detach code >> * thread dumping with locked monitors, and >> * the JVMTI GetOwnedMonitorInfo API. >> >> While investigating this we've found that the thread detach code becomes more correct when this filter was removed. 
Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. >> >> The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. >> >> For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. >> >> Test: the written tests with and without the fix. Tier1-Tier3, so far. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks all for reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16783#issuecomment-1833418484 From stefank at openjdk.org Thu Nov 30 09:49:40 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 30 Nov 2023 09:49:40 GMT Subject: Integrated: 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object In-Reply-To: References: Message-ID: <4KIVsDOaiNSWSjgyJSd8X05mwzTQqsUryyPA3qJbr1A=.10ce1064-bc8d-45f7-83f9-a0a6f1c53b6d@github.com> On Wed, 22 Nov 2023 15:00:29 GMT, Stefan Karlsson wrote: > In the rewrites made for: > [JDK-8318757](https://bugs.openjdk.org/browse/JDK-8318757) `VM_ThreadDump asserts in interleaved ObjectMonitor::deflate_monitor calls` > > I removed the filtering of *owned ObjectMonitors with dead objects*. The reasoning was that you should never have an owned ObjectMonitor with a dead object. I added an assert to check this assumption. It turns out that the assumption was wrong *if* you use JNI to call MonitorEnter and then remove all references to the locked object. > > The provided tests provoke this assert form: > * the JNI thread detach code > * thread dumping with locked monitors, and > * the JVMTI GetOwnedMonitorInfo API. > > While investigating this we've found that the thread detach code becomes more correct when this filter was removed. Previously, the locked monitors never got unlocked because the ObjectMonitor iterator never exposed these monitors to the JNI detach code that unlocks the thread's monitors. That bug caused an ObjectMonitor leak. So, for this case I'm leaving these ObjectMonitors unfiltered so that we don't reintroduce the leak. > > The thread dumping case doesn't tolerate ObjectMonitor with dead objects, so I'm filtering those in the closure that collects ObjectMonitor. Side note: We have discussions about ways to completely rewrite this by letting each thread have thread-local information about JNI held locks. 
If we have this we could probably throw away the entire ObjectMonitorDump hashtable, and its walk of the `_in_use_list.`. > > For GetOwnedMonitorInfo it is unclear if we should expose these weird ObjectMonitor. If we do, then the users can detect that a thread holds a lock with a dead object, and the code will return NULL as one of the "owned monitors" returned. I don't think that's a good idea, so I'm filtering out these ObjectMonitor for those calls. > > Test: the written tests with and without the fix. Tier1-Tier3, so far. This pull request has now been integrated. Changeset: 0d146361 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/0d146361f27e1415fab9272de1cdde84c074c718 Stats: 326 lines in 8 files changed: 319 ins; 1 del; 6 mod 8320515: assert(monitor->object_peek() != nullptr) failed: Owned monitors should not have a dead object Reviewed-by: dholmes, ihse, sspitsyn, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/16783 From haosun at openjdk.org Thu Nov 30 09:51:56 2023 From: haosun at openjdk.org (Hao Sun) Date: Thu, 30 Nov 2023 09:51:56 GMT Subject: RFR: 8321063: AArch64: Zero build fails after JDK-8320368 In-Reply-To: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> References: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Message-ID: On Thu, 30 Nov 2023 09:15:40 GMT, Aleksey Shipilev wrote: > Simple: we need a definition for Zero, because we cannot reach the AArch64-specific one in `src/cpu/aarch64`. > > Additional testing: > - [x] MacOS AArch64 Zero build now passes > - [ ] Extensive build matrix of Zero and Server builds Thanks for your fix. As I tested locally, Linux AArch64 Zero build passes now. ------------- Marked as reviewed by haosun (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/16897#pullrequestreview-1757045467 From adinn at openjdk.org Thu Nov 30 09:57:07 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 30 Nov 2023 09:57:07 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v3] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Wed, 29 Nov 2023 20:01:22 GMT, Matias Saavedra Silva wrote: >> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. >> >> This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Corrections and gtest fix Still good ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1757056101 From ihse at openjdk.org Thu Nov 30 09:58:21 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 30 Nov 2023 09:58:21 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). 
>> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename vmath to sleef in configure Okay, now I found a few more of your comments that I missed before. I apologize, the Github PR review UI can be a bit confusing when discussions are taking place in multiple locations. So, here's a revision to my list above: 1. An aach64 CPU can have both Neon and SVE present at the same time. 2. You are assuming that Neon is always present, and what I referred to as the fallback case is in fact using Neon instead of SVE. 4. 
You would like to split vect_math.c into two parts, e.g. vect_math_neon.c and vect_math_sve.c. 5. You will then use heuristics in HotSpot to determine at runtime if SVE or Neon functionality should be used. Even if SVE is present on the runtime machine, heuristics can choose to use the Neon implementation anyway in some cases. 6. Only vect_math_sve.c needs the -march+sve flag. 7. The Neon part does not need the -march+sve flag, and will fail if built with this flag. (???) The last point seemed very confusing to me. Right now, you can compile the entire file with the -march+sve flag, right? Anyway, it is straightforward to add compiler flags to individual files. You do it like this:

$(eval $(call SetupJdkLibrary, BUILD_LIBVMATH, \
    NAME := vmath, \
    CFLAGS := $(CFLAGS_JDKLIB) $(LIBSLEEF_CFLAGS) -fvisibility=default, \
    vect_math_sve.c_CFLAGS := $(SVE_CFLAGS), \
    ...

------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833434870 From sjohanss at openjdk.org Thu Nov 30 09:58:29 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 30 Nov 2023 09:58:29 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v53] In-Reply-To: References: Message-ID: <-GX8bATX2hz3YWgnJbhTNEYbi4t8HxfdhYqBP-ulyGg=.0080d7b0-8e43-4b81-b885-1d4a742048cc@github.com> On Thu, 30 Nov 2023 02:45:45 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Add missing include A few more comments after the latest changes. src/hotspot/share/gc/parallel/parallelScavengeHeap.cpp line 905: > 903: gc_threads_do(&tttc); > 904: > 905: CPUTimeCounters::publish_gc_total_cpu_time(); As I suggested in the other comment, maybe we should not keep the total counter, but if we do we need to make sure the destructor of the closure is run before the call to `publish_gc_total_cpu_time()`.
Otherwise we will publish a value that has not yet been updated. src/hotspot/share/runtime/cpuTimeCounters.hpp line 59: > 57: NONCOPYABLE(CPUTimeCounters); > 58: > 59: static CPUTimeCounters* _instance; I would prefer if we made the whole class static and got rid of the instance and just made the `_cpu_time_counters` array static. The only drawback I/we (discussed this with Albert as well) can see is that the memory for the array would not be accounted for in NMT, but this array will always be very small so that should not be a big problem. Do you see any other concerns? ------------- Changes requested by sjohanss (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/15082#pullrequestreview-1757031262 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410415366 PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410410265 From aph at openjdk.org Thu Nov 30 10:01:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 Nov 2023 10:01:11 GMT Subject: RFR: JDK-8320892: AArch64: Restore FPU control state after JNI [v3] In-Reply-To: References: <-Jv5Xvre3lonydwQ5uzYN3QB8V0VIuORIhM1RtIdW5g=.06167df9-7268-4945-8e18-a04d19ee97e1@github.com> Message-ID: On Thu, 30 Nov 2023 05:42:57 GMT, David Holmes wrote: > I was also going to suggest adding a new flag and creating an alias. The new flag will need a CSR request of course. Given that it's new and it's a diagnostic flag, I'm a bit surprised at that. I was trying for a quick fix. Anyway, how do you create an alias? I can't see any examples, and I haven't found a way through the maze of twisty `#define` passages.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16851#issuecomment-1833439904 From sjohanss at openjdk.org Thu Nov 30 10:20:27 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 30 Nov 2023 10:20:27 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v53] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 02:45:45 GMT, Jonathan Joo wrote: >> 8315149: Add hsperf counters for CPU time of internal GC threads > > Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: > > Add missing include src/hotspot/share/runtime/cpuTimeCounters.cpp line 47: > 45: return "conc_dedup"; > 46: default: > 47: ShouldNotReachHere(); My IDE complained and I guess depending on warning level we might need a return here. Suggestion: ShouldNotReachHere(); return ""; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1410454940 From aph at openjdk.org Thu Nov 30 10:33:12 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 Nov 2023 10:33:12 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 09:35:04 GMT, Magnus Ihse Bursie wrote: > This version looks much better, thank you! I guess cflags/SVE_CFLAGS is an okay-ish solution. > > I'm still not 100% happy though, but it might be due to my limited > understanding. Let me write down a few numbered statements and then > you can tell me if I'm right or wrong. > > 1. The aarch64 supports two different SIMD instruction set additions, Neon and SVE. True. > 2. A specific instance of an aarch64 CPU can implement Neon, or SVE, or none of them, but not both. All AArch64 CPUs which support SVE also support Neon. SVE is incomplete, and many functions still need Neon. > 3. SVE is superior to Neon, and is far more common these days. No. SVE is partial, and extends Neon, and is still fairly uncommon. > 4.
We would like to ship a single version of libvmath.so, that supports SVE if it happens to be run on a CPU with SVE. True. > 5. The same version will just use the fallback code that "works" > but has lower performance if run on a CPU without SVE (regardless of > if it has Neon or not) True. > 6. If libvmath.so is built without SVE support, and is then run > on a CPU with SVE, it will "work", but not utilize the SVE > functionality, so have degraded performance compared to what we > want. True. > 7. To be able to build libvmath.so with SVE support, we need to > be able to compile a simple test program using `#include > ` and `-march=armv8-a+sve`. If this fails, we cannot > build libvmath.so with SVE support. arm_sve.h is part of GCC. It was added to GCC in 2019. > 8. The ability to build with SVE support should only be > dependent on the gcc compiler and sysroot header files, and not the > SIMD instruction set of the build machine CPU. True. I'm not at all sure it should even depend on GCC. If I were doing this I'd build the interposition library in pure asm. In fact, given that SLEEF is using pure C linkage, defining the library in pure asm is trivial, and we can get rid of all this configury mess. > If all these are correct, then I think the problem is that we just > silently ignore if building with SVE fails. Instead, it should cause > configure to fail. > > If, for some reason, we must support build environment that cannot > build for SVE, then we need to have a configure flag that allows us > to require the presence of SVE building ability, like > --enable-sve-support, which will be "auto" by default and thus adapt > to the platform, but can be set to on, which will cause a configure > fail if the platform does not have SVE compilation abilities. > > We cannot just silently drop expected functionality depending on the > build machine, or at the very least, we must have a way to prevent > that from happening. True.
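Statements 4 to 6 above amount to runtime dispatch: one binary, with the SVE path chosen only when the running CPU reports the feature. The following is a minimal sketch of that idea, not code from the patch. `select_sin`, `sin_sve`, `sin_fallback`, `MathOp` and the `cpu_has_sve` parameter are all hypothetical names, the function bodies are placeholders, and how the feature bit is obtained (on Linux, typically from the auxiliary vector) is left out.

```cpp
#include <cmath>

// Hypothetical tags naming which implementation was selected.
enum class Impl { Sve, Fallback };

// Hypothetical pairing of a tag with the function that implements it.
struct MathOp {
  Impl impl;
  double (*fn)(double);
};

// Placeholder bodies: in a real library these would be the SVE-compiled
// and the generic implementations of the same operation.
static double sin_sve(double x)      { return std::sin(x); }
static double sin_fallback(double x) { return std::sin(x); }

// Chooses the implementation once, from what the running CPU reports.
// `cpu_has_sve` is an assumption of this sketch: the caller is expected
// to have queried the hardware capability already.
MathOp select_sin(bool cpu_has_sve) {
  return cpu_has_sve ? MathOp{Impl::Sve, sin_sve}
                     : MathOp{Impl::Fallback, sin_fallback};
}
```

With this shape, the build machine's CPU never enters the picture; only the toolchain's ability to compile the SVE variant does, which is the point of statement 8.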
Let's just drop the requirement to require GCC support for SVE. We don't need it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833489792 From tschatzl at openjdk.org Thu Nov 30 10:37:11 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 30 Nov 2023 10:37:11 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v3] In-Reply-To: References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: On Mon, 27 Nov 2023 14:53:34 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Merge branch 'master' into mergeme >> - iwalulya review, naming >> - 8317809 Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) >> >> Introduce a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. >> GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). >> >> The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform >> this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every >> insertion to allow for concurrent users for the lock to progress. >> >> Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing >> CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared >> towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). 
>> >> Upcoming changes will >> * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly >> reduce code purging time for the STW collectors. >> * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) >> * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better >> parallelism >> * G1: move some signifcant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) >> * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. >> - Only run test case on debug VMs, sufficient >> - 8320331 g1 full gc "during" verification accesses half-unloaded metadata > > src/hotspot/share/gc/shared/classUnloadingContext.hpp line 63: > >> 61: }; >> 62: >> 63: class DefaultClassUnloadingContext : public ClassUnloadingContext { > > I don't understand why they need to be two classes, even after reading "These are the reason for the class hierarchy for...". The reference to future/other PR(s) in the description doesn't really help -- it's unclear what is *necessary* for the current PR and what is preparation for future PR(s). The base class is unnecessary for this change, but very nice to have for future changes. I'll just merge them for now, and separate them again later. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16759#discussion_r1410476983 From shade at openjdk.org Thu Nov 30 10:41:09 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 Nov 2023 10:41:09 GMT Subject: RFR: 8320924: Improve heap dump performance by optimizing archived object checks [v2] In-Reply-To: References: <8Ek_2iD6dG8MJE0AEHlzxcD4GDCYYEmKeVoBMO4PBF8=.4352c26a-76b9-46ae-af3f-8666821c9a9c@github.com> Message-ID: On Wed, 29 Nov 2023 17:32:19 GMT, Aleksey Shipilev wrote: >> Profiling heap dumping code reveals another simple issue: `mask_dormant_archived_object` on dumping hotpath takes quite a bit of time. We can reflow it for better inlineability, throwing out the non-essential parts into cold method. There is also no reason to peek into java mirror with (default) keep-alive, if we only use the result for null-check. >> >> Example improvements on Mac M1: >> >> >> % for I in `seq 1 5`; do build/macosx-aarch64-server-release/images/jdk/bin/java -XX:+UseParallelGC -XX:+HeapDumpAfterFullGC -Xms8g -Xmx8g HeapDump.java 2>&1 | grep created; rm *.hprof; done >> >> # Before >> Heap dump file created [1897307608 bytes in 1.584 secs] >> Heap dump file created [1897308278 bytes in 1.439 secs] >> Heap dump file created [1897308508 bytes in 1.460 secs] >> Heap dump file created [1897308505 bytes in 1.423 secs] >> Heap dump file created [1897308554 bytes in 1.414 secs] >> >> # After >> Heap dump file created [1897307648 bytes in 1.509 secs] >> Heap dump file created [1897308498 bytes in 1.281 secs] >> Heap dump file created [1897308554 bytes in 1.282 secs] >> Heap dump file created [1897308512 bytes in 1.263 secs] >> Heap dump file created [1897308554 bytes in 1.270 secs] >> >> >> ...which is about +12% faster heap dump. >> >> I also eyeballed the generated code and saw `mask_dormant_archived_object` fully inlined at least on x86_64. 
> > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Switch logging: debug -> trace If there are no other comments, I would integrate this soon. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16863#issuecomment-1833502676 From eastigeevich at openjdk.org Thu Nov 30 10:52:07 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 30 Nov 2023 10:52:07 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: <-_8kbpd5aGwBQ07LvWn-6G3g6Jh3qX-Y0ZlPldmsauM=.deda3a19-6357-4402-893a-066466eccfdf@github.com> References: <-_8kbpd5aGwBQ07LvWn-6G3g6Jh3qX-Y0ZlPldmsauM=.deda3a19-6357-4402-893a-066466eccfdf@github.com> Message-ID: On Wed, 29 Nov 2023 17:44:38 GMT, Evgeny Astigeevich wrote: > UseCryptoPmullForCRC32 According to the V2 optimization guide `pmull` is the same in V2 as in V1. So `UseCryptoPmullForCRC32` should be enabled for V2 as well. I'd do this in a separate PR to allow this change to be backported to jdk11u. There is no `UseCryptoPmullForCRC32` in jdk11u. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16887#issuecomment-1833519386 From shade at openjdk.org Thu Nov 30 10:55:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 Nov 2023 10:55:15 GMT Subject: RFR: 8321063: AArch64: Zero build fails after JDK-8320368 In-Reply-To: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> References: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Message-ID: On Thu, 30 Nov 2023 09:15:40 GMT, Aleksey Shipilev wrote: > Simple: we need a definition for Zero, because we cannot reach the AArch64-specific one in `src/cpu/aarch64`. > > Additional testing: > - [x] MacOS AArch64 Zero build now passes > - [x] Extensive build matrix of Zero and Server builds Extensive matrix of Zero/Server builds passes for me locally. 
I am integrating now under triviality rule. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16897#issuecomment-1833520155 From shade at openjdk.org Thu Nov 30 10:55:15 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 Nov 2023 10:55:15 GMT Subject: Integrated: 8321063: AArch64: Zero build fails after JDK-8320368 In-Reply-To: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> References: <2l8_hqkC3ryoU7NfHN25nEECULjrX8qie6XM6f7gehI=.1922c2d5-0850-42d3-a9c1-7391b238aa7a@github.com> Message-ID: On Thu, 30 Nov 2023 09:15:40 GMT, Aleksey Shipilev wrote: > Simple: we need a definition for Zero, because we cannot reach the AArch64-specific one in `src/cpu/aarch64`. > > Additional testing: > - [x] MacOS AArch64 Zero build now passes > - [x] Extensive build matrix of Zero and Server builds This pull request has now been integrated. Changeset: 8b102ed6 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/8b102ed6b4f595f07c0e741328f5fcac65320461 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8321063: AArch64: Zero build fails after JDK-8320368 Reviewed-by: stuefe, haosun ------------- PR: https://git.openjdk.org/jdk/pull/16897 From epeter at openjdk.org Thu Nov 30 10:59:06 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Nov 2023 10:59:06 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 06:23:29 GMT, Emanuel Peter wrote: > I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. > > Testing: tier1-3 and stress. Update: The current fix is not sufficient. I'll summarize my state of knowledge: We have these accesses to the extra data: - read / update of a record (race conditions for counter updates are acceptable, they just make the result imprecise but not incorrect). 
- allocating a new record - cleaning out stale records I have had conversations with @fisk and @rwestrel about which can happen concurrently. There may have been a time when cleaning only happened during SafePoint, and hence there probably was no concurrent read / allocation with it. But this has changed, certainly with ZGC, but also with `WB_ClearMethodState` (call clearing at any time). The data structure uses the concept of "monotonicity" to ensure that reads are safe, even with concurrent allocations. The idea is that new records are only appended (into the section with all `DataLayout::arg_info_data_tag`). https://github.com/openjdk/jdk/blob/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe/src/hotspot/share/oops/methodData.cpp#L1438-L1445 We wrap the extra data in `BitData` or `SpeculativeTrapData` objects, which are nothing but pointers to the extra data array/buffer. Of course concurrent allocations have to be made mutually exclusive, hence we need the `extra_data_lock`. And @tkrodriguez has found an instance where a lock was not taken, hence this bug report. The problem now gets more complicated with cleaning. It removes records, and shifts other records "to the left" to compact the extra data. This certainly breaks monotonicity. This would be ok if there could be no concurrent read/update or allocation. But that seems certainly not to hold anymore now. https://github.com/openjdk/jdk/blob/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe/src/hotspot/share/oops/methodData.cpp#L1737-L1756 We can protect all of these with locks. But that still is not good enough if there are concurrent reads, which make assumptions about the offsets in the extra data array/buffer. Imagine this: Thread A: starts iterating over the extra data. Thread B: cleans out some stale records, and shifts around things. Thread A: continues iterating, with offsets that are now possibly not correct any more. 
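The Thread A / Thread B interleaving above can be replayed deterministically in a single thread. This is a minimal sketch, not HotSpot code: `Record`, `compact`, `ReaderCursor` and `run_interleaving` are hypothetical stand-ins for the extra-data entries, the cleaning pass, an iterating reader and the schedule, and it assumes only what is described above, namely that cleaning removes stale records and shifts the remaining ones to the left.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one extra-data entry (not the real DataLayout).
struct Record {
  int  id;     // identifies the record for the demonstration
  bool stale;  // true if the cleaning pass should drop it
};

// Hypothetical cleaning pass: drops stale records and shifts the
// remaining ones to the left, which breaks the append-only property.
void compact(std::vector<Record>& data) {
  std::size_t out = 0;
  for (std::size_t in = 0; in < data.size(); ++in) {
    if (!data[in].stale) {
      data[out++] = data[in];
    }
  }
  data.resize(out);
}

// A reader that remembers only an offset, like a thread in mid-iteration.
struct ReaderCursor {
  std::size_t offset = 0;
  // Returns the id at the current offset and advances, or -1 at the end.
  int next(const std::vector<Record>& data) {
    return offset < data.size() ? data[offset++].id : -1;
  }
};

// Replays the A / B / A interleaving and returns the ids reader A saw.
std::vector<int> run_interleaving() {
  std::vector<Record> data = {{1, true}, {2, false}, {3, false}, {4, false}};
  ReaderCursor a;
  std::vector<int> seen;
  seen.push_back(a.next(data));  // Thread A: reads record 1, offset is now 1
  compact(data);                 // Thread B: drops record 1, shifts 2, 3, 4 left
  for (int id = a.next(data); id != -1; id = a.next(data)) {
    seen.push_back(id);          // Thread A: resumes at the now-stale offset
  }
  return seen;
}
```

Reader A ends up with 1, 3, 4: record 2 was live the whole time but is never visited. Taking a lock only around `compact` would not change this outcome, because the reader's saved offset outlives the critical section.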
I have another concern about what we do today during allocation: https://github.com/openjdk/jdk/blob/a5ccd3beaf069bdfe81736f6c62e5b4b9e18b5fe/src/hotspot/share/oops/methodData.cpp#L1503-L1532 We take the lock, and append a new record. But is this writing done safely, such that a concurrent read would be safe? Because currently it must be. We do `set_header`, which does a `u8` store to the extra data, including the tag. But the payload object is stored separately. I wonder what here is guaranteed to be atomic, maybe the header is stored atomically. But a concurrent read could certainly find complete garbage instead of the (not yet written) payload object. What if a read thus sees the tag for a `SpeculativeTrapData`, and we read its `data->method()` field, and it is garbage but not nullptr? In the light of this, I see 2 alternatives: 1. add locks everywhere, and make sure no `ProfileData* pdata` pointer leaks out of a lock. 2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16840#issuecomment-1833529561 From aph at openjdk.org Thu Nov 30 11:16:12 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 Nov 2023 11:16:12 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: <9VeMdTAJPaPZDg9ZW7FVJOf9XGl4gGqAS-2g8SFc9c0=.36792cd5-66d9-4abc-ba0c-aee3478627f4@github.com> On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. 
To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename vmath to sleef in configure make/autoconf/lib-sleef.m4 line 56: > 54: AC_MSG_CHECKING([for the specified LIBSLEEF]) > 55: if test -e ${with_libsleef}/lib/libsleef.so && > 56: test -e ${with_libsleef}/include/sleef.h; then This fails on my system because libsleef is in `/usr/local/lib64/`. This is the correct place to look according to the Linux FHS. 
You shouldn't hard-code `/lib` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16234#discussion_r1410520870 From mli at openjdk.org Thu Nov 30 11:36:16 2023 From: mli at openjdk.org (Hamlin Li) Date: Thu, 30 Nov 2023 11:36:16 GMT Subject: RFR: 8318217: RISC-V: C2 VectorizedHashCode [v7] In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 19:25:21 GMT, Yuri Gaevsky wrote: >> Hello All, >> >> Please review these changes to support _vectorizedHashCode intrinsic on >> RISC-V platform. The patch adds the "scalar" code for the intrinsic without >> usage of any RVV instruction but provides manual unrolling of the appropriate >> loop. The code with usage of RVV instruction could be added as follow-up of >> the patch or independently. >> >> Thanks, >> -Yuri Gaevsky >> >> P.S. My OCA has been accepted recently (ygaevsky). >> >> ### Correctness checks >> >> Testing: tier1 tests successfully passed on a RISC-V StarFive JH7110 board with Linux. >> >> ### Performance results (the numbers for non-ints are similar) >> >> #### StarFive JH7110 board: >> >> >> ArraysHashCode: without intrinsic with intrinsic >> ------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> ------------------------------------------------------------------------------- >> multiints 0 avgt 30 2.658 ± 0.001 2.661 ± 0.004 ns/op >> multiints 1 avgt 30 4.881 ± 0.011 4.892 ± 0.015 ns/op >> multiints 2 avgt 30 16.109 ± 0.041 10.451 ± 0.075 ns/op >> multiints 3 avgt 30 14.873 ± 0.068 11.753 ± 0.024 ns/op >> multiints 4 avgt 30 17.283 ± 0.078 13.176 ± 0.044 ns/op >> multiints 5 avgt 30 19.691 ± 0.136 14.723 ± 0.046 ns/op >> multiints 6 avgt 30 21.727 ± 0.166 15.463 ± 0.124 ns/op >> multiints 7 avgt 30 23.790 ± 0.126 18.298 ± 0.059 ns/op >> multiints 8 avgt 30 23.527 ± 0.116 18.267 ± 0.046 ns/op >> multiints 9 avgt 30 27.981 ± 0.303 20.453 ± 0.069 ns/op >> multiints 10 avgt 30 26.947 ± 0.215 20.541 ±
0.051 ns/op >> multiints 50 avgt 30 95.373 ± 0.588 69.238 ± 0.208 ns/op >> multiints 100 avgt 30 177.109 ± 0.525 137.852 ± 0.417 ns/op >> multiints 200 avgt 30 341.074 ± 1.363 296.832 ± 0.725 ns/op >> multiints 500 avgt 30 847.993 ± 1.713 752.415 ± 1.918 ns/op >> multiints 1000 avgt 30 1610.199 ± 5.424 1426.112 ± 3.407 ns/op >> multiints 10000 avgt 30 16234.260 ± 26.789 14447.936 ± 26.345 ns/op >> multiints 100000 avgt 30 170726.025 ± 184.003 152587.649 ± 381.964 ns/op >> ---------------------------------------... > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > Use concrete registers for input parameters. Thanks for updating. Looks good to me. Just one minor comment. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1479: > 1477: case T_SHORT: BLOCK_COMMENT("arrays_hashcode(short) {"); break; > 1478: case T_INT: BLOCK_COMMENT("arrays_hashcode(int) {"); break; > 1479: default: BLOCK_COMMENT("arrays_hashcode {"); break; Is this `BLOCK_COMMENT("arrays_hashcode {"); break;` necessary? ------------- PR Review: https://git.openjdk.org/jdk/pull/16629#pullrequestreview-1757243790 PR Review Comment: https://git.openjdk.org/jdk/pull/16629#discussion_r1410542344 From eastigeevich at openjdk.org Thu Nov 30 11:36:20 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 30 Nov 2023 11:36:20 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 16:16:12 GMT, Evgeny Astigeevich wrote: > As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for the V2 architecture.
Created https://bugs.openjdk.org/browse/JDK-8321105 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16887#issuecomment-1833586376 From eastigeevich at openjdk.org Thu Nov 30 11:36:21 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 30 Nov 2023 11:36:21 GMT Subject: RFR: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: <1SRCHh28tB8Q5X2o7SWeW41LO4IqbiYpzyNKcMgOEvY=.02953e71-2114-4c39-90cc-1060437e4d0d@github.com> References: <-_8kbpd5aGwBQ07LvWn-6G3g6Jh3qX-Y0ZlPldmsauM=.deda3a19-6357-4402-893a-066466eccfdf@github.com> <1SRCHh28tB8Q5X2o7SWeW41LO4IqbiYpzyNKcMgOEvY=.02953e71-2114-4c39-90cc-1060437e4d0d@github.com> Message-ID: On Wed, 29 Nov 2023 17:54:50 GMT, Aleksey Shipilev wrote: >>> Okay in principle, but I have a question, there is another block below: >>> >>> ``` >>> // Neoverse V1 >>> if (_cpu == CPU_ARM && model_is(0xd40)) { >>> if (FLAG_IS_DEFAULT(UseCryptoPmullForCRC32)) { >>> FLAG_SET_DEFAULT(UseCryptoPmullForCRC32, true); >>> } >>> } >>> ``` >>> >>> Should it be enabled for V2 as well? >> >> Good catch. I'll check whether V2 has the same or better `pmull` as V1. > >> Good catch. I'll check whether V2 has the same or better `pmull` as V1. > > Although, it would not be "enabling N1 optos for V2", it would be "enabling V1 optos for V2" :) > Your call if you just want to make that change separately. Thanks @shipilev @nick-arm for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16887#issuecomment-1833588447 From eastigeevich at openjdk.org Thu Nov 30 11:36:22 2023 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 30 Nov 2023 11:36:22 GMT Subject: Integrated: 8321025: Enable Neoverse N1 optimizations for Neoverse V2 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 16:16:12 GMT, Evgeny Astigeevich wrote: > As Arm Neoverse V2 will benefit from the same optimizations as Neoverse N1 does, it should have OnSpinWaitInst/OnSpinWaitInstCount defaults set to "isb"/1 and UseSIMDForMemoryOps default set to true. > This patch sets these flags accordingly for the V2 architecture. This pull request has now been integrated. Changeset: c9d15f7d Author: Evgeny Astigeevich URL: https://git.openjdk.org/jdk/commit/c9d15f7d5ee616bf48d85647ee504714ac5fafc2 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod 8321025: Enable Neoverse N1 optimizations for Neoverse V2 Reviewed-by: ngasson, shade ------------- PR: https://git.openjdk.org/jdk/pull/16887 From tschatzl at openjdk.org Thu Nov 30 11:38:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 30 Nov 2023 11:38:28 GMT Subject: RFR: 8317809: Insertion of free code blobs into code cache can be very slow during class unloading [v4] In-Reply-To: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> References: <_dcFF70_w7IXSjb6w-HuHCBkPyS3a6NlzejtqdfdYnM=.74e0f9df-eca3-49b0-be68-2d5824c16003@github.com> Message-ID: <2_ONmN3qxsdTIEJMbQhE82nBn10l_RnZm5-DZAmQn2I=.9ad49c74-3721-4299-8a9e-b8c1973eb494@github.com> > Insert code blobs in a sorted fashion to exploit the finger-optimization when adding, making this procedure O(n) instead of O(n^2) > > Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. 
GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge). > > The steps typically are registering a new to be unlinked CLD/nmethod, and then purge its memory later. STW collectors perform this work in one big chunk taking the CodeCache_lock, for the entire duration, while concurrent collectors lock/unlock for every insertion to allow for concurrent users for the lock to progress. > > Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however particularly the existing CLD handling API (still) mixes unlinking and purging in its CLD::unload() call. To simplify this change that is mostly geared towards separating nmethod unlinking from purging, to make code blob freeing O(n) instead of O(n^2). > > Upcoming changes will > * separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors. > * better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in `G1CollectedHeap::complete_cleaning`) > * untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism > * G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging) > * Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging. > > These are the reason for the class hierarchy for `ClassUnloadingContext`: the goal is to ultimately have about this phasing (for G1): > 1. collect all dead CLDs, using the `register_unloading_class_loader_data` method *only* > 2. parallelize the stuff in `ClassLoaderData::unload()` in one way or another, adding them to the `complete_cleaning` (parallel) phase. > 3. 
`purge_nmethods`, `free_code_blobs` and the `remove_unlinked_nmethods_from_code_root_set` (from JDK-8317007) will be concurrent. > > Particularly the split of `SystemDictionary::do_unloading` into "only" traversing the CLDs to find the dead ones and then in parallel process them in 2. above warrants a separate `ClassUnloadingCo... Thomas Schatzl has updated the pull request incrementally with three additional commits since the last revision: - remove trailing whitespace - fix indentation after recent commit - Address ayang/iwalulya review comments, remove inheritance in ClassUnloadingContext for now as unnecessary for this change, use iterators, other review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16759/files - new: https://git.openjdk.org/jdk/pull/16759/files/e8c8477a..801426a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16759&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16759&range=02-03 Stats: 103 lines in 9 files changed: 19 ins; 40 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/16759.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16759/head:pull/16759 PR: https://git.openjdk.org/jdk/pull/16759 From aph at openjdk.org Thu Nov 30 11:51:11 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 Nov 2023 11:51:11 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: <8SBUvWGDLtQmwYPRBDeUkeuq4pf2nJfKfDY5rzZODFU=.3f1cc0ff-02e6-4a4e-9425-5ffccc9cbc8f@github.com> On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. 
To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename vmath to sleef in configure Do this, but with the name vect_math.S. Don't use SLEEF headers in the build. I think you can do this with no build-time dependency on SLEEF at all if you load the library lazily at runtime. 
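Loading the library lazily at runtime, with a fallback when it is absent, could look roughly like the sketch below. It is an illustration under stated assumptions, not the proposed implementation: `resolve_or_fallback`, `UnaryFn` and `scalar_sin` are hypothetical names, the scalar `double (*)(double)` signature is a simplification (the real stubs take vector arguments), and the library and symbol names are supplied by the caller rather than taken from SLEEF. `dlopen`, `dlsym` and `dlclose` are the POSIX calls from `<dlfcn.h>`; older glibc versions need `-ldl` at link time.

```cpp
#include <dlfcn.h>
#include <cmath>

// Simplified function type for this sketch; real vector stubs differ.
using UnaryFn = double (*)(double);

// Scalar fallback used when the library or the symbol is unavailable.
static double scalar_sin(double x) { return std::sin(x); }

// Tries to resolve `symbol` from `library` at runtime and returns
// `fallback` if either the library or the symbol cannot be found.
// No header or link-time dependency on the library is needed.
UnaryFn resolve_or_fallback(const char* library, const char* symbol,
                            UnaryFn fallback) {
  void* handle = dlopen(library, RTLD_LAZY);
  if (handle == nullptr) {
    return fallback;  // library not installed: keep the scalar path
  }
  void* addr = dlsym(handle, symbol);
  if (addr == nullptr) {
    dlclose(handle);
    return fallback;  // library present but symbol missing
  }
  // The handle is intentionally left open so the returned pointer stays valid.
  return reinterpret_cast<UnaryFn>(addr);
}
```

A caller would resolve each entry point once at startup and store the result, so a missing library costs one failed lookup and the math operations silently use the scalar version, which matches the fallback behaviour described in the PR text.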
[vect_math.S.txt](https://github.com/openjdk/jdk/files/13512306/vect_math.S.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833610537 From fyang at openjdk.org Thu Nov 30 12:13:08 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 30 Nov 2023 12:13:08 GMT Subject: RFR: 8319801: Recursive lightweight locking: aarch64 implementation [v4] In-Reply-To: References: Message-ID: On Tue, 21 Nov 2023 15:25:44 GMT, Axel Boldt-Christmas wrote: >> Implements the aarch64 port of JDK-8319796. >> >> There are two major parts for the port implementation. The C2 part, and the part shared by the interpreter, C1 and the native call wrapper. >> >> The biggest change for both parts is that we check the lock stack first and if it is a recursive lightweight [un]lock and in that case simply pop/push and finish successfully. >> >> Only if the recursive lightweight [un]lock fails does it look at the mark word. >> >> For the shared part if it is an unstructured exit, the monitor is inflated or the mark word transition fails it calls into the runtime. >> >> The C2 operates under a few more assumptions, that the locking is structured and balanced. This means that some checks can be elided. >> >> First this means that in C2 unlock if the obj is not on the top of the lock stack, it must be inflated. And reversely if we reach the inflated C2 unlock the obj is not on the lock stack. This second property makes it possible to avoid reading the owner (and checking if it is anonymous). Instead it can either just do an un-contended unlock by writing null to the owner, or if contention happens, simply write the thread to the owner and jump to the runtime. >> >> The aarch64 C2 port tries to avoid stronger memory semantics where ever possible. In C2 lock it first does a relaxed load of the mark word to check for inflation. Both lock and unlock uses a load/store exclusive register pair to transition the mark word. 
> > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 > - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 > - Merge remote-tracking branch 'upstream_jdk/pr/16606' into JDK-8319801 > - 8319801: Recursive lightweight locking: aarch64 implementation > - Cleanup: C2 fast_lock/fast_unlock aarch64 src/hotspot/cpu/aarch64/aarch64.ad line 16430: > 16428: > 16429: ins_cost(5 * INSN_COST); > 16430: format %{ "fastlock $object,$box\t! kills $tmp,$tmp2,tmp3" %} Nit: seems that this should be `$tmp3` instead of `tmp3`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16608#discussion_r1410582772 From vitaly.provodin at jetbrains.com Thu Nov 30 12:47:33 2023 From: vitaly.provodin at jetbrains.com (Vitaly Provodin) Date: Thu, 30 Nov 2023 19:47:33 +0700 Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v10] In-Reply-To: <77x_EsDnKYUGRzIe6pvIk3t-7y5LjhSkxfR6xcVmH2s=.39b90785-1864-48bf-8da1-62246118e353@github.com> References: <77x_EsDnKYUGRzIe6pvIk3t-7y5LjhSkxfR6xcVmH2s=.39b90785-1864-48bf-8da1-62246118e353@github.com> Message-ID: <533C6C07-0E36-4232-B09B-399EBC52C3D2@jetbrains.com> Hi all, With the latest changes I got the following error =======================8<---------------------- ./src/hotspot/share/ci/ciMethodData.cpp:477:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] } ^ 1 error generated. make[3]: *** [/opt/teamcity-agent/work/602288ed8ca22f30/build/macosx-aarch64-server-release/hotspot/variant-server/libjvm/objs/ciMethodData.o] Error 1 make[3]: *** Waiting for unfinished jobs.... make[2]: *** [hotspot-server-libs] Error 2 make[2]: *** Waiting for unfinished jobs?. 
=======================8 On 27. Nov 2023, at 17:04, Jorn Vernee wrote: > > On Thu, 23 Nov 2023 15:31:28 GMT, Jorn Vernee wrote: > >>> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that typically the exception handler is never entered (since no exception is thrown), however we don't profile exception handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads to a loss in performance. >>> >>> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. And another suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in this patch. >>> >>> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >>> exception handler of a method in the `MethodData` for that method (which holds all the profiling >>> data). Then when looking up the exception handler after an exception is thrown, we mark the >>> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >>> never been entered, we emit an uncommon trap instead. >>> >>> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to MethodData, I was seeing several crashes when ciMethodData was used. 
The underlying issue seemed to be that the offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if we add an additional section for exception handler data). I've re-written the code around this a bit to try and prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception handler data, and the size of each data section is derived from the offsets. >>> >>> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >>> >>> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >>> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int... >> >> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >> >> - add interpreter profiling specific test cases >> - rename ex_handler -> exception_handler > > Another round of tier 1 - 8 testing came back clean. I'm planning to integrate the patch tomorrow. > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1827516420 From coleenp at openjdk.org Thu Nov 30 13:13:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 30 Nov 2023 13:13:23 GMT Subject: RFR: 8313816: Accessing jmethodID might lead to spurious crashes [v11] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 06:06:48 GMT, David Holmes wrote: >> Thanks everyone involved in reviewing this PR! You were awesome and helped me drive the PR to better place than it started! > > @jbachorik this should not have been integrated yet! You only have one review not the required two for hotspot changes. Further your one Reviewer didn't even review the final version of the change! I did review the final version of the change.
Can @dholmes-ora or @tstuefe review again and we'll open CRs for anything missed? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16662#issuecomment-1833758619 From eosterlund at openjdk.org Thu Nov 30 13:26:14 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 30 Nov 2023 13:26:14 GMT Subject: RFR: 8321066: Multiple JFR tests have started failing Message-ID: Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only. What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter. These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe.
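As a rough illustration of why a NoSafepointVerifier (NSV) scope and a safepoint poll cannot coexist, here is a toy C++ model. It is only a sketch under stated assumptions: `SketchThread`, `SketchNoSafepointVerifier` and `sketch_poll_allowed` are hypothetical names, not HotSpot classes, and the real NSV is a debug-build check inside the VM rather than a simple counter.

```cpp
#include <cassert>

// Hypothetical per-thread state: how many "no safepoint" scopes are active.
struct SketchThread {
  int no_safepoint_depth = 0;
};

// Hypothetical RAII guard modelled loosely on the idea of an NSV:
// while an instance is alive, the thread must not reach a safepoint poll.
class SketchNoSafepointVerifier {
 public:
  explicit SketchNoSafepointVerifier(SketchThread& t) : _thread(t) {
    ++_thread.no_safepoint_depth;
  }
  ~SketchNoSafepointVerifier() {
    --_thread.no_safepoint_depth;
  }
  SketchNoSafepointVerifier(const SketchNoSafepointVerifier&) = delete;
  SketchNoSafepointVerifier& operator=(const SketchNoSafepointVerifier&) = delete;

 private:
  SketchThread& _thread;
};

// A poll is only legal when no verifier scope is active. In a debug VM the
// equivalent check would fail an assertion; here it just reports the result.
inline bool sketch_poll_allowed(const SketchThread& t) {
  return t.no_safepoint_depth == 0;
}
```

In this model, a crash reporter that polls while such a guard is still on the crashing thread's stack would see `sketch_poll_allowed` return false, which mirrors the situation described above in which the crash reporter itself crashes in debug builds.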
------------- Commit messages: - 8321066: Multiple JFR tests have started failing Changes: https://git.openjdk.org/jdk/pull/16900/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16900&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321066 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16900/head:pull/16900 PR: https://git.openjdk.org/jdk/pull/16900 From mcimadamore at openjdk.org Thu Nov 30 13:26:15 2023 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 30 Nov 2023 13:26:15 GMT Subject: RFR: 8321066: Multiple JFR tests have started failing In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 13:19:36 GMT, Erik Österlund wrote: > Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only. > > What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter. These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. > > Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. > > I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe. Marked as reviewed by mcimadamore (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/16900#pullrequestreview-1757453633 From jvernee at openjdk.org Thu Nov 30 13:26:16 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Thu, 30 Nov 2023 13:26:16 GMT Subject: RFR: 8321066: Multiple JFR tests have started failing In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 13:19:36 GMT, Erik Österlund wrote: > Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only. > > What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter. These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. > > Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. > > I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe. Marked as reviewed by jvernee (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/16900#pullrequestreview-1757455941 From thartmann at openjdk.org Thu Nov 30 13:33:11 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 30 Nov 2023 13:33:11 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v8] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 18:35:25 GMT, Volodymyr Paprotski wrote:
>> Splitting vblendvp[sd] into boolean operations is bit faster on ECore, get up to 30% gain
>>
>> =============== BEFORE ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
>> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
>> Benchmark                Mode  Cnt    Score   Error  Units
>> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
>> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
>> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
>> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
>> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
>> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
>> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>>
>> =============== AFTER ===============
>> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
>> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
>> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
>> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
>> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
>> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
>> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
>> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
>> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
>> Benchmark                Mode  Cnt    Score   Error  Units
>> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
>> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
>> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
>> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
>> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
>> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
>> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
>> MaxMinO...
> > Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/x86.ad > > Co-authored-by: Jatin Bhateja All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1833790386 From coleenp at openjdk.org Thu Nov 30 13:42:07 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 30 Nov 2023 13:42:07 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v3] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Wed, 29 Nov 2023 20:01:22 GMT, Matias Saavedra Silva wrote: >> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. >> >> This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Corrections and gtest fix I have a couple of minor requests before you push. Hopefully not delaying.
src/hotspot/share/oops/resolvedMethodEntry.hpp line 199: > 197: #ifdef ASSERT > 198: _has_interface_klass = true; > 199: #endif The == false in the asserts looks odd. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1757475524 PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1410678100 From coleenp at openjdk.org Thu Nov 30 13:42:11 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 30 Nov 2023 13:42:11 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v2] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Wed, 29 Nov 2023 06:30:44 GMT, David Holmes wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Added asserts to ensure correctness >> - Merge branch 'master' into resolved_ref_flag >> - Merge branch 'master' of https://github.com/openjdk/jdk into resolved_ref_flag >> - 8320530: has_resolved_ref_index flag not restored after resetting entry > > src/hotspot/share/oops/resolvedMethodEntry.hpp line 80: > >> 78: u1 _flags; // Flags: [00|has_resolved_ref_index|has_local_signature|has_appendix|forced_virtual|final|virtual_final] >> 79: u1 _bytecode1, _bytecode2; // Resolved invoke codes >> 80: DEBUG_ONLY( > > Nit: it is better to use `#ifdef ASSERT` than a multi-line `DEBUG_ONLY()` - and that will also allow the correct indentation for the new variables. I see, I said the opposite. I don't like the #ifdef ASSERT blocks. 3 lines vs 1. 
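For readers following the style debate, the two ways of guarding debug-only state can be compared side by side in a small standalone sketch. Everything here is an assumption made for illustration: `SKETCH_DEBUG_BUILD` and `SKETCH_DEBUG_ONLY` are hypothetical stand-ins for HotSpot's `ASSERT` and `DEBUG_ONLY`, and `SketchEntry` is not the real `ResolvedMethodEntry`.

```cpp
#include <cassert>
#include <cstdint>

// Pretend this is a debug build so both styles are active in the sketch.
#define SKETCH_DEBUG_BUILD 1

// Hypothetical stand-in for a DEBUG_ONLY-style macro: the argument is
// compiled in debug builds and disappears entirely otherwise.
#if SKETCH_DEBUG_BUILD
#define SKETCH_DEBUG_ONLY(code) code
#else
#define SKETCH_DEBUG_ONLY(code)
#endif

struct SketchEntry {
  std::uint8_t flags = 0;

  // One macro per variable, as suggested in the review, rather than one
  // multi-line macro invocation wrapping several declarations.
  SKETCH_DEBUG_ONLY(bool has_interface_klass = false;)
  SKETCH_DEBUG_ONLY(bool has_table_index = false;)

  // Style A: an explicit preprocessor block (three lines for one statement).
  void set_interface_klass() {
#if SKETCH_DEBUG_BUILD
    has_interface_klass = true;
#endif
  }

  // Style B: the single-line macro form.
  void set_table_index() {
    SKETCH_DEBUG_ONLY(has_table_index = true;)
  }
};
```

Both setters compile to the same code in a debug build and to empty functions otherwise; the difference is purely one of line count and indentation, which is what the comments above disagree about.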
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1410686620 From coleenp at openjdk.org Thu Nov 30 13:42:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 30 Nov 2023 13:42:12 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v2] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Thu, 30 Nov 2023 13:37:57 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/resolvedMethodEntry.hpp line 80: >> >>> 78: u1 _flags; // Flags: [00|has_resolved_ref_index|has_local_signature|has_appendix|forced_virtual|final|virtual_final] >>> 79: u1 _bytecode1, _bytecode2; // Resolved invoke codes >>> 80: DEBUG_ONLY( >> >> Nit: it is better to use `#ifdef ASSERT` than a multi-line `DEBUG_ONLY()` - and that will also allow the correct indentation for the new variables. > > I see, I said the opposite. I don't like the #ifdef ASSERT blocks. 3 lines vs 1. I think each variable should have DEBUG_ONLY() around it, not grouped though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1410688666 From coleenp at openjdk.org Thu Nov 30 13:42:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 30 Nov 2023 13:42:13 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v3] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Thu, 30 Nov 2023 13:32:12 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Corrections and gtest fix > > src/hotspot/share/oops/resolvedMethodEntry.hpp line 199: > >> 197: #ifdef ASSERT >> 198: _has_interface_klass = true; >> 199: #endif > > The == false in the asserts looks odd. 
This might be more compact (and below), as DEBUG_ONLY(_has_interface_klass=true); Same with all of these #ifdef ASSERT blocks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1410683618 From mbaesken at openjdk.org Thu Nov 30 14:05:17 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 30 Nov 2023 14:05:17 GMT Subject: RFR: JDK-8319927: Log that IEEE rounding mode was corrupted by loading a library In-Reply-To: References: Message-ID: <7VgoeBsuX1bUqgpxjJdcec3d8Spb5ZOPMA7dfcWx-2c=.a2284082-a4fe-4e35-a9bb-fa6907ae4434@github.com> On Fri, 17 Nov 2023 08:19:18 GMT, Andrew Haley wrote: >> [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections on Linux and macOS/BSD , however in case of issues no logging is done, this should be improved. > > src/hotspot/os/bsd/os_bsd.cpp line 1013: > >> 1011: int rtn = fesetenv(&default_fenv); >> 1012: assert(rtn == 0, "fesetenv must succeed"); >> 1013: bool ieee_handling_after_issue = IEEE_subnormal_handling_OK(); > > This is a misleading name. It should be something explicit like `ieee_handling_succeeded`. Still, I suppose it's too late now. okay why not ieee_handling_succeeded; I can change the variable name in the follow up JDK-8321017 . ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16618#discussion_r1410717962 From mbaesken at openjdk.org Thu Nov 30 14:51:33 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 30 Nov 2023 14:51:33 GMT Subject: RFR: JDK-8321017: Record in JFR that IEEE rounding mode was corrupted by loading a library Message-ID: <2gUnFY5JoLg3EALGuXTJwhv2oyMm02zL3ckZQ3vkFck=.e5a60d88-99f9-4e97-8bcd-6dee3bf6f208@github.com> [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections of the floating point environment on Linux and macOS/BSD, and later some UL logging was added too. 
However, the information is not added to the JFR events, and this should be enhanced. The existing NativeLibraryLoad event can be used for storing the additional information, because the IEEE conformance check and the fenv get/set are placed in the HS dlopen_helper, where the NativeLibraryLoad event objects are already created/committed. ------------- Commit messages: - JDK-8321017 Changes: https://git.openjdk.org/jdk/pull/16903/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16903&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321017 Stats: 34 lines in 5 files changed: 24 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16903/head:pull/16903 PR: https://git.openjdk.org/jdk/pull/16903 From aph at openjdk.org Thu Nov 30 14:53:16 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 30 Nov 2023 14:53:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: <8SBUvWGDLtQmwYPRBDeUkeuq4pf2nJfKfDY5rzZODFU=.3f1cc0ff-02e6-4a4e-9425-5ffccc9cbc8f@github.com> References: <8SBUvWGDLtQmwYPRBDeUkeuq4pf2nJfKfDY5rzZODFU=.3f1cc0ff-02e6-4a4e-9425-5ffccc9cbc8f@github.com> Message-ID: On Thu, 30 Nov 2023 11:46:58 GMT, Andrew Haley wrote: > [vect_math.S.txt](https://github.com/openjdk/jdk/files/13512306/vect_math.S.txt) I guess this will live only in os_linux and os_bsd because the Windows compiler won't like it AFAIK.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1833932148 From duke at openjdk.org Thu Nov 30 15:05:08 2023 From: duke at openjdk.org (Lei Zaakjyu) Date: Thu, 30 Nov 2023 15:05:08 GMT Subject: RFR: JDK-8234502 : Merge GenCollectedHeap and SerialHeap [v8] In-Reply-To: References: Message-ID: <182QWwu6PO97q3Jf3RaiiNLUKN6f-Xz5VjPb8rd0O50=.5f4a12df-3c3e-4ba7-8ba4-10627d7bbcdd@github.com> On Tue, 28 Nov 2023 09:20:58 GMT, Albert Mingkun Yang wrote: > I just noticed I referred to the wrong PR number in my previous msg... > > Could you resolve the conflict now that that PR is merged? ok! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16623#issuecomment-1833952980 From epeter at openjdk.org Thu Nov 30 15:08:21 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Nov 2023 15:08:21 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe [v2] In-Reply-To: References: Message-ID: <8cx19-Ux89HrhBDzaudlbG3qcRNAgu2q8riHqYssJrQ=.d8a328fa-c5df-46bb-a8f2-2274c7a18046@github.com> > I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. > > Testing: tier1-3 and stress. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: adding more verification and more locking, WIP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16840/files - new: https://git.openjdk.org/jdk/pull/16840/files/9336c4fe..5b819f19 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16840&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16840&range=00-01 Stats: 103 lines in 9 files changed: 81 ins; 12 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/16840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16840/head:pull/16840 PR: https://git.openjdk.org/jdk/pull/16840 From pchilanomate at openjdk.org Thu Nov 30 15:26:22 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 15:26:22 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v3] In-Reply-To: References: Message-ID: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> > Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. > Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. 
> > The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. > > I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. > > Thanks, > Patricio > > [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: add comment in clear_bitmap_bits() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16837/files - new: https://git.openjdk.org/jdk/pull/16837/files/42478a45..4f580f5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16837&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16837&range=01-02 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16837/head:pull/16837 PR: https://git.openjdk.org/jdk/pull/16837 From pchilanomate at openjdk.org Thu Nov 30 15:26:24 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 15:26:24 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v2] In-Reply-To: <420maB6CB-flM2EsOOVlpWvOghCavZ02chpeIk9vCKs=.039647b7-4d60-4a52-a8b2-cf93972911f8@github.com> 
References: <0eX9lQsQl61MnSDcClo6e2S1wYOdDy21i-CzJq2faIw=.b7f0e9d0-62d6-43ce-a9be-57333f0f871d@github.com> <420maB6CB-flM2EsOOVlpWvOghCavZ02chpeIk9vCKs=.039647b7-4d60-4a52-a8b2-cf93972911f8@github.com> Message-ID: <9MSEsf9mSfm-poD1nfJKRdacExHlWTDa6T4cqARi1zw=.97de134a-eb1d-42e3-97cb-5e5eb69b1d38@github.com> On Thu, 30 Nov 2023 03:28:39 GMT, Dean Long wrote: > OK, the use of `address` still seems misleading, but I can't think of anything much better, except perhaps `void*`. > I think of it as just a range of memory we are passing. I don't immediately see void* as better since that could also point anywhere and not be aligned. > Do we really need a version of num_stack_arg_slots() that rounds up? > All the other callers in freeze/thaw calculate the size of the argument area in words based on this number (e.g. `frame::compiled_frame_stack_argsize()`). So if we have an odd number of stack slots we would end up calculating the wrong size. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2175: > >> 2173: // we need to clear the bits that correspond to arguments as they reside in the caller frame >> 2174: // or they will keep objects that are otherwise unreachable alive >> 2175: address effective_end = UseCompressedOops ? end : align_down(end, wordSize); > > Is the align_down for correctness, or just for the benefit of the new assert at line 2179? Since it's not immediately obvious, I think it deserves a comment. Because `end` is not necessarily word aligned anymore, the pointer arithmetic we do in bit_index_for() would be UB, since `p` can point to the middle of an oop (in practice we would probably not see any issue because that's implemented as a subtraction and then an arithmetic shift right which will round down the result). So we need to align `end` down if UseCompressedOops is not set. That last half word part should not contain an oop anyways so the assert is to verify that. I added a comment, please take a look.
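The alignment argument above can be checked with a little integer arithmetic. The following is a minimal sketch, not the code from the patch: it assumes 8-byte words, 4-byte narrow oops, and a bitmap with one bit per oop-sized slot, and every `sketch_`/`kSketch` name is hypothetical. Addresses are modelled as integers so that the sketch itself never performs the misaligned pointer subtraction being discussed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes for the sketch: 64-bit words and 32-bit narrow oops.
constexpr std::uintptr_t kSketchWordSize = 8;
constexpr std::uintptr_t kSketchNarrowOopSize = 4;

// Round addr down to a multiple of alignment (alignment must be a power of two).
inline std::uintptr_t sketch_align_down(std::uintptr_t addr, std::uintptr_t alignment) {
  return addr & ~(alignment - 1);
}

// Mirrors the quoted line: with compressed oops the 4-byte-aligned end is
// usable as is; with full-width oops it is rounded down to a word boundary.
inline std::uintptr_t sketch_effective_end(std::uintptr_t end, bool use_compressed_oops) {
  return use_compressed_oops ? end : sketch_align_down(end, kSketchWordSize);
}

// Number of bitmap bits covering [start, effective_end), one bit per oop slot.
inline std::size_t sketch_bits_to_clear(std::uintptr_t start, std::uintptr_t end,
                                        bool use_compressed_oops) {
  const std::uintptr_t slot = use_compressed_oops ? kSketchNarrowOopSize : kSketchWordSize;
  const std::uintptr_t effective_end = sketch_effective_end(end, use_compressed_oops);
  // After alignment the range is a whole number of slots, so the division is exact.
  assert((effective_end - start) % slot == 0);
  return static_cast<std::size_t>((effective_end - start) / slot);
}
```

With a word-aligned `start` of `0x1000` and an `end` of `0x101c` (an odd number of 4-byte stack slots), the compressed-oops case covers seven narrow-oop slots, while the uncompressed case rounds `end` down to `0x1018` and covers three full words, leaving the trailing half word untouched.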
------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1833991601 PR Review Comment: https://git.openjdk.org/jdk/pull/16837#discussion_r1410833228 From eosterlund at openjdk.org Thu Nov 30 15:31:07 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 30 Nov 2023 15:31:07 GMT Subject: RFR: 8321066: Multiple JFR tests have started failing In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 13:19:36 GMT, Erik Österlund wrote: > Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only. > > What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter. These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. > > Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. > > I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe.
Since this is causing a bit of a christmas tree in the CI and christmas shouldn't come quite yet, I'm going to go ahead and /integrate ------------- PR Comment: https://git.openjdk.org/jdk/pull/16900#issuecomment-1834002024 From tschatzl at openjdk.org Thu Nov 30 15:34:26 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 30 Nov 2023 15:34:26 GMT Subject: RFR: 8319313: G1: Rename G1EvacFailureInjector appropriately Message-ID: Hi all, please review this rename of `G1EvacFailureInjector` and associated options to `G1AllocationFailureInjector` according to the results of the discussion for the review of [JDK-8318706](https://bugs.openjdk.org/browse/JDK-8318706). To facilitate review the first commit implements the renaming changes, the second moves the affected files only. Testing: gha, local gc/g1 tests Thanks, Thomas ------------- Commit messages: - renames - Initial version, fix up g1EvacuationFailure -> g1AllocationFailure in various variants Changes: https://git.openjdk.org/jdk/pull/16905/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16905&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8319313 Stats: 727 lines in 19 files changed: 343 ins; 335 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/16905.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16905/head:pull/16905 PR: https://git.openjdk.org/jdk/pull/16905 From tschatzl at openjdk.org Thu Nov 30 15:42:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 30 Nov 2023 15:42:06 GMT Subject: RFR: 8320916: jdk/jfr/event/gc/stacktrace/TestParallelMarkSweepAllocationPendingStackTrace.java failed with "OutOfMemoryError: GC overhead limit exceeded" In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 00:35:47 GMT, Albert Mingkun Yang wrote: > Simple fix to reduce live set so that after the triggered full-gc, there is still some memory left. > > Test: ~2/10 failure before the fix and no failure observed for 100 iterations. Marked as reviewed by tschatzl (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/16870#pullrequestreview-1757771064 From ayang at openjdk.org Thu Nov 30 15:51:25 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 30 Nov 2023 15:51:25 GMT Subject: Integrated: 8320916: jdk/jfr/event/gc/stacktrace/TestParallelMarkSweepAllocationPendingStackTrace.java failed with "OutOfMemoryError: GC overhead limit exceeded" In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 00:35:47 GMT, Albert Mingkun Yang wrote: > Simple fix to reduce live set so that after the triggered full-gc, there is still some memory left. > > Test: ~2/10 failure before the fix and no failure observed for 100 iterations. This pull request has now been integrated. Changeset: 69384745 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/693847452f208446a34186f142fe2c56a49ceceb Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8320916: jdk/jfr/event/gc/stacktrace/TestParallelMarkSweepAllocationPendingStackTrace.java failed with "OutOfMemoryError: GC overhead limit exceeded" Reviewed-by: sjohanss, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/16870 From ayang at openjdk.org Thu Nov 30 15:51:25 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 30 Nov 2023 15:51:25 GMT Subject: RFR: 8320916: jdk/jfr/event/gc/stacktrace/TestParallelMarkSweepAllocationPendingStackTrace.java failed with "OutOfMemoryError: GC overhead limit exceeded" In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 00:35:47 GMT, Albert Mingkun Yang wrote: > Simple fix to reduce live set so that after the triggered full-gc, there is still some memory left. > > Test: ~2/10 failure before the fix and no failure observed for 100 iterations. Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16870#issuecomment-1834033077 From rriggs at openjdk.org Thu Nov 30 15:51:46 2023 From: rriggs at openjdk.org (Roger Riggs) Date: Thu, 30 Nov 2023 15:51:46 GMT Subject: RFR: 8311906: Improve robustness of String constructors with mutable array inputs [v14] In-Reply-To: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> References: <6SKlGLh5MmxoEx07wHCCUc8KWbbhcspLJmcc1uxQ_FI=.ca33bfb4-fa5c-45f0-b49f-ee6c5c6b68b4@github.com> Message-ID: > Strings, after construction, are immutable but may be constructed from mutable arrays of bytes, characters, or integers. > The string constructors should guard against the effects of mutating the arrays during construction that might invalidate internal invariants for the correct behavior of operations on the resulting strings. In particular, a number of operations have optimizations for operations on pairs of latin1 strings and pairs of non-latin1 strings, while operations between latin1 and non-latin1 strings use a more general implementation. > > The changes include: > > - Adding a warning to each constructor with an array as an argument to indicate that the results are indeterminate > if the input array is modified before the constructor returns. > The resulting string may contain any combination of characters sampled from the input array. > > - Ensure that strings that are represented as non-latin1 contain at least one non-latin1 character. > For latin1 inputs, whether the arrays contain ASCII, ISO-8859-1, UTF8, or another encoding decoded to latin1 the scanning and compression is unchanged. > If a non-latin1 character is found, the string is represented as non-latin1 with the added verification that a non-latin1 character is present at the same index. > If that character is found to be latin1, then the input array has been modified and the result of the scan may be incorrect. 
> Though a ConcurrentModificationException could be thrown, the risk to an existing application of an unexpected exception should be avoided. > Instead, the non-latin1 copy of the input is re-scanned and compressed; that scan determines whether the latin1 or the non-latin1 representation is returned. > > - The methods that scan for non-latin1 characters and their intrinsic implementations are updated to return the index of the non-latin1 character. > > - String construction from StringBuilder and CharSequence must also be guarded as their contents may be modified during construction. Roger Riggs has updated the pull request incrementally with one additional commit since the last revision: Correct jcc/jccb branches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16425/files - new: https://git.openjdk.org/jdk/pull/16425/files/5299c43b..b2fc3855 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16425&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16425/head:pull/16425 PR: https://git.openjdk.org/jdk/pull/16425 From epeter at openjdk.org Thu Nov 30 15:53:30 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Nov 2023 15:53:30 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe [v3] In-Reply-To: References: Message-ID: > I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. > > Testing: tier1-3 and stress. 
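The locking rule described for `allocate_bci_to_data` (only call it while holding the `extra_data_lock`) can be illustrated with a small standalone sketch. This is not HotSpot code: `MethodDataSketch`, `allocate_extra_slot` and the `std::vector` storage are hypothetical stand-ins, and standard C++ `std::mutex`/`std::unique_lock` replace HotSpot's own `Mutex`/`MutexLocker`. The idea is that the allocator takes the caller's lock object as a parameter and asserts that it really is the right lock and that it is held.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for MethodData; only the locking discipline is modelled.
class MethodDataSketch {
 public:
  std::mutex& extra_data_lock() { return extra_data_lock_; }

  // Appends a record for `bci` and returns its index.
  // The caller must pass the unique_lock it holds on extra_data_lock();
  // the assert catches calls made without the lock (in debug builds).
  int allocate_extra_slot(const std::unique_lock<std::mutex>& held, int bci) {
    assert(held.owns_lock() && held.mutex() == &extra_data_lock_);
    extra_data_.push_back(bci);
    return static_cast<int>(extra_data_.size()) - 1;
  }

 private:
  std::mutex extra_data_lock_;
  std::vector<int> extra_data_;  // guarded by extra_data_lock_
};
```

A caller would create `std::unique_lock<std::mutex> lock(md.extra_data_lock());` and pass `lock` to `allocate_extra_slot`. Passing the lock object makes the requirement visible in the signature, at the cost of touching every call site, which is the intrusiveness discussed later in the thread.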
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more locking, still fails tho - WIP ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16840/files - new: https://git.openjdk.org/jdk/pull/16840/files/5b819f19..54f2c498 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16840&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16840&range=01-02 Stats: 19 lines in 3 files changed: 15 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16840/head:pull/16840 PR: https://git.openjdk.org/jdk/pull/16840 From duke at openjdk.org Thu Nov 30 16:03:14 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 30 Nov 2023 16:03:14 GMT Subject: RFR: 8320347: Emulate vblendvp[sd] on ECore [v8] In-Reply-To: References: Message-ID: <9wGMR6x1wsmpaWsleOrqS98piAx7UyRLmrX-D8GWKxo=.2bf625c8-df8b-4eab-ba45-c226af6a75be@github.com> On Thu, 30 Nov 2023 13:30:42 GMT, Tobias Hartmann wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/cpu/x86/x86.ad >> >> Co-authored-by: Jatin Bhateja > > All tests passed. Thanks @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/16716#issuecomment-1834057603 From duke at openjdk.org Thu Nov 30 16:14:32 2023 From: duke at openjdk.org (Volodymyr Paprotski) Date: Thu, 30 Nov 2023 16:14:32 GMT Subject: Integrated: 8320347: Emulate vblendvp[sd] on ECore In-Reply-To: References: Message-ID: On Fri, 17 Nov 2023 19:58:13 GMT, Volodymyr Paprotski wrote:
> Splitting vblendvp[sd] into boolean operations is a bit faster on ECore, get up to 30% gain
>
> =============== BEFORE ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    77.766 ± 0.049  ns/op
> VectorSignum.floatSignum      512  avgt    3   154.889 ± 0.242  ns/op
> VectorSignum.floatSignum     1024  avgt    3   306.130 ± 0.605  ns/op
> VectorSignum.floatSignum     2048  avgt    3   609.965 ± 0.927  ns/op
> VectorSignum.doubleSignum     256  avgt    3   151.874 ± 1.748  ns/op
> VectorSignum.doubleSignum     512  avgt    3   303.080 ± 0.310  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   607.517 ± 0.597  ns/op
> VectorSignum.doubleSignum    2048  avgt    3  1214.282 ± 1.834  ns/op
>
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  137.334 ± 0.128  us/op
> MaxMinOptimizeTest.dMin  avgt    3  137.160 ± 0.465  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.231 ± 0.051  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.165 ± 0.003  us/op
> MaxMinOptimizeTest.fMax  avgt    3  107.428 ± 1.501  us/op
> MaxMinOptimizeTest.fMin  avgt    3  107.186 ± 0.022  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.164 ± 0.012  us/op
>
> =============== AFTER ===============
> Benchmark                  (SIZE)  Mode  Cnt     Score   Error  Units
> VectorSignum.floatSignum      256  avgt    3    61.816 ± 1.980  ns/op
> VectorSignum.floatSignum      512  avgt    3   117.251 ± 0.052  ns/op
> VectorSignum.floatSignum     1024  avgt    3   231.356 ± 0.397  ns/op
> VectorSignum.floatSignum     2048  avgt    3   458.904 ± 0.774  ns/op
> VectorSignum.doubleSignum     256  avgt    3   121.449 ± 0.184  ns/op
> VectorSignum.doubleSignum     512  avgt    3   241.662 ± 0.189  ns/op
> VectorSignum.doubleSignum    1024  avgt    3   482.365 ± 0.165  ns/op
> VectorSignum.doubleSignum    2048  avgt    3   962.412 ± 1.401  ns/op
>
> Benchmark                Mode  Cnt    Score   Error  Units
> MaxMinOptimizeTest.dAdd  avgt    3   77.240 ± 0.029  us/op
> MaxMinOptimizeTest.dMax  avgt    3  125.701 ± 0.082  us/op
> MaxMinOptimizeTest.dMin  avgt    3  124.704 ± 0.119  us/op
> MaxMinOptimizeTest.dMul  avgt    3   77.232 ± 0.028  us/op
> MaxMinOptimizeTest.fAdd  avgt    3   77.169 ± 0.103  us/op
> MaxMinOptimizeTest.fMax  avgt    3   97.939 ± 0.477  us/op
> MaxMinOptimizeTest.fMin  avgt    3   98.012 ± 0.154  us/op
> MaxMinOptimizeTest.fMul  avgt    3   77.174 ± 0.012  us/op

This pull request has now been integrated.
Changeset: 6aba6aa6 Author: Volodymyr Paprotski <101140609+vpaprotsk at users.noreply.github.com> Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/6aba6aa6f14c022ae70aee4e7a65ee74464de3a2 Stats: 350 lines in 8 files changed: 257 ins; 56 del; 37 mod 8320347: Emulate vblendvp[sd] on ECore Reviewed-by: sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/16716 From epeter at openjdk.org Thu Nov 30 16:16:54 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 30 Nov 2023 16:16:54 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe [v4] In-Reply-To: References: Message-ID: > I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. > > Testing: tier1-3 and stress. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: - manual merge with master after JDK-8267532 - more locking, still fails tho - WIP - adding more verification and more locking, WIP - add locks for jvmci calls to allocate_bci_to_data - 8306767 ------------- Changes: https://git.openjdk.org/jdk/pull/16840/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16840&range=03 Stats: 123 lines in 9 files changed: 102 ins; 9 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/16840.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16840/head:pull/16840 PR: https://git.openjdk.org/jdk/pull/16840 From dcubed at openjdk.org Thu Nov 30 16:22:23 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 30 Nov 2023 16:22:23 GMT Subject: RFR: 8321066: Multiple JFR tests have started failing In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 13:19:36 GMT, Erik Österlund wrote: > Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only. > > What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter. These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. > > Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. > > I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe. Thumbs up. Please integrate soon. ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16900#pullrequestreview-1757854634 From dcubed at openjdk.org Thu Nov 30 16:22:25 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 30 Nov 2023 16:22:25 GMT Subject: RFR: 8321066: Multiple JFR tests have started failing In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 15:28:51 GMT, Erik Österlund wrote: >> Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only.
>> >> What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter. These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. >> >> Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. >> >> I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe. > > Since this is causing a bit of a christmas tree in the CI and christmas shouldn't come quite yet, I'm going to go ahead and /integrate @fisk - Perhaps the "/integrate" needs to be on a line by itself? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16900#issuecomment-1834094440 From eosterlund at openjdk.org Thu Nov 30 16:40:15 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 30 Nov 2023 16:40:15 GMT Subject: Integrated: 8321066: Multiple JFR tests have started failing In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 13:19:36 GMT, Erik Österlund wrote: > Before integrating https://bugs.openjdk.org/browse/JDK-8310644 we added a seemingly innocent NoSafepointVerifier in some code that really shouldn't safepoint and reran tier1 only. > > What nobody anticipated is that the JFR dumping during crash reporting code probably introduced by https://bugs.openjdk.org/browse/JDK-8233706 performs safepoint polls from inside the crash reporter.
These JFR tests try to provoke a crash and check that the JFR recording gets dumped. But we crash during crash reporting in debug builds, because the NSV doesn't like the safepoint polls inside the crash reporter. > > Now while this crash reporting code can seemingly make any NSV in the JVM fail if you get a crash there, and even worse, accept safepoints and do GC while crash reporting from totally safepoint unsafe code and what not, this change merely removes the new NSV from the fix that introduced the test failures. But maybe going forward we shouldn't poll for safepoints in the crash reporter. > > I tested that the reported test failures fail deterministically without this patch and do not fail with this patch. I also re-ran tier1 just to be safe. This pull request has now been integrated. Changeset: 7c135c36 Author: Erik Österlund URL: https://git.openjdk.org/jdk/commit/7c135c3697eafedc6e244f5c866a40127247e26a Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8321066: Multiple JFR tests have started failing Reviewed-by: mcimadamore, jvernee, dcubed ------------- PR: https://git.openjdk.org/jdk/pull/16900 From jiangli at openjdk.org Thu Nov 30 16:50:23 2023 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 30 Nov 2023 16:50:23 GMT Subject: RFR: 8319935: Ensure only one JvmtiThreadState is created for one JavaThread associated with attached native thread [v5] In-Reply-To: References: <3GC3ckHJhlYl3Vj_oh7DJorPy8NneY9rw2EMBQeyFvY=.6c7281cd-f97d-4805-8695-ddd66e5e6415@github.com> Message-ID: On Wed, 29 Nov 2023 23:06:10 GMT, Daniel D. Daugherty wrote: > A belated thumbs up. Sorry I didn't get back to this review before the fix was integrated. Still thanks for reviewing the change, @dcubed-ojdk. > > I found just a nit comment that could be more clear. The particular issue occurred when `JavaThread::allocate_threadObj` was allocating and initializing the Thread instance.
When the allocation of the Thread object triggered sampling, it could create a `JvmtiThreadState` with null thread oop with the bug. It seems "is being allocated" describes the issue more accurately. > src/hotspot/share/prims/jvmtiExport.cpp line 3143: > >> 3141: >> 3142: // If the current thread is attaching from native and its Java thread object >> 3143: // is being allocated, things are not ready for allocation sampling. > > nit - typo: s/is being allocated/has not been allocated/ Please see other comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16642#issuecomment-1834148374 PR Review Comment: https://git.openjdk.org/jdk/pull/16642#discussion_r1410956027 From pchilanomate at openjdk.org Thu Nov 30 16:56:04 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 30 Nov 2023 16:56:04 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v3] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 07:34:23 GMT, Serguei Spitsyn wrote: >> This is an update for a performance/scalability enhancement. >> >> The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. >> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: addressed a race condition Thanks Serguei, looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16688#pullrequestreview-1757941041 From sspitsyn at openjdk.org Thu Nov 30 17:01:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 17:01:24 GMT Subject: Integrated: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol In-Reply-To: References: Message-ID: On Thu, 16 Nov 2023 12:35:08 GMT, Serguei Spitsyn wrote: > This is an update for a performance/scalability enhancement. > > The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. > > Please, also see another/related performance issue which contributes around 70% of total performance overhead: > [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 > > Testing: > - Ran mach5 tiers 1-6 with no regressions noticed. This pull request has now been integrated. Changeset: 41daa3b9 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/41daa3b934255420dcf414cf9045289ba05a9f48 Stats: 47 lines in 2 files changed: 40 ins; 4 del; 3 mod 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol Reviewed-by: lmesnik, pchilanomate, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/16688 From sspitsyn at openjdk.org Thu Nov 30 17:01:22 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 17:01:22 GMT Subject: RFR: 8320239: add dynamic switch for JvmtiVTMSTransitionDisabler sync protocol [v3] In-Reply-To: References: Message-ID: <5g-8z-piS7aZSv_WxK0Nq4i3nCB6nCban162zPn73dQ=.678b827b-0363-44a5-8782-a1d0802e1b6a@github.com> On Thu, 30 Nov 2023 07:34:23 GMT, Serguei Spitsyn wrote: >> This is an update for a performance/scalability enhancement. 
>> >> The `JvmtiVTMSTransitionDisabler`sync protocol is on a performance critical path of the virtual threads mount state transitions (VTMS transitions). It has a noticeable performance overhead (about 10%) which contributes to the combined JVMTI performance overhead when Java apps are executed with loaded JVMTI agents. >> >> Please, also see another/related performance issue which contributes around 70% of total performance overhead: >> [8308614](https://bugs.openjdk.org/browse/JDK-8308614): Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 >> >> Testing: >> - Ran mach5 tiers 1-6 with no regressions noticed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > review: addressed a race condition Leonid, Alex and Patricio thank you for review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16688#issuecomment-1834166897 From matsaave at openjdk.org Thu Nov 30 17:31:44 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 30 Nov 2023 17:31:44 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v4] In-Reply-To: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: > ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. > > This patch restored the flag with the resolved references index. Verified with tier 1-5 tests. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'master' into resolved_ref_flag - Coleen comments - Corrections and gtest fix - Added asserts to ensure correctness - Merge branch 'master' into resolved_ref_flag - Merge branch 'master' of https://github.com/openjdk/jdk into resolved_ref_flag - 8320530: has_resolved_ref_index flag not restored after resetting entry ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16769/files - new: https://git.openjdk.org/jdk/pull/16769/files/0b82817d..d7d7b8da Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16769&range=02-03 Stats: 17721 lines in 468 files changed: 12999 ins; 2833 del; 1889 mod Patch: https://git.openjdk.org/jdk/pull/16769.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16769/head:pull/16769 PR: https://git.openjdk.org/jdk/pull/16769 From dcubed at openjdk.org Thu Nov 30 17:36:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 30 Nov 2023 17:36:13 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 02:08:39 GMT, Serguei Spitsyn wrote: >> This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. >> However, it has to be okay to avoid eager creation of these object if no `interp_only_mode` has ever been requested. >> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. 
>> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly.
>>
>> Additionally, the fix includes the following related changes:
>> - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this:
>>
>>     JvmtiThreadState* state = thread->jvmti_thread_state();
>>     if (state != nullptr && state->is_pending_interp_only_mode()) {
>>       MutexLocker mu(JvmtiThreadState_lock);
>>       state = thread->jvmti_thread_state();
>>       if (state != nullptr && state->is_pending_interp_only_mode()) {
>>         JvmtiEventController::enter_interp_only_mode();
>>       }
>>     }
>>
>> - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted.
>> - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation.
>>
>> Performance measurements:
>> - Without this fix the test provided by the bug submitter gives execution numbers:
>>   - no ClassLoad events enabled: 3251 ms
>>   - ClassLoad events are enabled: 40534 ms
>>
>> - With the fix:
>>   - no ClassLoad events enabled: 3270 ms
>>   - ClassLoad events are enabled: 3385 ms
>>
>> Testing:
>> - Ran mach5 tiers 1-6, no regressions are noticed
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
> review: remove newly added ThreadsListHandle from enter_interp_only_mode

Removing the TLH is the right thing to do. If we get that failure mode again, then we can file a new bug and hopefully have an hs_err_pid file with it. I don't think we should change the guarantee() in `Handshake::execute()`.
When the three parameter version of `execute()` is called with `tlh == nullptr`, the caller is saying that there is supposed to be a ThreadsListHandle in the calling context. Yes, if the target thread is the calling thread, then a ThreadsListHandle is not needed, but that's why we have this code to prevent the call to `Handshake::execute()`: if (target->is_handshake_safe_for(current)) { hs.do_thread(target); In other words, I think `Handshake::execute()` is working the way it is supposed to when `tlh == nullptr` is passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1834244635 From cslucas at openjdk.org Thu Nov 30 17:38:16 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 30 Nov 2023 17:38:16 GMT Subject: RFR: JDK-8241503: C2: Share MacroAssembler between mach nodes during code emission [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 09:24:12 GMT, Hao Sun wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: >> >> - Fix merge >> - Catch up with master branch. >> >> Merge remote-tracking branch 'origin/master' into reuse-macroasm >> - Some inst_mark fixes; Catch up with master. >> - Catch up with changes on master >> - Reuse same C2_MacroAssembler object to emit instructions. > > src/hotspot/cpu/aarch64/aarch64.ad line 2829: > >> 2827: enc_class aarch64_enc_ldrsbw(iRegI dst, memory1 mem) %{ >> 2828: Register dst_reg = as_Register($dst$$reg); >> 2829: loadStore(masm, &MacroAssembler::ldrsbw, dst_reg, $mem->opcode(), > > The block of code should be auto-generated by `ad_encode.m4` file. > We'd better not edit these lines directly. Instead, we should update the m4 file accordingly. Oops. I didn't see the comment there! Thank you for letting me know. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16484#discussion_r1411050332 From clanger at openjdk.org Thu Nov 30 17:41:04 2023 From: clanger at openjdk.org (Christoph Langer) Date: Thu, 30 Nov 2023 17:41:04 GMT Subject: RFR: JDK-8321017: Record in JFR that IEEE rounding mode was corrupted by loading a library In-Reply-To: <2gUnFY5JoLg3EALGuXTJwhv2oyMm02zL3ckZQ3vkFck=.e5a60d88-99f9-4e97-8bcd-6dee3bf6f208@github.com> References: <2gUnFY5JoLg3EALGuXTJwhv2oyMm02zL3ckZQ3vkFck=.e5a60d88-99f9-4e97-8bcd-6dee3bf6f208@github.com> Message-ID: On Thu, 30 Nov 2023 14:44:03 GMT, Matthias Baesken wrote: > [JDK-8295159](https://bugs.openjdk.org/browse/JDK-8295159) added some IEEE conformance checks and corrections of the floating point environment on Linux and macOS/BSD, and later some UL logging was added too. > However the information is not added to the JFR events, and this should be enhanced. > The already existing NativeLibraryLoad event can be used for storing the additional information, because the IEEE conformance check and fenv get/set is placed in the HS dlopen_helper , where already the NativeLibraryLoad event objects are created/commited . This does not build on MacOS. Removed from SAP internal test queue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16903#issuecomment-1834254415 From dcubed at openjdk.org Thu Nov 30 17:54:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 30 Nov 2023 17:54:13 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 02:08:39 GMT, Serguei Spitsyn wrote: >> This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. 
Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed.
>> However, it has to be okay to avoid eager creation of these objects if no `interp_only_mode` has ever been requested.
>> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases.
>> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly.
>>
>> Additionally, the fix includes the following related changes:
>> - Use condition double checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this:
>>
>>     JvmtiThreadState* state = thread->jvmti_thread_state();
>>     if (state != nullptr && state->is_pending_interp_only_mode()) {
>>       MutexLocker mu(JvmtiThreadState_lock);
>>       state = thread->jvmti_thread_state();
>>       if (state != nullptr && state->is_pending_interp_only_mode()) {
>>         JvmtiEventController::enter_interp_only_mode();
>>       }
>>     }
>>
>> - Add extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted.
>> - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for JVMTI `SetEventNotificationMode` implementation.
>>
>> Performance measurements:
>> - Without this fix the test provided by the bug submitter gives execution numbers:
>>   - no ClassLoad events enabled: 3251 ms
>>   - ClassLoad events are enabled: 40534 ms
>>
>> - With the fix:
>>   - no ClassLoad events enabled: 3270 ms
>>   - ClassLoad events are enabled: 3385 ms
>>
>> Testing:
>> - Ran mach5 tiers 1-6, no regressions are noticed
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
> review: remove newly added ThreadsListHandle from enter_interp_only_mode

Changes requested by dcubed (Reviewer).

src/hotspot/share/prims/jvmtiThreadState.cpp line 530:

> 528: assert(!thread->is_in_tmp_VTMS_transition(), "sanity check");
> 529:
> 530: // If interp_only_mode is enabled then we must eagerly create JvmtiThreadState

typo: s/is enabled/has been enabled/

src/hotspot/share/prims/jvmtiThreadState.cpp line 536:

> 534: JvmtiEventController::thread_started(thread);
> 535: }
> 536: if (JvmtiExport::should_post_vthread_start()) {

Why is this check: `if (JvmtiExport::can_support_virtual_threads()) {` removed?

src/hotspot/share/prims/jvmtiThreadState.cpp line 552:

> 550: JvmtiExport::post_vthread_unmount(vthread);
> 551: }
> 552: if (JvmtiExport::can_support_virtual_threads()) {

Why is this check: `if (JvmtiExport::can_support_virtual_threads()) {` removed?
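For readers unfamiliar with the "condition double checking idiom" quoted from the PR description above, here is a minimal standalone sketch of the pattern. It is only an illustration under stated assumptions: `VThreadStateSketch` and `enter_interp_only_if_pending` are hypothetical names, and `std::atomic`/`std::mutex` stand in for HotSpot's own primitives (`JvmtiThreadState_lock`, `MutexLocker`). The cheap unlocked check keeps the common path lock-free; the second check under the lock guards against another thread having handled the pending request in between.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for the per-thread JVMTI state.
struct VThreadStateSketch {
  std::atomic<bool> pending_interp_only{false};
  std::mutex lock;              // plays the role of JvmtiThreadState_lock
  int interp_only_entries = 0;  // guarded by `lock`
};

// Returns true if this call performed the transition.
bool enter_interp_only_if_pending(VThreadStateSketch& state) {
  // First check, without the lock: the fast path when nothing is pending.
  if (!state.pending_interp_only.load(std::memory_order_acquire)) {
    return false;
  }
  std::lock_guard<std::mutex> guard(state.lock);
  // Second check, under the lock: someone else may have handled it already.
  if (!state.pending_interp_only.load(std::memory_order_relaxed)) {
    return false;
  }
  state.pending_interp_only.store(false, std::memory_order_release);
  ++state.interp_only_entries;
  return true;
}
```

The sketch uses an atomic flag for the unlocked read because a plain read racing with a locked write would be a data race in standard C++; how HotSpot makes its own unlocked read safe is not shown in this thread.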
src/hotspot/share/prims/jvmtiThreadState.hpp line 234: > 232: inline void set_head_env_thread_state(JvmtiEnvThreadState* ets); > 233: > 234: static bool _seen_interp_only_mode; // interp_only_mode was requested once perhaps: s/requested once/requested at least once/ ------------- PR Review: https://git.openjdk.org/jdk/pull/16686#pullrequestreview-1758088444 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411066065 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411051100 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411051508 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411059699 From luhenry at openjdk.org Thu Nov 30 17:55:14 2023 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 30 Nov 2023 17:55:14 GMT Subject: RFR: 8315856: RISC-V: Use Zacas extension for cmpxchg Message-ID: 8315856: RISC-V: Use Zacas extension for cmpxchg ------------- Commit messages: - 8315856: RISC-V: Use Zacas extension for cmpxchg Changes: https://git.openjdk.org/jdk/pull/16910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16910&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315856 Stats: 189 lines in 5 files changed: 159 ins; 3 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/16910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16910/head:pull/16910 PR: https://git.openjdk.org/jdk/pull/16910 From never at openjdk.org Thu Nov 30 18:24:11 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 30 Nov 2023 18:24:11 GMT Subject: RFR: 8306767: Concurrent repacking of extra data in MethodData is potentially unsafe [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 16:16:54 GMT, Emanuel Peter wrote: >> I'm making sure that `allocate_bci_to_data` is only called when holding the `extra_data_lock`, so that no concurrent calls of it can ever occur. >> >> Testing: tier1-3 and stress. 
> > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - manual merge with master after JDK-8267532 > - more locking, still fails tho - WIP > - adding more verification and more locking, WIP > - add locks for jvmci calls to allocate_bci_to_data > - 8306767 I agree with Roland that none of the accesses from C++ code are performance critical so always requiring a lock shouldn't matter for performance. It is somewhat intrusive though. The alternative is to make the API distinguish clearly between preallocated data and the extra data and adjust all internal usages to select the right one. For instance speculative_trap_data_tag usages are only in the extra_data section so access to those could have a distinct API. The only other thing in extra_data are bit_data_tag so any caller that's looking for a different kind of record is always safe since it must be from the preallocated section. A change like that might be clarifying in general as well but I think there's a question of effort vs benefit. Also to clarify, I never actually observed this problem in practice but inferred the possibility while addressing MDO concurrency issues with Graal. It would be very hard to notice and very transient but it could lead to crashes since SpeculativeTrapData contains a Method*. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16840#issuecomment-1834316537 From kbarrett at openjdk.org Thu Nov 30 18:25:25 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 30 Nov 2023 18:25:25 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v13] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 14:27:47 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). 
>> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. > > Oli Gillespie has updated the pull request incrementally with two additional commits since the last revision: > > - Don't drain queue periodically > - Avoid TempNewSymbol in placeholders test Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1758177754 From coleenp at openjdk.org Thu Nov 30 18:29:08 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 30 Nov 2023 18:29:08 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v4] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Thu, 30 Nov 2023 17:31:44 GMT, Matias Saavedra Silva wrote: >> ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. >> >> This patch restores the flag along with the resolved references index. Verified with tier 1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into resolved_ref_flag > - Coleen comments > - Corrections and gtest fix > - Added asserts to ensure correctness > - Merge branch 'master' into resolved_ref_flag > - Merge branch 'master' of https://github.com/openjdk/jdk into resolved_ref_flag > - 8320530: has_resolved_ref_index flag not restored after resetting entry I think this looks better. Thank you! ------------- Marked as reviewed by coleenp (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16769#pullrequestreview-1758188188 From shade at openjdk.org Thu Nov 30 18:33:17 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 30 Nov 2023 18:33:17 GMT Subject: RFR: 8315559: Delay TempSymbol cleanup to avoid symbol table churn [v13] In-Reply-To: References: Message-ID: On Tue, 28 Nov 2023 14:27:47 GMT, Oli Gillespie wrote: >> Attempt to fix regressions in class-loading performance caused by fixing a symbol table leak in [JDK-8313678](https://bugs.openjdk.org/browse/JDK-8313678). >> >> See lengthy discussion in https://bugs.openjdk.org/browse/JDK-8315559 for more background. In short, the leak was providing an accidental cache for temporary symbols, allowing reuse. >> >> This change keeps new temporary symbols alive in a queue for a short time, allowing them to be re-used by subsequent operations. For example, when attempting to load a class we may call JVM_FindLoadedClass for multiple classloaders back-to-back, and all of them will create a TempNewSymbol for the same string. At present, each call will leave a dead symbol in the table and create a new one. Dead symbols add cleanup and lookup overhead, and insertion is also costly. With this change, the symbol from the first invocation will live for some time after it is used, and subsequent callers can find the symbol alive in the table - avoiding the extra work. >> >> The queue is bounded, and when full new entries displace the oldest entry. This means symbols are held for the time it takes for 100 new temp symbols to be created. 100 is chosen arbitrarily - the tradeoff is memory usage versus 'cache' hit rate. >> >> When concurrent symbol table cleanup runs, it also drains the queue. >> >> In my testing, this brings Dacapo pmd performance back to where it was before the leak was fixed. >> >> Thanks @shipilev , @coleenp and @MDBijman for helping with this fix. 
> > Oli Gillespie has updated the pull request incrementally with two additional commits since the last revision: > > - Don't drain queue periodically > - Avoid TempNewSymbol in placeholders test I am good with this version. New file needs a copyright header. src/hotspot/share/oops/symbolHandle.cpp line 1: > 1: #include "precompiled.hpp" This misses copyright header. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16398#pullrequestreview-1758189316 PR Review Comment: https://git.openjdk.org/jdk/pull/16398#discussion_r1411106160 From vladimir.kozlov at oracle.com Thu Nov 30 18:48:21 2023 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 30 Nov 2023 10:48:21 -0800 Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v10] In-Reply-To: <533C6C07-0E36-4232-B09B-399EBC52C3D2@jetbrains.com> References: <77x_EsDnKYUGRzIe6pvIk3t-7y5LjhSkxfR6xcVmH2s=.39b90785-1864-48bf-8da1-62246118e353@github.com> <533C6C07-0E36-4232-B09B-399EBC52C3D2@jetbrains.com> Message-ID: <98473bbe-433b-47e9-b836-04f2e96f8a15@oracle.com> I hit is too on my Mac, I filed following bug and assigned to Jorn. https://bugs.openjdk.org/browse/JDK-8321141 Note, I checked and all testing passed when 8267532 was reviewed. May be something to do with old Xcode I used to compile or something else. Thanks, Vladimir K On 11/30/23 4:47 AM, Vitaly Provodin wrote: > Hi all, > > With the latest changes I got the following error > > =======================8<---------------------- > ./src/hotspot/share/ci/ciMethodData.cpp:477:1: error: non-void function does not return a value in all control paths > [-Werror,-Wreturn-type] > } > ^ > 1 error generated. > make[3]: *** > [/opt/teamcity-agent/work/602288ed8ca22f30/build/macosx-aarch64-server-release/hotspot/variant-server/libjvm/objs/ciMethodData.o] Error 1 > make[3]: *** Waiting for unfinished jobs.... > make[2]: *** [hotspot-server-libs] Error 2 > make[2]: *** Waiting for unfinished jobs?. 
> =======================8 > Here is my build environment > > Configuration summary: > * Name: macosx-aarch64-server-release > * Debug level: release > * HS debug level: product > * JVM variants: server > * JVM features: server: 'cds compiler1 compiler2 dtrace epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc > serialgc services shenandoahgc vm-structs zgc' > * OpenJDK target: OS: macosx, CPU architecture: aarch64, address length: 64 > * Version string: 22+9-b1917 (22) > * Source date: 1701334649 (2023-11-30T08:57:29Z) > > Tools summary: > * Boot JDK: openjdk version "22" 2024-03-19 OpenJDK Runtime Environment JBR-22+9-1795-nomod (build 22+9-b1795) OpenJDK > 64-Bit Server VM JBR-22+9-1795-nomod (build 22+9-b1795, mixed mode) > * Toolchain: clang (clang/LLVM from Xcode 12.2) > * Sysroot: /Applications/Xcode_12.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk > * C Compiler: Version 12.0.0 (at /usr/bin/clang) > * C++ Compiler: Version 12.0.0 (at /usr/bin/clang++) > > Could you please clarify how to overcome this issue? > > Thanks, > Vitaly > > >> On 27. Nov 2023, at 17:04, Jorn Vernee wrote: >> >> On Thu, 23 Nov 2023 15:31:28 GMT, Jorn Vernee wrote: >> >>>> The issue is essentially that for the Java try-with-resource construct, javac generates multiple calls to `close(`) >>>> the resource. One of those calls is inside the hidden exception handler of the try block. The issue for us is that >>>> typically the exception handler is never entered (since no exception is thrown), however we don't profile exception >>>> handlers at the moment, so the block is not pruned. C2 doesn't inline the `close()` call in the handler due to low >>>> call site frequency. As a result, the receiver of that call escapes and can not be scalar replaced, which then leads >>>> to a loss in performance. >>>> >>>> There has been some discussion on the JBS issue that this could be fixed by profiling catch blocks. 
And another >>>> suggestion that partial escape analysis could help here to prevent the object from escaping. But, I think there are >>>> other benefits to being able to prune dead catch blocks, such as general reduction in code size, and other >>>> optimizations being possible by dead code being eliminated. So, I've implemented catch block profiling + pruning in >>>> this patch. >>>> >>>> The implementation is essentially very straightforward: we allocate an extra bit of profiling data for each >>>> exception handler of a method in the `MethodData` for that method (which holds all the profiling >>>> data). Then when looking up the exception handler after an exception is thrown, we mark the >>>> exception handler as entered. When C2 parses the exception handler block, and it sees that it has >>>> never been entered, we emit an uncommon trap instead. >>>> >>>> I've also cleaned up the handling of profiling data sections a bit. After adding the extra section of data to >>>> MethodData, I was seeing several crashes when ciMethodData was used. The underlying issue seemed to be that the >>>> offset of the parameter data was computed based on the total data size - parameter data size (which doesn't work if >>>> we add an additional section for exception handler data). I've re-written the code around this a bit to try and >>>> prevent issues in the future. Both MethodData and ciMethodData now track offsets of parameter data and exception >>>> handler data, and the size of each data section is derived from the offsets. >>>> >>>> Finally, there was an assert firing in `freeze_internal` in `continuationFreezeThaw.cpp`: >>>> >>>> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >>>> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int...
>>> >>> Jorn Vernee has updated the pull request incrementally with two additional commits since the last revision: >>> >>> - add interpreter profiling specific test cases >>> - rename ex_handler -> exception_handler >> >> Another round of tier 1 - 8 testing came back clean. I'm planning to integrate the patch tomorrow. >> >> ------------- >> >> PR Comment: https://git.openjdk.org/jdk/pull/16416#issuecomment-1827516420 > From jorn.vernee at oracle.com Thu Nov 30 19:14:37 2023 From: jorn.vernee at oracle.com (Jorn Vernee) Date: Thu, 30 Nov 2023 20:14:37 +0100 Subject: RFR: 8267532: C2: Profile and prune untaken exception handlers [v10] In-Reply-To: <98473bbe-433b-47e9-b836-04f2e96f8a15@oracle.com> References: <77x_EsDnKYUGRzIe6pvIk3t-7y5LjhSkxfR6xcVmH2s=.39b90785-1864-48bf-8da1-62246118e353@github.com> <533C6C07-0E36-4232-B09B-399EBC52C3D2@jetbrains.com> <98473bbe-433b-47e9-b836-04f2e96f8a15@oracle.com> Message-ID: <7225449b-e9bf-4cc4-8c08-add235e7a143@oracle.com> Hello, It seems to be an issue with XCode 12.2 not supporting the [[noreturn]] attribute. Note that the build guide recommends at least XCode 14 [1], so you may want to upgrade XCode to see if that helps. Jorn [1]: https://github.com/openjdk/jdk/blob/master/doc/building.md#macos On 30/11/2023 19:48, Vladimir Kozlov wrote: > I hit is too on my Mac, I filed following bug and assigned to Jorn. > > https://bugs.openjdk.org/browse/JDK-8321141 > > Note, I checked and all testing passed when 8267532 was reviewed. > May be something to do with old Xcode I used to compile or something > else. > > Thanks, > Vladimir K > > On 11/30/23 4:47 AM, Vitaly Provodin wrote: >> Hi all, >> >> With the latest changes I got the following error >> >> =======================8<---------------------- >> ./src/hotspot/share/ci/ciMethodData.cpp:477:1: error: non-void >> function does not return a value in all control paths >> [-Werror,-Wreturn-type] >> } >> ^ >> 1 error generated. 
>> make[3]: *** >> [/opt/teamcity-agent/work/602288ed8ca22f30/build/macosx-aarch64-server-release/hotspot/variant-server/libjvm/objs/ciMethodData.o] >> Error 1 >> make[3]: *** Waiting for unfinished jobs.... >> make[2]: *** [hotspot-server-libs] Error 2 >> make[2]: *** Waiting for unfinished jobs?. >> =======================8> >> Here is my build environment >> >> Configuration summary: >> * Name: macosx-aarch64-server-release >> * Debug level: release >> * HS debug level: product >> * JVM variants: server >> * JVM features: server: 'cds compiler1 compiler2 dtrace epsilongc >> g1gc jfr jni-check jvmci jvmti management parallelgc serialgc >> services shenandoahgc vm-structs zgc' >> * OpenJDK target: OS: macosx, CPU architecture: aarch64, address >> length: 64 >> * Version string: 22+9-b1917 (22) >> * Source date: 1701334649 (2023-11-30T08:57:29Z) >> >> Tools summary: >> * Boot JDK: openjdk version "22" 2024-03-19 OpenJDK Runtime >> Environment JBR-22+9-1795-nomod (build 22+9-b1795) OpenJDK 64-Bit >> Server VM JBR-22+9-1795-nomod (build 22+9-b1795, mixed mode) >> * Toolchain: clang (clang/LLVM from Xcode 12.2) >> * Sysroot: >> /Applications/Xcode_12.2.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.0.sdk >> * C Compiler: Version 12.0.0 (at /usr/bin/clang) >> * C++ Compiler: Version 12.0.0 (at /usr/bin/clang++) >> >> Could you please clarify how to overcome this issue? >> >> Thanks, >> Vitaly >> >> >>> On 27. Nov 2023, at 17:04, Jorn Vernee wrote: >>> >>> On Thu, 23 Nov 2023 15:31:28 GMT, Jorn Vernee >>> wrote: >>> >>>>> The issue is essentially that for the Java try-with-resource >>>>> construct, javac generates multiple calls to `close(`) the >>>>> resource. One of those calls is inside the hidden exception >>>>> handler of the try block. 
The issue for us is that typically the >>>>> exception handler is never entered (since no exception is thrown), >>>>> however we don't profile exception handlers at the moment, so the >>>>> block is not pruned. C2 doesn't inline the `close()` call in the >>>>> handler due to low call site frequency. As a result, the receiver >>>>> of that call escapes and can not be scalar replaced, which then >>>>> leads to a loss in performance. >>>>> >>>>> There has been some discussion on the JBS issue that this could be >>>>> fixed by profiling catch blocks. And another suggestion that >>>>> partial escape analysis could help here to prevent the object from >>>>> escaping. But, I think there are other benefits to being able to >>>>> prune dead catch blocks, such as general reduction in code size, >>>>> and other optimizations being possible by dead code being >>>>> eliminated. So, I've implemented catch block profiling + pruning >>>>> in this patch. >>>>> >>>>> The implementation is essentially very straightforward: we >>>>> allocate an extra bit of profiling data for each >>>>> exception handler of a method in the `MethodData` for that method >>>>> (which holds all the profiling >>>>> data). Then when looking up the exception handler after an >>>>> exception is thrown, we mark the >>>>> exception handler as entered. When C2 parses the exception handler >>>>> block, and it sees that it has >>>>> never been entered, we emit an uncommon trap instead. >>>>> >>>>> I've also cleaned up the handling of profiling data sections a >>>>> bit. After adding the extra section of data to MethodData, I was >>>>> seeing several crashes when ciMethodData was used. The underlying >>>>> issue seemed to be that the offset of the parameter data was >>>>> computed based on the total data size - parameter data size (which >>>>> doesn't work if we add an additional section for exception handler >>>>> data). I've re-written the code around this a bit to try and >>>>> prevent issues in the future. 
Both MethodData and ciMethodData now >>>>> track offsets of parameter data and exception handler data, and >>>>> the size of each data section is derived from the offsets. >>>>> >>>>> Finally, there was an assert firing in `freeze_internal` in >>>>> `continuationFreezeThaw.cpp`: >>>>> >>>>> assert(monitors_on_stack(current) == >>>>> ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >>>>> "Held monitor count and locks on stack invariant: " >>>>> INT64_FORMAT " JNI: " INT64_FORMAT, (int... >>>> >>>> Jorn Vernee has updated the pull request incrementally with two >>>> additional commits since the last revision: >>>> >>>> - add interpreter profiling specific test cases >>>> - rename ex_handler -> exception_handler >>> >>> Another round of tier 1 - 8 testing came back clean. I'm planning to >>> integrate the patch tomorrow. >>> >>> ------------- >>> >>> PR Comment: >>> https://git.openjdk.org/jdk/pull/16416#issuecomment-1827516420 >> From ihse at openjdk.org Thu Nov 30 20:16:14 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 30 Nov 2023 20:16:14 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: <8SBUvWGDLtQmwYPRBDeUkeuq4pf2nJfKfDY5rzZODFU=.3f1cc0ff-02e6-4a4e-9425-5ffccc9cbc8f@github.com> Message-ID: On Thu, 30 Nov 2023 14:50:24 GMT, Andrew Haley wrote: >> Do this, but with the name vect_math.S. Don't use SLEEF headers in the build. I think you can do this with no build-time dependency on SLEEF at all if you load the library lazily at runtime. >> >> [vect_math.S.txt](https://github.com/openjdk/jdk/files/13512306/vect_math.S.txt) > >> [vect_math.S.txt](https://github.com/openjdk/jdk/files/13512306/vect_math.S.txt) > > I guess this will live only in os_linux and os_bsd because the Windows compiler won't like it AFAIK. @theRealAph So your suggestion is that this assembly file lives in hotspot, instead of jdk.incubator.vector?
------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1834474434 From ihse at openjdk.org Thu Nov 30 20:16:16 2023 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 30 Nov 2023 20:16:16 GMT Subject: RFR: 8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 06:39:43 GMT, Xiaohong Gong wrote: >> Currently the vector floating-point math APIs like `VectorOperators.SIN/COS/TAN...` are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]). >> >> SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead. >> >> To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. `-XX:UseSleefLib`) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly. >> >> Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. 
And now we'd like to get some wider feedbacks from more hotspot experts. >> >> [1] https://github.com/openjdk/jdk/pull/3638 >> [2] https://sleef.org/ >> [3] https://packages.fedoraproject.org/pkgs/sleef/sleef/ >> [4] https://packages.debian.org/bookworm/libsleef3 >> [5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Rename vmath to sleef in configure Not having a build time dependency on libsleef means you cannot really verify that the functions you want to call are correct, but maybe you feel secure that they will never change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16234#issuecomment-1834476264 From dlong at openjdk.org Thu Nov 30 20:22:08 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 Nov 2023 20:22:08 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v3] In-Reply-To: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> References: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> Message-ID: On Thu, 30 Nov 2023 15:26:22 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. >> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. 
But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. >> >> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. >> >> I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. >> >> Thanks, >> Patricio >> >> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > add comment in clear_bitmap_bits() OK, this looks good to me. Please get a 2nd review. ------------- Marked as reviewed by dlong (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16837#pullrequestreview-1758378427 From dlong at openjdk.org Thu Nov 30 20:30:09 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 Nov 2023 20:30:09 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v3] In-Reply-To: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> References: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> Message-ID: On Thu, 30 Nov 2023 15:26:22 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. >> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. >> >> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. 
I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. >> >> I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. >> >> Thanks, >> Patricio >> >> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > add comment in clear_bitmap_bits() I would be tempted to put the round up in `compiled_frame_stack_argsize`, but it's not a big deal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16837#issuecomment-1834508273 From dlong at openjdk.org Thu Nov 30 20:38:13 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 30 Nov 2023 20:38:13 GMT Subject: RFR: 8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index [v3] In-Reply-To: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> References: <8uEcqzVz-sUB1NACfJnQ2c1s3Vxjf0d-V5Upwgi703o=.c9b8c675-f3e5-4a2d-94a0-64e9d5613bf4@github.com> Message-ID: On Thu, 30 Nov 2023 15:26:22 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region. >> Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. 
But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS. >> >> The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp. >> >> I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5. >> >> Thanks, >> Patricio >> >> [1] https://github.com/pchilano/jdk/commit/42ae9269b28beb6f36c502182116545b680e418f > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > add comment in clear_bitmap_bits() src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2313: > 2311: } else if (_cont.tail()->has_bitmap() && added_argsize > 0) { > 2312: address start = (address)(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf) + frame::metadata_words_at_top); > 2313: int stack_args_slots = f.cb()->as_compiled_method()->method()->num_stack_arg_slots(false /* rounded */); It would be nice if we could trust the `added_argsize` value here, but that would require more changes to where rounding is done. 
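Editorial sketch: the rounding issue in this thread can be modelled in a few lines of standalone C++. This is not the code in continuationFreezeThaw.cpp; `round_up_slots`, `clear_bits`, and the `std::vector<bool>` bitmap are hypothetical stand-ins, and the model assumes a 64-bit VM where one word holds two 32-bit stack slots.

```cpp
#include <cstddef>
#include <vector>

// Assumption: one machine word holds two 32-bit stack slots (64-bit VM).
constexpr int kSlotsPerWord = 2;

// Round a slot count up to a whole number of words.
inline int round_up_slots(int slots) {
  return (slots + kSlotsPerWord - 1) / kSlotsPerWord * kSlotsPerWord;
}

// Clear `count` bits of a per-slot oop bitmap, starting at index `start`.
// The bitmap here is a plain vector with one bit per stack slot.
inline void clear_bits(std::vector<bool>& bitmap, std::size_t start, int count) {
  for (int i = 0; i < count; ++i) {
    bitmap[start + static_cast<std::size_t>(i)] = false;
  }
}
```

With three argument slots starting at index 2, the rounded size is four slots, so clearing the rounded range also clears the bit for index 5, the padding slot that the caller may be using for a narrow oop. Clearing only the unrounded count leaves that bit alone, which seems to be the intent of passing `false /* rounded */` in the quoted line.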
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16837#discussion_r1411231506 From sspitsyn at openjdk.org Thu Nov 30 20:51:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 20:51:05 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 17:33:28 GMT, Daniel D. Daugherty wrote: > Removing the TLH is the right thing to do. If we get that failure mode again, then we can file a new bug and hopefully have an hs_err_pid file with it. Agreed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1834539426 From sspitsyn at openjdk.org Thu Nov 30 21:01:11 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 21:01:11 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: <1O1StrzGC1kcsPY3sXxNKbTAWjkECET0d9cnJYMaiuw=.bf3da9a0-1042-46df-a406-122675bbd0c7@github.com> On Thu, 30 Nov 2023 17:33:28 GMT, Daniel D. Daugherty wrote: > I don't think we should change the guarantee() in Handshake::execute(). When the > three parameter version of execute() is called with tlh == nullptr, the caller is > saying that there is supposed to be a ThreadsListHandle in the calling context. Yes, > if the target thread is the calling thread, then a ThreadsListHandle is not needed, > but that's why we have this code to prevent the call to Handshake::execute(): > if (target->is_handshake_safe_for(current)) { > hs.do_thread(target); > > In other words, I think Handshake::execute() is working the way it is supposed to > when tlh == nullptr is passed. Just to share my view... It is a little bit ugly to do it for each call site. The `Handshake::execute()` can do it instead, so its call sites could be simplified. BTW, it is done in the `JvmtiHandshake::execute()` and one can find it to be convenient. 
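Editorial sketch: the two positions in this exchange can be compared with a small standalone model. None of these types are HotSpot's; `Thread`, `Handshaker`, and `execute_or_run_directly` are hypothetical, and `is_handshake_safe_for` is simplified here to the self-target case only. The wrapper shows the shape of folding the call-site check into one helper, which is assumed to be roughly what `JvmtiHandshake::execute()` is being credited with above.

```cpp
#include <functional>

// Hypothetical stand-in for a thread; only identity matters in this model.
struct Thread {
  int id;
};

// Hypothetical closure type run against the target thread.
using HandshakeClosure = std::function<void(Thread&)>;

// Simplified stand-in: only a thread targeting itself counts as "safe".
inline bool is_handshake_safe_for(const Thread& target, const Thread& current) {
  return target.id == current.id;
}

// Stand-in for the real handshake path; it just counts how often it is used.
struct Handshaker {
  int remote_handshakes = 0;

  void execute(const HandshakeClosure& hs, Thread& target) {
    ++remote_handshakes;
    hs(target);
  }
};

// The check that each call site would otherwise repeat, folded into one helper.
inline void execute_or_run_directly(Handshaker& h, const HandshakeClosure& hs,
                                    Thread& current, Thread& target) {
  if (is_handshake_safe_for(target, current)) {
    hs(target);             // run the closure directly, no handshake needed
  } else {
    h.execute(hs, target);  // go through the handshake machinery
  }
}
```

Whether the check belongs in the callee or at each call site is the design question being debated; the sketch only shows that both forms run the closure exactly once per call.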
------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1834550955 From sspitsyn at openjdk.org Thu Nov 30 21:21:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 21:21:17 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 17:35:57 GMT, Daniel D. Daugherty wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: remove newly added ThreadsListHandle from enter_interp_only_mode > > src/hotspot/share/prims/jvmtiThreadState.cpp line 536: > >> 534: JvmtiEventController::thread_started(thread); >> 535: } >> 536: if (JvmtiExport::should_post_vthread_start()) { > > Why is this check: `if (JvmtiExport::can_support_virtual_threads()) {` removed? It is because the `JvmtiExport::should_post_vthread_start()` returns `true` only if the `JvmtiExport::can_support_virtual_threads()` returns `true`. The JVMTI `SetEventNotificationMode()` function checks for a required event capability by using the `JvmtiUtil::has_event_capability()` function that is defined in the `jvmtiEnter.cpp` generated with the `jvmtiEnter.xsl` script. It looks like this: // Check Event Capabilities bool JvmtiUtil::has_event_capability(jvmtiEvent event_type, const jvmtiCapabilities* capabilities_ptr) { switch (event_type) { . . . case JVMTI_EVENT_VIRTUAL_THREAD_START: return capabilities_ptr->can_support_virtual_threads != 0; case JVMTI_EVENT_VIRTUAL_THREAD_END: return capabilities_ptr->can_support_virtual_threads != 0; . . . > src/hotspot/share/prims/jvmtiThreadState.cpp line 552: > >> 550: JvmtiExport::post_vthread_unmount(vthread); >> 551: } >> 552: if (JvmtiExport::can_support_virtual_threads()) { > > Why is this check: if (JvmtiExport::can_support_virtual_threads()) { removed? The same reason as above. Thank you for checking this. 
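The implication described here (an event can only be enabled if its capability is held, so "event enabled" implies "capability present") can be sketched with a small model. All names below (`ToyCapabilities`, `ToyEvent`, `ToyEnv`) are hypothetical, and the model ignores everything else the real JVMTI code does.

```cpp
#include <cassert>

// Toy model; these types only mirror the shape of the check quoted above.
struct ToyCapabilities {
  bool can_support_virtual_threads = false;
};

enum class ToyEvent { VirtualThreadStart, VirtualThreadEnd, ClassLoad };

// Events that need a capability report whether it is held; in this model
// every other event needs none.
bool has_event_capability(ToyEvent event, const ToyCapabilities& caps) {
  switch (event) {
    case ToyEvent::VirtualThreadStart:
    case ToyEvent::VirtualThreadEnd:
      return caps.can_support_virtual_threads;
    default:
      return true;
  }
}

struct ToyEnv {
  ToyCapabilities caps;
  bool vthread_start_enabled = false;

  // Refuses to enable the event (and changes nothing) when the required
  // capability is missing.
  bool enable_vthread_start() {
    if (!has_event_capability(ToyEvent::VirtualThreadStart, caps)) {
      return false;
    }
    vthread_start_enabled = true;
    return true;
  }

  // Can only be true after a successful enable, hence only with the capability.
  bool should_post_vthread_start() const { return vthread_start_enabled; }
};
```

Because the only way to set the flag goes through the capability check, a separate capability test in front of `should_post_vthread_start()` would be redundant in this model.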
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411269738 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411271722 From jjoo at openjdk.org Thu Nov 30 21:21:48 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 21:21:48 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v54] In-Reply-To: References: Message-ID: > 8315149: Add hsperf counters for CPU time of internal GC threads Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: Return after ShouldNotReachHere Co-authored-by: Stefan Johansson <54407259+kstefanj at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15082/files - new: https://git.openjdk.org/jdk/pull/15082/files/7e4cdcd3..fcf00cfe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=53 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15082&range=52-53 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/15082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15082/head:pull/15082 PR: https://git.openjdk.org/jdk/pull/15082 From jjoo at openjdk.org Thu Nov 30 21:21:49 2023 From: jjoo at openjdk.org (Jonathan Joo) Date: Thu, 30 Nov 2023 21:21:49 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Thu, 30 Nov 2023 09:30:14 GMT, Stefan Johansson wrote: >> Both `publish_gc_total_cpu_time` and `~ThreadTotalCPUTimeClosure` are called by the vm-thread inside a safepoint, so there shouldn't be any other threads running simultaneously, I believe. > > Me and Albert just spoke and we do see the problem that two concurrent threads could be executing the closure at the same time. 
So if having a total counter we need to sync the updates. But when talking we started to question how useful it is to have the gc_total counter. It is just an aggregate of the other gc-counters, but it is out of sync between safepoints. So you will always get a more accurate value by looking at the individual gc-counters. > > We came to the conclusion that it would probably be easier to drop `gc_total` right now, rather than trying to keep it in sync for all updates to the individual counters. Because having them out of sync doesn't feel like a great option. > > Are we missing anything or do you agree? @simonis was the original suggester of this counter, so I will defer to his expertise. I do agree that dropping the counter would simplify things, but it also might not hurt to just leave it in. I'm okay with either option! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411270546 From matsaave at openjdk.org Thu Nov 30 21:23:24 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 30 Nov 2023 21:23:24 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v4] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Mon, 27 Nov 2023 09:03:01 GMT, Andrew Dinn wrote: >>> Change seems fine but what was the effect of not restoring the flag? Does this cause failures or just unnecessary re-resolution, or? >>> >>> Thanks >> >> The code works somehow, but in an unsafe manner. We are reading from `resolved_references_index()` even when the `has_resolved_ref_shift` bit has been (improperly) cleared. 
Adding the following assert makes it impossible to start jtreg: >> >> >> diff --git a/src/hotspot/share/oops/cpCache.cpp b/src/hotspot/share/oops/cpCache.cpp >> index daa094baa7e..760c5268c88 100644 >> --- a/src/hotspot/share/oops/cpCache.cpp >> +++ b/src/hotspot/share/oops/cpCache.cpp >> @@ -310,6 +310,9 @@ ResolvedMethodEntry* ConstantPoolCache::set_method_handle(int method_index, cons >> // Store appendix, if any. >> if (has_appendix) { >> + assert(method_entry->has_resolved_ref_index(), "huh"); >> const int appendix_index = method_entry->resolved_references_index(); >> objArrayOop resolved_references = constant_pool()->resolved_references(); >> >> >> I think we should take this as a chance to tighten up the code in resolvedMethodEntry.hpp: >> >> - `resolved_references_index()` should assert that `has_resolved_ref_index()`. >> - The following three functions should assert for mutual exclusivity. I.e., you can't call set_klass() and then call set_resolved_references_index(). Probably the easiest is to add two more bits: `_has_klass_shift` and `_has_table_index_shift`. At entry of these three setters, we should assert that all three klass/table_index/resolved_references_index bits are cleared. >> >> >> void set_klass(InstanceKlass* klass) { >> _entry_specific._interface_klass = klass; >> } >> void set_resolved_references_index(u2 ref_index) { >> set_flags(1 << has_resolved_ref_shift); >> _entry_specific._resolved_references_index = ref_index; >> } >> void set_table_index(u2 table_index) { >> _entry_specific._table_index = table_index; >> } >> >> >> Also, `has_resolved_ref_index` should be renamed to `has_resolved_reference_index`; otherwise it's difficult to search for all code related to `resolved_refenece_index`. > >> . . . Probably the easiest is to add two more bits: _has_klass_shift and _has_table_index_shift. . . . > > Maybe so, but only in debug builds? Thank you for the reviews @adinn, @coleenp, and @dholmes-ora! 
The remaining GHA failure is not related to my change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16769#issuecomment-1834575171 From matsaave at openjdk.org Thu Nov 30 21:23:27 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 30 Nov 2023 21:23:27 GMT Subject: RFR: 8320530: has_resolved_ref_index flag not restored after resetting entry [v3] In-Reply-To: References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Thu, 30 Nov 2023 06:21:31 GMT, David Holmes wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Corrections and gtest fix > > src/hotspot/share/oops/resolvedMethodEntry.hpp line 99: > >> 97: _has_interface_klass = false; >> 98: _has_table_index = false; >> 99: #endif > > Indent appears to be 1 instead of 2 I believe the indentation is correct. I have implemented Coleen's suggestion which should make the spacing clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16769#discussion_r1411270875 From matsaave at openjdk.org Thu Nov 30 21:23:29 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 30 Nov 2023 21:23:29 GMT Subject: Integrated: 8320530: has_resolved_ref_index flag not restored after resetting entry In-Reply-To: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> References: <4Bi8mWx5pxfYciHnXHUla1X_BzUt_56q8MoRkCYc0dk=.50072528-a5d9-4708-995b-c0bd49f8c74b@github.com> Message-ID: On Tue, 21 Nov 2023 16:38:14 GMT, Matias Saavedra Silva wrote: > ResolvedMethodEntry::reset_entry() clears the fields in the structure and then restores the constant pool index and the resolved references index if it has one. Currently, the resolved references index is restored without restoring the has_resolved_reference flag used by the structure. > > This patch restores the flag with the resolved references index.
Verified with tier 1-5 tests. This pull request has now been integrated. Changeset: c4732c2b Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/c4732c2baa4d6fd1775f81a90e74675c39811495 Stats: 47 lines in 2 files changed: 36 ins; 8 del; 3 mod 8320530: has_resolved_ref_index flag not restored after resetting entry Reviewed-by: adinn, dholmes, iklam, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/16769 From sspitsyn at openjdk.org Thu Nov 30 21:27:27 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 21:27:27 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v5] In-Reply-To: References: Message-ID: > This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. > However, it has to be okay to avoid eager creation of these objects if no `interp_only_mode` has ever been requested. > It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. > It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly.
> > Additionally, the fix includes the following related changes: > - Use the condition double-checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: > > JvmtiThreadState* state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > MutexLocker mu(JvmtiThreadState_lock); > state = thread->jvmti_thread_state(); > if (state != nullptr && state->is_pending_interp_only_mode()) { > JvmtiEventController::enter_interp_only_mode(); > } > } > > > - Add an extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. > - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for the JVMTI `SetEventNotificationMode` implementation. > > Performance measurements: > - Without this fix the test provided by the bug submitter gives execution numbers: > - no ClassLoad events enabled: 3251 ms > - ClassLoad events are enabled: 40534 ms > > - With the fix: > - no ClassLoad events enabled: 3270 ms > - ClassLoad events are enabled: 3385 ms > > Testing: > - Ran mach5 tiers 1-6, no regressions were noticed Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: - review: one more minor correction of a comment - review: minor correction of a comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16686/files - new: https://git.openjdk.org/jdk/pull/16686/files/e3d30c86..d6a8245a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16686&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16686.diff Fetch: git fetch https://git.openjdk.org/jdk.git
pull/16686/head:pull/16686 PR: https://git.openjdk.org/jdk/pull/16686 From sspitsyn at openjdk.org Thu Nov 30 21:27:30 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 21:27:30 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 17:48:48 GMT, Daniel D. Daugherty wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> review: remove newly added ThreadsListHandle from enter_interp_only_mode > > src/hotspot/share/prims/jvmtiThreadState.cpp line 530: > >> 528: assert(!thread->is_in_tmp_VTMS_transition(), "sanity check"); >> 529: >> 530: // If interp_only_mode is enabled then we must eagerly create JvmtiThreadState > > typo: s/is enabled/has been enabled/ Thanks - fixed now. > src/hotspot/share/prims/jvmtiThreadState.hpp line 234: > >> 232: inline void set_head_env_thread_state(JvmtiEnvThreadState* ets); >> 233: >> 234: static bool _seen_interp_only_mode; // interp_only_mode was requested once > > perhaps: s/requested once/requested at least once/ Thank you - fixed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411277467 PR Review Comment: https://git.openjdk.org/jdk/pull/16686#discussion_r1411273636 From manc at openjdk.org Thu Nov 30 21:47:21 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 30 Nov 2023 21:47:21 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v48] In-Reply-To: References: <_lEBVrWV8wrVbmhOiu3AAqPJo_xBs718ZtA9V-VSzGM=.253c0ec8-256e-4dee-b125-90be6338e4b8@github.com> Message-ID: On Thu, 30 Nov 2023 21:17:14 GMT, Jonathan Joo wrote: >> Me and Albert just spoke and we do see the problem that two concurrent threads could be executing the closure at the same time. So if having a total counter we need to sync the updates. 
But when talking we started to question how useful it is to have the gc_total counter. It is just an aggregate of the other gc-counters, but it is out of sync between safepoints. So you will always get a more accurate value by looking at the individual gc-counters. >> >> We came to the conclusion that it would probably be easier to drop `gc_total` right now, rather than trying to keep it in sync for all updates to the individual counters. Because having them out of sync doesn't feel like a great option. >> >> Are we missing anything or do you agree? > > @simonis was the original suggester of this counter, so I will defer to his expertise. I do agree that dropping the counter would simplify things, but it also might not hurt to just leave it in. I'm okay with either option! Right, see @simonis 's comments at https://github.com/openjdk/jdk/pull/15082#pullrequestreview-1613868256, https://github.com/openjdk/jdk/pull/15082#discussion_r1321703912. I initially had similar thought that `gc_total` isn't necessary and provides redundant data. Now I agree with @simonis that the `gc_total` is valuable to users. It saves users from extra work of aggregating different sets of counters for different garbage collectors, and potential mistakes of missing some counters. It is also future-proof that if GC implementation changes that add additional threads, users wouldn't need to change their code to add the counter for additional threads. I think the maintenance overhead is quite small for `gc_total` since it is mostly in this class. The benefit to users is worth it. 
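The synchronization concern raised in this thread can be made concrete with a small sketch. It is not the implementation in the PR: `ToyCpuTimeCounters` and its members are hypothetical, and standard C++ atomics stand in for whatever primitives the VM would use. Each update publishes a new cumulative value for one group and folds the increase into the aggregate atomically, so concurrent updaters do not lose increments.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of per-group CPU time counters plus an aggregate total.
class ToyCpuTimeCounters {
 public:
  static constexpr std::size_t kGroups = 3;

  ToyCpuTimeCounters() : _total(0) {
    for (auto& counter : _per_group) {
      counter.store(0);
    }
  }

  // Publish a new cumulative CPU time for one group and add the difference
  // to the total. Both steps are atomic on their own, so increments from
  // concurrent callers are not lost.
  void update(std::size_t group, std::int64_t new_value_ns) {
    std::int64_t old_value_ns = _per_group[group].exchange(new_value_ns);
    _total.fetch_add(new_value_ns - old_value_ns);
  }

  std::int64_t group(std::size_t g) const { return _per_group[g].load(); }
  std::int64_t total() const { return _total.load(); }

 private:
  std::array<std::atomic<std::int64_t>, kGroups> _per_group;
  std::atomic<std::int64_t> _total;
};
```

Even in this sketch a reader can observe the total briefly lagging behind the per-group values, between the `exchange` and the `fetch_add`. That is the kind of skew the reviewers weigh against the convenience of having a single aggregate counter.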
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411293819 From manc at openjdk.org Thu Nov 30 22:03:28 2023 From: manc at openjdk.org (Man Cao) Date: Thu, 30 Nov 2023 22:03:28 GMT Subject: RFR: 8315149: Add hsperf counters for CPU time of internal GC threads [v53] In-Reply-To: <-GX8bATX2hz3YWgnJbhTNEYbi4t8HxfdhYqBP-ulyGg=.0080d7b0-8e43-4b81-b885-1d4a742048cc@github.com> References: <-GX8bATX2hz3YWgnJbhTNEYbi4t8HxfdhYqBP-ulyGg=.0080d7b0-8e43-4b81-b885-1d4a742048cc@github.com> Message-ID: On Thu, 30 Nov 2023 09:41:55 GMT, Stefan Johansson wrote: >> Jonathan Joo has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing include > > src/hotspot/share/runtime/cpuTimeCounters.hpp line 59: > >> 57: NONCOPYABLE(CPUTimeCounters); >> 58: >> 59: static CPUTimeCounters* _instance; > > I would prefer if we made the whole class static and got rid of the instance and just made the `_cpu_time_counters` array static. The only drawback I/we (discussed this with Albert as well) can see is that the memory for the array would not be accounted in NMT, but this array will always be very small so should not be a big problem. > > Do you see any other concerns? I thought it is typically preferred to initialize a singleton object on the heap, rather than using several static variables. It is easier to control the initialization order and timing of an on-heap singleton object than statics. Moreover, for this class, `initialize()` could also check `if (UsePerfData)`, and only create the singleton object under `UsePerfData`. This could save some memory when `UsePerfData` is false. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15082#discussion_r1411317828 From dcubed at openjdk.org Thu Nov 30 22:05:40 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 30 Nov 2023 22:05:40 GMT Subject: RFR: 8319773: Avoid inflating monitors when installing hash codes for LM_LIGHTWEIGHT [v10] In-Reply-To: References: <2MRTHFoYSaSW2NH922LOEvqKx4NLjshWaHJaYV2RdVY=.e234046a-aac8-4d7b-81b9-269506944165@github.com> Message-ID: On Thu, 30 Nov 2023 08:06:28 GMT, Axel Boldt-Christmas wrote: >> So what did you decide to do here? > > For now I believe the extra code noise from trying to handle this race with deflation is not worth it. It creates some questionable code paths and head scratchers. If we were to add a separate FastHashCode just for LM_LIGHTWEIGHT it would be worth it as the while loop body would look quite a bit different and be easier to reason about. > > But I was looking for input if we should handle this case regardless of code complexity. Or maybe taking this all the way and create a separate FastHashCode with its own more understandable logic which does not have to try to fit in with the legacy locking/inflation protocol. > > Regardless if we were to just go with it as it is now there should probably be a comment here along the lines of: > ```c++ > // With LM_LIGHTWEIGHT FastHashCode may race with deflation here and cause a monitor to be re-inflated. I don't think the race with deflation is limited to LM_LIGHTWEIGHT. The inflation code below detects when there is a collision with async deflation and retries which can lead to a re-inflation when we loop around again. We can reach the code below with LM_LEGACY, LM_LIGHTWEIGHT, or LM_MONITOR so I don't think you need the LM_LIGHTWEIGHT specific comment. Yes, we can reach this point in the code when `mark.has_monitor() == true` and not just when `LockingMode == LM_LIGHTWEIGHT`, but the `inflate()` function already has to handle that race (and it does). When a Java monitor is lightweight locked or stack-locked, there can be more than one contending thread and each of those threads will attempt to `inflate()` the Java monitor into an ObjectMonitor.
Only one thread can win the inflation race and all of the racers trust `inflate()` to do the right thing. What's the "right thing"? One of the callers to `inflate()` will install the ObjectMonitor successfully and return it to that caller. All of the other callers to `inflate()` will detect that they lost the race and return the winner's ObjectMonitor to their callers. There's no reason for the logic to skip the call to `inflate()` because races are already handled by `inflate()`. We got into this spiraling thread because we were trying to figure out if a non-JavaThread could call `inflate()` because `inflate()` can call `is_lock_owned()` which has a header comment which talks about non-JavaThreads... I believe that is possible with JVM/TI tagging even when we are in LM_LIGHTWEIGHT mode because a lightweight monitor can be inflated by a contending thread which can cause the ObjectMonitor to have an anonymous owner. In that case, this if-statement in `inflate()` can execute: if (LockingMode == LM_LIGHTWEIGHT && inf->is_owner_anonymous() && is_lock_owned(current, object)) { inf->set_owner_from_anonymous(current); JavaThread::cast(current)->lock_stack().remove(object); } Of course, if our caller is the VMThread, `is_lock_owned()` will return false so we won't execute the if-statement's code block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16603#discussion_r1411319912 From dcubed at openjdk.org Thu Nov 30 22:45:07 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 30 Nov 2023 22:45:07 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v4] In-Reply-To: <1O1StrzGC1kcsPY3sXxNKbTAWjkECET0d9cnJYMaiuw=.bf3da9a0-1042-46df-a406-122675bbd0c7@github.com> References: <1O1StrzGC1kcsPY3sXxNKbTAWjkECET0d9cnJYMaiuw=.bf3da9a0-1042-46df-a406-122675bbd0c7@github.com> Message-ID: <7B13I8qHFWXuGbbXw17ha9lM5InVs1caN5wON6PdGG0=.528715e5-50a8-4c3d-aed5-c5a69c3cd911@github.com> On Thu, 30 Nov 2023 20:58:41 GMT, Serguei Spitsyn wrote: > It is a little bit ugly to do it for each call site. > The Handshake::execute() can do it instead, so its call sites could be simplified. > BTW, it is done in the JvmtiHandshake::execute() and one can find it to be convenient. Agreed, but it is intentional that direct uses of the three parameter version of `Handshake::execute()` with `tlh == nullptr` require that the caller understands the calling context. That means knowing where your ThreadsListHandle is and not getting away with not having one when the caller is current thread. Higher layer callers of `Handshake::execute()` like `JvmtiHandshake::execute()` or `JvmtiEventControllerPrivate::enter_interp_only_mode()` can make that less strict choice if they choose to. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1834675329 From dcubed at openjdk.org Thu Nov 30 22:50:10 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 30 Nov 2023 22:50:10 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 21:27:27 GMT, Serguei Spitsyn wrote: >> This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. 
Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. >> However, it has to be okay to avoid eager creation of these objects if no `interp_only_mode` has ever been requested. >> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. >> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly. >> >> Additionally, the fix includes the following related changes: >> - Use the condition double-checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: >> >> JvmtiThreadState* state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> MutexLocker mu(JvmtiThreadState_lock); >> state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> JvmtiEventController::enter_interp_only_mode(); >> } >> } >> >> >> - Add an extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. >> - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for the JVMTI `SetEventNotificationMode` implementation.
>> >> Performance measurements: >> - Without this fix the test provided by the bug submitter gives execution numbers: >> - no ClassLoad events enabled: 3251 ms >> - ClassLoad events are enabled: 40534 ms >> >> - With the fix: >> - no ClassLoad events enabled: 3270 ms >> - ClassLoad events are enabled: 3385 ms >> >> Testing: >> - Ran mach5 tiers 1-6, no regressions were noticed > > Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: > > - review: one more minor correction of a comment > - review: minor correction of a comment Thanks for fixing the comments. ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16686#pullrequestreview-1758619790 From sspitsyn at openjdk.org Thu Nov 30 22:57:09 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 30 Nov 2023 22:57:09 GMT Subject: RFR: 8308614: Enabling JVMTI ClassLoad event slows down vthread creation by factor 10 [v5] In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 21:27:27 GMT, Serguei Spitsyn wrote: >> This is a fix of a performance/scalability related issue. The `JvmtiThreadState` objects for virtual thread filtered events enabled globally are created eagerly because it is needed when the `interp_only_mode` is enabled. Otherwise, some events which are generated in `interp_only_mode` from the debugging version of interpreter chunks can be missed. >> However, it has to be okay to avoid eager creation of these objects if no `interp_only_mode` has ever been requested. >> It seems to be an extremely important optimization to create JvmtiThreadState objects lazily in such cases. >> It is done by introducing the flag `JvmtiThreadState::_seen_interp_only_mode` which indicates when the `JvmtiThreadState` objects have to be created eagerly.
>> >> Additionally, the fix includes the following related changes: >> - Use the condition double-checking idiom for `MutexLocker mu(JvmtiThreadState_lock)` in the function `JvmtiVTMSTransitionDisabler::VTMS_mount_end` which is on a performance-critical path and looks like this: >> >> JvmtiThreadState* state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> MutexLocker mu(JvmtiThreadState_lock); >> state = thread->jvmti_thread_state(); >> if (state != nullptr && state->is_pending_interp_only_mode()) { >> JvmtiEventController::enter_interp_only_mode(); >> } >> } >> >> >> - Add an extra check of `JvmtiExport::can_support_virtual_threads()` when virtual thread mount and unmount are posted. >> - Minor: Added a `ThreadsListHandle` to the `JvmtiEventControllerPrivate::enter_interp_only_mode`. It is needed because of the dynamic creation of compensating carrier threads which is racy for the JVMTI `SetEventNotificationMode` implementation. >> >> Performance measurements: >> - Without this fix the test provided by the bug submitter gives execution numbers: >> - no ClassLoad events enabled: 3251 ms >> - ClassLoad events are enabled: 40534 ms >> >> - With the fix: >> - no ClassLoad events enabled: 3270 ms >> - ClassLoad events are enabled: 3385 ms >> >> Testing: >> - Ran mach5 tiers 1-6, no regressions were noticed > > Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: > > - review: one more minor correction of a comment > - review: minor correction of a comment Thank you for the review, Dan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16686#issuecomment-1834685966