From lmesnik at openjdk.org Sun Sep 1 16:16:26 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sun, 1 Sep 2024 16:16:26 GMT Subject: Integrated: 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently In-Reply-To: References: Message-ID: On Thu, 29 Aug 2024 18:18:12 GMT, Leonid Mesnik wrote: > The tests time out because of a deadlock between the thread that is in transition and the thread changing field watches. > > They use JvmtiThreadState_lock and JvmtiVTMSTransitionDisabler. > > Changing a field watch requires the disabler, but the code attempts to use it only when JvmtiThreadState_lock is already held, in > > void > JvmtiEventController::change_field_watch(jvmtiEvent event_type, bool added) { > MutexLocker mu(JvmtiThreadState_lock); > JvmtiEventControllerPrivate::change_field_watch(event_type, added); > } > > > whereas it needs to first disable transitions and only then take JvmtiThreadState_lock. > I checked quickly that most JVMTI methods already do this. Also moved the disabler into jvmtiEnv.cpp to be more consistent with other methods. > > > I was able to verify my fix in the loom repo locally, and ran tier1 + tier5-svc testing in jdk. This pull request has now been integrated.
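The ordering described in the quoted text (disable transitions first, then take the lock) can be sketched roughly as follows. This is only an illustration built on standard C++ stand-ins, not HotSpot code: `TransitionDisablerSketch`, `StateLockerSketch`, `Trace` and `change_field_watch_sketch` are hypothetical names, and the real `JvmtiVTMSTransitionDisabler` and `MutexLocker` do considerably more than record a trace.

```cpp
#include <mutex>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Hypothetical stand-in for JvmtiVTMSTransitionDisabler: it only records
// when transitions would be disabled and re-enabled.
class TransitionDisablerSketch {
 public:
  explicit TransitionDisablerSketch(Trace& trace) : trace_(trace) {
    trace_.push_back("disable transitions");
  }
  ~TransitionDisablerSketch() { trace_.push_back("enable transitions"); }

 private:
  Trace& trace_;
};

// Hypothetical stand-in for MutexLocker on JvmtiThreadState_lock: it holds
// a std::mutex for its lifetime and records lock/unlock.
class StateLockerSketch {
 public:
  StateLockerSketch(std::mutex& m, Trace& trace) : guard_(m), trace_(trace) {
    trace_.push_back("lock state");
  }
  ~StateLockerSketch() { trace_.push_back("unlock state"); }

 private:
  std::lock_guard<std::mutex> guard_;
  Trace& trace_;
};

// Every caller uses the same order: disabler first, lock second.
// Scope exit releases them in reverse: lock first, disabler second.
Trace change_field_watch_sketch(std::mutex& state_lock) {
  Trace trace;
  {
    TransitionDisablerSketch disabler(trace);
    StateLockerSketch locker(state_lock, trace);
    trace.push_back("change field watch");
  }
  return trace;
}
```

Calling `change_field_watch_sketch` with a `std::mutex` yields a five-entry trace whose first two entries show the disabler being set up before the lock is taken. The point of the sketch is only that all paths agree on one acquisition order.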
Changeset: 92aafb43 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/92aafb43424321d8f2552aa34a9a3df291abf992 Stats: 8 lines in 3 files changed: 4 ins; 2 del; 2 mod 8338934: vmTestbase/nsk/jvmti/*Field*Watch/TestDescription.java tests timeout intermittently Reviewed-by: sspitsyn, amenkov ------------- PR: https://git.openjdk.org/jdk/pull/20776 From jwaters at openjdk.org Sun Sep 1 16:33:24 2024 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 1 Sep 2024 16:33:24 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly This does make me wonder: What if the new method for checking if the VM was statically linked was inlined? Then the problem comes back yet again as the object files need to be recompiled once more. 
This is possible if Link Time Optimization is switched on, and I don't like the implication that LTO might be removed as a result just to make this work ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2323416932 From dholmes at openjdk.org Mon Sep 2 00:45:32 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Sep 2024 00:45:32 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 Message-ID: In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. In some cases the length was explicitly cast to `int` and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value which then became an exceedingly large `size_t` value and we then tried to do utf8 validation on random bytes. Testing: - failing test - tiers 1-4 Thanks ------------- Commit messages: - 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 Changes: https://git.openjdk.org/jdk/pull/20804/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20804&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339316 Stats: 11 lines in 7 files changed: 1 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20804.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20804/head:pull/20804 PR: https://git.openjdk.org/jdk/pull/20804 From dholmes at openjdk.org Mon Sep 2 03:54:24 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Sep 2024 03:54:24 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v2] In-Reply-To: References: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> Message-ID: On Fri, 30 Aug 2024 20:45:11 GMT, Chris Plummer wrote: >> David Holmes has updated the pull request incrementally with one additional commit 
since the last revision: >> >> Exclude test on 32-bit > > Overall it looks good to me, although I don't have experience adding a new JNI API (the dtrace probes were new to me), but it seems you are following what is already in place for other functions, and the testing looks good. Thanks for the review @plummercj ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20784#issuecomment-2323760808 From dholmes at openjdk.org Mon Sep 2 04:06:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Sep 2024 04:06:28 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is a much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave that for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) FWIW as I recall the suggestion to include NMT in the name in some form was to make it clear that these kinds of parameter, which appear all over the place, are needed because of NMT and are not inherently part of whatever API they appear in. Whether that happens via a namespace, a nested enum, or a simple prefix, I don't really care except to say that anything that can then result in dropping the NMT in the source code (e.g. via a using directive) completely defeats the purpose of having it in the first place. So if there is no good answer here then I guess we just drop NMT from the name.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2323770639 From dholmes at openjdk.org Mon Sep 2 04:12:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Sep 2024 04:12:26 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Sun, 1 Sep 2024 16:30:33 GMT, Julian Waters wrote: > This does make me wonder: What if the new method for checking if the VM was statically linked was inlined? Then the problem comes back yet again as the object files need to be recompiled once more. This is possible if Link Time Optimization is switched on, and I don't like the implication that LTO might be removed as a result just to make this work Wouldn't such link-time "inlining" only appear in the object code stored within the library, not in the original object file? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2323775705 From jwaters at openjdk.org Mon Sep 2 04:15:21 2024 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 2 Sep 2024 04:15:21 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: On Mon, 2 Sep 2024 04:09:19 GMT, David Holmes wrote: > > This does make me wonder: What if the new method for checking if the VM was statically linked was inlined? Then the problem comes back yet again as the object files need to be recompiled once more. This is possible if Link Time Optimization is switched on, and I don't like the implication that LTO might be removed as a result just to make this work > > Wouldn't such link-time "inlining" only appear in the object code stored within the library, not in the original object file? Ah, you're right, good catch. 
I got mixed up with the implementation details of different compilers there, sorry ------------- PR Comment: https://git.openjdk.org/jdk/pull/20666#issuecomment-2323778397 From fyang at openjdk.org Mon Sep 2 06:06:59 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 2 Sep 2024 06:06:59 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines Message-ID: Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. This will help save unnecessary check in MacroAssembler::la and make the code more consistent. Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call Testing: - [ ] release & fastdebug builds - [ ] Tiered 1-3 tests ------------- Commit messages: - 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines Changes: https://git.openjdk.org/jdk/pull/20805/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20805&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339359 Stats: 11 lines in 1 file changed: 7 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20805/head:pull/20805 PR: https://git.openjdk.org/jdk/pull/20805 From rcastanedalo at openjdk.org Mon Sep 2 06:38:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Sep 2024 06:38:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v12] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute ports for these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - Merge jdk-24+13 - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' - Remark relation between compiler optimization and barrier filter - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' - Replace 'the null' with 'null' in comment - Remove redundant redefinitions of '__' - Replace 'already dirty' with 'young' in post-barrier fast path comment - Rename g1XChgX to g1GetAndSetX for consistency with Ideal operation names - Pass oldval to the pre-barrier in g1CompareAndExchange/SwapP - Assert that no implicit null checks are generated for memory accesses with barriers - ...
and 8 more: https://git.openjdk.org/jdk/compare/52ffcda1...4ee450ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/57adcfb0..4ee450ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=10-11 Stats: 30577 lines in 938 files changed: 18592 ins; 8033 del; 3952 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 2 06:38:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Sep 2024 06:38:07 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v11] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:23:32 GMT, Feilong Jiang wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with six additional commits since the last revision: >> >> - Add test to motivate compile-time null checks in 'refine_barrier_by_new_val_type' >> - Remark relation between compiler optimization and barrier filter >> - Make 'refine_barrier_by_new_val_type' static and its input argument 'const' >> - Replace 'the null' with 'null' in comment >> - Remove redundant redefinitions of '__' >> - Replace 'already dirty' with 'young' in post-barrier fast path comment > > risc-v port looks good too. > OK, if there are no objections from @feilongjiang or @snazarkin within a couple of days I will prepare an update to jdk-24+13. @TheRealMDoerr done (commit 4ee450a).
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2323921726 From aturbanov at openjdk.org Mon Sep 2 07:53:26 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 2 Sep 2024 07:53:26 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v3] In-Reply-To: References: Message-ID: On Sat, 31 Aug 2024 09:34:09 GMT, Yasumasa Suenaga wrote: >> I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. >> >> >> Error occurred during stack walking: >> java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) >> at 
jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) >> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is _ZTV10Upcall... > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Add frame size to all of UpcallStub::create() call test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java line 45: > 43: try { > 44: Thread.sleep(600000); // 10 min > 45: } catch(InterruptedException e) { Suggestion: } catch (InterruptedException e) { test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java line 61: > 59: > 60: public static void main(String[] args) { > 61: try{ Suggestion: try { test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java line 65: > 63: (new Thread(() -> callJNI(upcallAddr), THREAD_NAME)).start(); > 64: LingeredApp.main(args); > 65: } catch(Exception e) { Suggestion: } catch (Exception e) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1740474940 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1740475126 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1740475258 From alanb at openjdk.org Mon Sep 2 09:08:18 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 2 Sep 2024 09:08:18 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the 
limitations of integer UTF-8 String lengths [v2] In-Reply-To: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> References: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> Message-ID: <8Lgmm7EJJ6BLMS1hIp6eWvnUPw-pzO-Svw9I8g33JeU=.44b18408-f1a7-4119-94c4-c8f7665bac61@github.com> On Fri, 30 Aug 2024 05:21:54 GMT, David Holmes wrote: >> This is the implementation of a new method added to the JNI specification. >> >> From the CSR request: >> >> The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. >> >> **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. >> >> Solution >> >> Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. >> >> --- >> >> We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni >> >> There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. 
>> >> Testing: >> - new test added >> - tiers 1-3 sanity >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Exclude test on 32-bit Deprecating the existing function and introducing the new function looks okay. The test is really 3 tests in one (GetStringUTFLength returning a truncated size, GetStringUTFLength with -Xcheck:jni prints a warning, and GetStringUTFLengthAsLong returns the long size). Personally I would have done this as 3 test cases rather in one launch with -Xcheck:jni but that's your choice. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20784#pullrequestreview-2275089037 From ihse at openjdk.org Mon Sep 2 09:17:28 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 2 Sep 2024 09:17:28 GMT Subject: Integrated: 8338768: Introduce runtime lookup to check for static builds In-Reply-To: References: Message-ID: <7ZCNx3fBoTkmUDxGYAcN6eqygCSF-kxH3vBPSczQqaA=.85487425-1e4f-4496-92f7-92d7a6d69156@github.com> On Wed, 21 Aug 2024 21:53:39 GMT, Magnus Ihse Bursie wrote: > As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. > > This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. > > This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. This pull request has now been integrated. 
Changeset: a136a85b Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/a136a85b6f5bbc92727883693c1ce31c37a82fd5 Stats: 205 lines in 12 files changed: 109 ins; 21 del; 75 mod 8338768: Introduce runtime lookup to check for static builds Co-authored-by: Magnus Ihse Bursie Co-authored-by: Jiangli Zhou Reviewed-by: prr, jiangli, alanb ------------- PR: https://git.openjdk.org/jdk/pull/20666 From dholmes at openjdk.org Mon Sep 2 09:50:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 2 Sep 2024 09:50:18 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v2] In-Reply-To: <8Lgmm7EJJ6BLMS1hIp6eWvnUPw-pzO-Svw9I8g33JeU=.44b18408-f1a7-4119-94c4-c8f7665bac61@github.com> References: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> <8Lgmm7EJJ6BLMS1hIp6eWvnUPw-pzO-Svw9I8g33JeU=.44b18408-f1a7-4119-94c4-c8f7665bac61@github.com> Message-ID: On Mon, 2 Sep 2024 09:05:17 GMT, Alan Bateman wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude test on 32-bit > > Deprecating the existing function and introducing the new function looks okay. > > The test is really 3 tests in one (GetStringUTFLength returning a truncated size, GetStringUTFLength with -Xcheck:jni prints a warning, and GetStringUTFLengthAsLong returns the long size). Personally I would have done this as 3 test cases rather in one launch with -Xcheck:jni but that's your choice. Thanks for the review @AlanBateman . I like the test the way it is. 
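As background to the CSR text quoted earlier in this thread: the reason a 32-bit length can be too small is that each UTF-16 code unit may need up to three bytes in modified UTF-8. A minimal sketch of that arithmetic follows. The helper names (`modified_utf8_length`, `worst_case_modified_utf8_length`, `exceeds_int32`) are hypothetical and are not JNI functions; the byte counts follow the modified UTF-8 rules (U+0000 takes two bytes, and each half of a surrogate pair is encoded separately in three bytes).

```cpp
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper, not a JNI function: number of bytes needed to encode
// a UTF-16 string in modified UTF-8, accumulated in 64 bits.
std::int64_t modified_utf8_length(const std::u16string& s) {
  std::int64_t bytes = 0;
  for (char16_t c : s) {
    if (c >= 0x0001 && c <= 0x007F) {
      bytes += 1;  // ASCII except NUL
    } else if (c <= 0x07FF) {
      bytes += 2;  // U+0000 (encoded as 0xC0 0x80) and U+0080..U+07FF
    } else {
      bytes += 3;  // U+0800..U+FFFF, including each surrogate half
    }
  }
  return bytes;
}

// Worst case for a string of `chars` UTF-16 code units: three bytes each.
std::int64_t worst_case_modified_utf8_length(std::int32_t chars) {
  return 3 * static_cast<std::int64_t>(chars);
}

// True when a length does not fit in a signed 32-bit value.
bool exceeds_int32(std::int64_t length) {
  return length > std::numeric_limits<std::int32_t>::max();
}
```

With these helpers, `worst_case_modified_utf8_length` applied to the largest 32-bit signed value gives a result for which `exceeds_int32` is true, which is the situation the 64-bit return type is meant to cover.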
------------- PR Comment: https://git.openjdk.org/jdk/pull/20784#issuecomment-2324302820 From rehn at openjdk.org Mon Sep 2 09:56:17 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 2 Sep 2024 09:56:17 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 01:15:36 GMT, Fei Yang wrote: > Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 > > For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit > auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. > This will help save unnecessary check in MacroAssembler::la and make the code more consistent. > Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call > > Testing: > - [x] release & fastdebug builds > - [ ] Tiered 1-3 tests Looks good, thanks! My personal opinion is that I really like my assembler to be as WYSIWYG as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20805#issuecomment-2324315199 From ysuenaga at openjdk.org Mon Sep 2 10:29:40 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 2 Sep 2024 10:29:40 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: > I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception.
> > > Error occurred during stack walking: > java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is 
_ZTV10UpcallStub) > at jdk.hotspot.agent/sun.jvm.hotspot.run... Yasumasa Suenaga has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java Co-authored-by: Andrey Turbanov - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java Co-authored-by: Andrey Turbanov - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20789/files - new: https://git.openjdk.org/jdk/pull/20789/files/90bccf1f..63ed18dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20789&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20789.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20789/head:pull/20789 PR: https://git.openjdk.org/jdk/pull/20789 From amitkumar at openjdk.org Mon Sep 2 10:33:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 2 Sep 2024 10:33:23 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:11:06 GMT, Coleen Phillimore wrote: >> Coleen Phillimore has updated the pull request incrementally with three additional commits since the last revision: >> >> - Fix jvmci code. >> - Some C2 refactoring. >> - Assembly corrections from Matias and Dean. > > Thanks Chris and Matias for reviewing parts of this. Hi @coleenp, I got this error while building on my Mac-M1; did you see something like this? ERROR: Failed to generate link optimization data. This is likely a problem with the newly built JVM/JDK.
# # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/Users/amitkumar/jdk/src/hotspot/share/oops/klassFlags.hpp:72), pid=57968, tid=10499 # assert(!is_value_based_class()) failed: set once # # JRE version: (24.0) (fastdebug build ) # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.amitkumar.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /Users/amitkumar/jdk/make/hs_err_pid57968.log # # These are the changes with which I am building the JVM. I can reproduce it on my s390x-machine as well. diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp index d442894798b..fbd1a2b3281 100644 --- a/src/hotspot/share/runtime/globals.hpp +++ b/src/hotspot/share/runtime/globals.hpp @@ -170,7 +170,7 @@ const int ObjectAlignmentInBytes = 8; product(bool, AlwaysSafeConstructors, false, EXPERIMENTAL, \ "Force safe construction, as if all fields are final.") \ \ - product(bool, UnlockDiagnosticVMOptions, trueInDebug, DIAGNOSTIC, \ + product(bool, UnlockDiagnosticVMOptions, true, DIAGNOSTIC, \ "Enable normal processing of flags relating to field diagnostics")\ \ product(bool, UnlockExperimentalVMOptions, false, EXPERIMENTAL, \ @@ -819,7 +819,7 @@ const int ObjectAlignmentInBytes = 8; product(bool, RestrictContended, true, \ "Restrict @Contended to trusted classes") \ \ - product(int, DiagnoseSyncOnValueBasedClasses, 0, DIAGNOSTIC, \ + product(int, DiagnoseSyncOnValueBasedClasses, 1, DIAGNOSTIC, \ "Detect and take action upon identifying synchronization on " \ "value based classes.
Modes: " \ "0: off; " \ ------------- PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2324389377 From jvernee at openjdk.org Mon Sep 2 12:01:18 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 2 Sep 2024 12:01:18 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 10:29:40 GMT, Yasumasa Suenaga wrote: >> I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. >> >> >> Error occurred during stack walking: >> java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) 
>> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) >> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is _ZTV10Upcall... > > Yasumasa Suenaga has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov I had a look at the SA agent code in `src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/x86/X86Frame.java` and it looks like this has its own separate stack walking implementation. I understand that adding the `UpcallStub` type to the SA agent code makes the `WrongTypeException` go away, and then we run into an assertion failure because the frame size is zero?
Error occurred during stack walking: sun.jvm.hotspot.utilities.AssertionFailure: must have non-zero frame size at jdk.hotspot.agent/sun.jvm.hotspot.utilities.Assert.that(Assert.java:32) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForCompiledFrame(X86Frame.java:383) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:292) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:150) at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) The issue here appears to be that the stack walking code in the SA agent doesn't handle upcall stub frames at all at the moment, so the code falls back to the handling for compiled frames. However, that code looks wrong for upcall stub frames, since we need to look at the frame anchor to get to the caller (see how this is implemented in e.g. `src/hotspot/cpu/x86/frame_x86.cpp`). Note how there is also special handling for (JNI) entry frames in the SA. From the output, this still seems to work out, I'm guessing because we end up walking the native frames until we get back to Java, and the native frames are simply ignored.
I'm not sure if that will always work for arbitrary native code though. I think the right fix here is to implement handling for upcall stub frames in the SA agent, since that's also how entry frames are handled. I don't think setting the frame size in hotspot is actually needed if we do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2324580579 From jbhateja at openjdk.org Mon Sep 2 12:20:59 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Sep 2024 12:20:59 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: Message-ID: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> > Hi All, > > As per the discussion on the panama-dev mailing list [1], this patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max. > . UMIN : Unsigned min. > > > The new vector operators are applicable only to integral types, since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per the IEEE 754 specs, there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them.
> - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolved ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/c42b4afa..767aeef3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=03-04 Stats: 249 lines in 9 files changed: 75 ins; 67 del; 107 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Mon Sep 2 12:21:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Sep 2024 12:21:00 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 22:17:55 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/assembler_x86.cpp line 10229: > >> 10227: InstructionMark im(this); >> 10228: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); >> 10229: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > > vex_w could be false here. 
The encoding specification mentions that the W bit gets ignored, so there are no functional issues; I will make it false to comply with our convention. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6698: > >> 6696: // Unsigned values ranges comprise of only +ve numbers, thus there exist only an upper bound saturation. >> 6697: // overflow = ((UMAX - MAX(SRC1 & SRC2)) >> 31 == 1 >> 6698: // Res = Signed Add INP1, INP2 > > The >>> 31 is not coded so comment could be improved to match the code. > Comment has SRC1/INP1 term mixed. > Also, could overflow not be implemented based on much simpler Java scalar algo: > Overflow = Res This is much straight forward, also evex supports unsigned comparison. Java scalar algo was empirically proved to hold good, I also verified with Alive2 solver which proved its semantic equivalence to HD section 2-13 based vector implementation. Here is the link to Alive2 solver which operates on LLVM IR inputs. [https://alive2.llvm.org/ce/z/XDQ7dY](https://alive2.llvm.org/ce/z/XDQ7dY) > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6749: > >> 6747: vpor(xtmp2, xtmp2, src2, vlen_enc); >> 6748: // Compute mask for muxing T1 with T3 using SRC1. >> 6749: vpsign_extend_dq(etype, xtmp4, src1, vlen_enc); > > I don't think we need to do the sign extension. The blend instruction uses most significant bit to do the blend. The original vector has double / quad word lanes which are being blended using a byte level mask; sign extension will ensure that the sign bit is propagated to the MSB of each constituent byte mask corresponding to a double / quad word source lane. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6939: > >> 6937: >> 6938: // Compose saturating min/max vector using first input polarity mask. >> 6939: vpsign_extend_dq(etype, xtmp4, src1, vlen_enc); > > Sign extend to lower bits not needed as blend uses msbit only.
The original vector has double / quad word lanes which are being blended using a byte level mask; sign extension will ensure that the sign bit is propagated to the MSB of each constituent byte mask corresponding to a double / quad word source lane. > src/hotspot/cpu/x86/x86.ad line 1953: > >> 1951: if (UseAVX < 1 || size_in_bits < 128 || (size_in_bits == 512 && !VM_Version::supports_avx512bw())) { >> 1952: return false; >> 1953: } > > UseAVX < 1 could be written as UseAVX == 0. Could we not do register version for size_in_bit < 128? I get your point, but the constraints ensure we only address cases with vector size >= 128 bit in this patch. > src/hotspot/cpu/x86/x86.ad line 10635: > >> 10633: %} >> 10634: >> 10635: instruct saturating_unsigned_add_reg_avx(vec dst, vec src1, vec src2, vec xtmp1, vec xtmp2, vec xtmp3, vec xtmp4) > > Should the temp here and all the places related to !avx512vl() be legVec instead of vec? The predicate already has an AVX512VL check, and so do the dynamic register classes associated with its operands. > src/hotspot/cpu/x86/x86.ad line 10656: > >> 10654: match(Set dst (SaturatingSubVI src1 src2)); >> 10655: match(Set dst (SaturatingSubVL src1 src2)); >> 10656: effect(TEMP ktmp); > > This needs TEMP dst as well. There is no use of either of the source operands after assignment to dst in the macro assembly routine. > src/java.base/share/classes/java/lang/Byte.java line 647: > >> 645: */ >> 646: public static byte subSaturating(byte a, byte b) { >> 647: byte res = (byte)(a - b); > > Could we not do subSaturating as an int operation on similar lines as addSaturating? Yes, core libs also have the {add/subtract}Exact APIs, which throw ArithmeticException instead of saturating over / underflowing values. I am streamlining the overflow checks for saturating long operations along the same lines to address Joe's concerns about new constants.
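To illustrate the scalar semantics discussed in this review (doing `subSaturating` as an int operation, and sign-based overflow checks for long), here is a minimal Java sketch. It is not the code from the patch: the class name `SaturatingSketch` and the method name `addSaturatingLong` are hypothetical, and the byte methods only reuse the `addSaturating`/`subSaturating` names mentioned in the thread. It assumes that widening to int is acceptable for sub-int types and that the sign-comparison test in the style of Hacker's Delight section 2-13 is used where no wider primitive exists.

```java
// Hypothetical sketch of saturating scalar arithmetic; not the JDK patch itself.
final class SaturatingSketch {

    // byte operands are promoted to int, so the intermediate sum cannot overflow;
    // the result is then clamped to the byte range.
    static byte addSaturating(byte a, byte b) {
        int res = a + b;
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, res));
    }

    // Same idea for subtraction: compute in int, then clamp.
    static byte subSaturating(byte a, byte b) {
        int res = a - b;
        return (byte) Math.max(Byte.MIN_VALUE, Math.min(Byte.MAX_VALUE, res));
    }

    // long has no wider primitive type, so overflow is detected from sign bits:
    // signed addition overflows iff both operands have the same sign and the
    // wrapped result has the opposite sign.
    static long addSaturatingLong(long a, long b) {
        long res = a + b;
        if (((a ^ res) & (b ^ res)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return res;
    }
}
```

By contrast, `Math.addExact` and `Math.subtractExact` in the core libraries throw `ArithmeticException` in the same overflow situations instead of clamping.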
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740820063 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740819360 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740823487 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740823011 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740819742 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740819629 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740822270 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740821085 From jbhateja at openjdk.org Mon Sep 2 12:21:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Sep 2024 12:21:00 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 16:05:45 GMT, Sandhya Viswanathan wrote: >> Wonder if it would have been simpler if we added unsigned vector operators like Op_SaturatingUnsignedAddVB etc. We are not adding unsigned data types to Java, only supporting unsigned (saturating) operations on existing signed integral types. > > If the aim is to reduce the number of nodes, we could merge the Op_SaturatingAddVB, Op_SaturatingAddVS, Op_SaturatingAddVI, and Op_SaturatingAddVL into one Op_SaturatingAddV. Likewise for unsigned saturating add into Op_SaturatingUnsignedAddV. Hey @sviswa7, our concern was around value ranges of new unsigned scalar type, which as mentioned will be addressed when I support intrinsification of new core lib APIs and associated range constraining / folding optimization in a follow up patch. 
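The point above, that unsigned saturating operations are defined on the existing signed integral types rather than on a new unsigned type, can be sketched in scalar Java as follows. This is an illustration under stated assumptions, not the patch's implementation: the class and method names are hypothetical, and the sketch relies only on `Integer.compareUnsigned`, which exists in `java.lang.Integer`. The int bit pattern is simply interpreted as an unsigned value in the range 0 to 2^32 - 1.

```java
// Hypothetical sketch: unsigned saturating operations on plain int bit patterns.
final class UnsignedSaturatingSketch {

    // Unsigned addition overflows iff the wrapped sum is unsigned-less-than an operand.
    // On overflow, clamp to the unsigned maximum (all bits set, i.e. -1 as a signed int).
    static int addSaturatingUnsigned(int a, int b) {
        int res = a + b;
        return Integer.compareUnsigned(res, a) < 0 ? -1 : res;
    }

    // Unsigned subtraction underflows iff a is unsigned-less-than b; clamp to 0.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned max / min: compare the operands as unsigned values.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

Note that only an upper bound can be hit by unsigned addition and only a lower bound by unsigned subtraction, which is why each method has a single clamp value.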
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1740817837 From luhenry at openjdk.org Mon Sep 2 13:36:17 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 2 Sep 2024 13:36:17 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 01:15:36 GMT, Fei Yang wrote: > Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 > > For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit > auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. > This will help save unnecessary check in MacroAssembler::la and make the code more consistent. > Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call > > Testing: > - [x] release & fastdebug builds > - [ ] Tiered 1-3 tests LGTM ------------- Marked as reviewed by luhenry (Committer). PR Review: https://git.openjdk.org/jdk/pull/20805#pullrequestreview-2275682328 From fgao at openjdk.org Mon Sep 2 13:40:18 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 2 Sep 2024 13:40:18 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 03:03:53 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix mismatch issue in ad m4 file LGTM! Thanks for the update. ------------- Marked as reviewed by fgao (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20724#pullrequestreview-2275690197 From adinn at openjdk.org Mon Sep 2 13:45:20 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 2 Sep 2024 13:45:20 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 03:03:53 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix mismatch issue in ad m4 file still good ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20724#pullrequestreview-2275700348 From ysuenaga at openjdk.org Mon Sep 2 14:52:21 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Mon, 2 Sep 2024 14:52:21 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 11:59:01 GMT, Jorn Vernee wrote: > I understand that adding the UpcallStub type to the SA agent code makes the WrongTypeException go away, and then we run into an assertion failure because the frame size is zero? Yes. > Note how there is also special handling for (JNI) entry frames in the SA. Do you mean `JavaCallWrapper` (`X86Frame::senderForEntryFrame` in SA) ? > I'm guessing because we end up walking the native frames until we get back to Java, and the native frames are simply ignored. I'm not sure if that will always work for arbitrary native code though. > > I think the right fix here is to implement handling for upcall stub frames in the SA agent, since that's also how entry frames are handled. I don't think setting the frame size in hotspot is actually needed if we do that. 
If we add some frame info (return address and FP) like `JavaCallWrapper` to `UpcallStub` and process it in SA, we do not need the frame size of `UpcallStub`, as you said. But I think it should be fixed in all of the upcall implementations. `UpcallStub` is a "Stub", so it complies with the native calling convention. Thus I believe a native frame unwinder like `X86Frame` should always work if the frame size is set in `UpcallStub`. We need to fix all of the upcall implementations in both cases, and a zero frame size is not natural. In addition, adding a frame size is simpler than adding special handling for `UpcallStub` and SA. Thus I give +1 to adding a frame size to `UpcallStub`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2324921013 From jvernee at openjdk.org Mon Sep 2 15:16:20 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 2 Sep 2024 15:16:20 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 14:50:10 GMT, Yasumasa Suenaga wrote: > > Note how there is also special handling for (JNI) entry frames in the SA. > > Do you mean `JavaCallWrapper` (`X86Frame::senderForEntryFrame` in SA) ? Yes. Internally it loads the fields of `JavaFrameAnchor`, which points at the previous Java frame. > > I'm guessing because we end up walking the native frames until we get back to Java, and the native frames are simply ignored. I'm not sure if that will always work for arbitrary native code though. > > I think the right fix here is to implement handling for upcall stub frames in the SA agent, since that's also how entry frames are handled. I don't think setting the frame size in hotspot is actually needed if we do that. > > If we add some frame info (return address and FP) like `JavaCallWrapper` to `UpcallStub` and process it in SA, we do not need the frame size of `UpcallStub`, as you said. But I think it should be fixed in all of the upcall implementations.
`UpcallStub` is a "Stub", so it complies with the native calling convention. Thus I believe a native frame unwinder like `X86Frame` should always work if the frame size is set in `UpcallStub`. The problem is not the upcall stub frame itself. We know which ABI that is using. The problem is any intermediate frames between the upcall stub frame and the last Java frame before that. These intermediate native frames can have any ABI. There is no single 'native calling convention'. We know that we are interfacing with something that follows the C ABI, but that code may switch to a different ABI (e.g. Rust, C++, or some other language) which may have a different stack frame layout. Stack walking those frames might break. The frame anchor used by entry/upcall frames helps to avoid this by letting the stack walker jump over all the native frames, and continue walking at the last Java frame before the upcall stub instead. That means it doesn't have to deal with the foreign stack layout of frames in between. > We need to fix all of the upcall implementations in both cases, and a zero frame size is not natural. In addition, adding a frame size is simpler than adding special handling for `UpcallStub` and SA. Thus I give +1 to adding a frame size to `UpcallStub`. I'm not necessarily opposed to adding a frame size to upcall stubs, but as a fix for SA stack walking this seems like a band-aid. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2324964613 From jzhu at openjdk.org Mon Sep 2 15:18:20 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Mon, 2 Sep 2024 15:18:20 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v2] In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 15:52:16 GMT, Andrew Dinn wrote: >> Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix compilation failure with --disable-precompiled-headers > > Ok, that sounds like it is sufficient.
Thank you for the reviews! @adinn @fg1417 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2324965015 From duke at openjdk.org Mon Sep 2 15:18:21 2024 From: duke at openjdk.org (duke) Date: Mon, 2 Sep 2024 15:18:21 GMT Subject: RFR: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 03:03:53 GMT, Joshua Zhu wrote: >> Please review this minor enhancement that skips verify_sve_vector_length after native calls. >> It works on SVE micro-architecture that only supports 128-bit vector length. > > Joshua Zhu has updated the pull request incrementally with one additional commit since the last revision: > > Fix mismatch issue in ad m4 file @JoshuaZhuwj Your change (at version d19108585ebb2b849229c2bf11d0ea6d6860a56e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20724#issuecomment-2324967336 From jzhu at openjdk.org Mon Sep 2 15:40:29 2024 From: jzhu at openjdk.org (Joshua Zhu) Date: Mon, 2 Sep 2024 15:40:29 GMT Subject: Integrated: 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 09:28:52 GMT, Joshua Zhu wrote: > Please review this minor enhancement that skips verify_sve_vector_length after native calls. > It works on SVE micro-architecture that only supports 128-bit vector length. This pull request has now been integrated. 
Changeset: 0e6bb514 Author: Joshua Zhu Committer: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/0e6bb514c8ec7c4a7100fe06eaa9e954a74fda30 Stats: 60 lines in 7 files changed: 33 ins; 14 del; 13 mod 8339063: [aarch64] Skip verify_sve_vector_length after native calls if SVE supports 128 bits VL only Reviewed-by: adinn, fgao ------------- PR: https://git.openjdk.org/jdk/pull/20724 From mdoerr at openjdk.org Mon Sep 2 20:54:48 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 2 Sep 2024 20:54:48 GMT Subject: RFR: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label Message-ID: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> I had forgotten to copy one line from the cmpxchgd in [JDK-8338814](https://bugs.openjdk.org/browse/JDK-8338814) (https://github.com/openjdk/jdk/commit/2edf574f62837678e621e1dfdd8d8a77dbe17ad6). ------------- Commit messages: - 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label Changes: https://git.openjdk.org/jdk/pull/20826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339411 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20826/head:pull/20826 PR: https://git.openjdk.org/jdk/pull/20826 From lucy at openjdk.org Mon Sep 2 21:38:18 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 2 Sep 2024 21:38:18 GMT Subject: RFR: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label In-Reply-To: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> References: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> Message-ID: On Mon, 2 Sep 2024 20:48:59 GMT, Martin Doerr wrote: > I had forgotten to copy one line from the cmpxchgd in [JDK-8338814](https://bugs.openjdk.org/browse/JDK-8338814) 
(https://github.com/openjdk/jdk/commit/2edf574f62837678e621e1dfdd8d8a77dbe17ad6). This one looks good, probably even trivial. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20826#pullrequestreview-2276163438 From fyang at openjdk.org Tue Sep 3 02:14:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 3 Sep 2024 02:14:05 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines [v2] In-Reply-To: References: Message-ID: > Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 > > For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit > auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. > This will help save unnecessary check in MacroAssembler::la and make the code more consistent. > Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call > > Testing: > - [x] release & fastdebug builds > - [x] Tiered 1-3 tests Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Fix assertion message - Merge remote-tracking branch 'upstream/master' into JDK-8339359 - 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20805/files - new: https://git.openjdk.org/jdk/pull/20805/files/95b69ff0..8c426ff9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20805&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20805&range=00-01 Stats: 984 lines in 25 files changed: 159 ins; 731 del; 94 mod Patch: https://git.openjdk.org/jdk/pull/20805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20805/head:pull/20805 PR: https://git.openjdk.org/jdk/pull/20805 From fyang at openjdk.org Tue Sep 3 02:14:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 3 Sep 2024 02:14:05 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 01:15:36 GMT, Fei Yang wrote: > Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 > > For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit > auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. > This will help save unnecessary check in MacroAssembler::la and make the code more consistent. > Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call > > Testing: > - [x] release & fastdebug builds > - [x] Tiered 1-3 tests Thanks for having a look. I just added one extra commit correcting the assertion message of far_jump. My local tier1-3 tests are good. @robehn : Can you take another look and approve this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20805#issuecomment-2325474428 From dholmes at openjdk.org Tue Sep 3 03:00:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Sep 2024 03:00:56 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v3] In-Reply-To: References: Message-ID: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> > This is the implementation of a new method added to the JNI specification. > > From the CSR request: > > The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. > > **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. > > Solution > > Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. > > --- > > We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni > > There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. 
> > Testing: > - new test added > - tiers 1-3 sanity > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: The JNI version update was incompete ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20784/files - new: https://git.openjdk.org/jdk/pull/20784/files/73174e64..29cfa8ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20784&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20784&range=01-02 Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20784.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20784/head:pull/20784 PR: https://git.openjdk.org/jdk/pull/20784 From dholmes at openjdk.org Tue Sep 3 03:00:56 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Sep 2024 03:00:56 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v2] In-Reply-To: References: <2e6s-MMPDH7HvC8BHvUV4SzjJximYjZr44OL_CnwFWc=.042e04ef-ba2c-4964-9973-4d9963a6410a@github.com> Message-ID: On Fri, 30 Aug 2024 20:45:11 GMT, Chris Plummer wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude test on 32-bit > > Overall it looks good to me, although I don't have experience adding a new JNI API (the dtrace probes were new to me), but it seems you are following what is already in place for other functions, and the testing looks good. @plummercj and @AlanBateman could you please re-review as I was missing the main parts of the JNI version update! 
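The length limitation described in the CSR text quoted in this thread can be reproduced in plain Java. The sketch below is an illustration only, not HotSpot's implementation: the class and method names are hypothetical. It assumes the standard modified UTF-8 rules, where U+0001 to U+007F take one byte, U+0000 and U+0080 to U+07FF take two bytes, and every other UTF-16 code unit (including each half of a surrogate pair) takes three bytes. The count is accumulated in a long so that it cannot be truncated.

```java
// Hypothetical sketch: modified UTF-8 length of a Java string, counted as a long.
final class ModifiedUtf8LengthSketch {

    static long modifiedUtf8Length(CharSequence s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;            // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;            // U+0000 (encoded as two bytes) and U+0080..U+07FF
            } else {
                len += 3;            // U+0800..U+FFFF, including individual surrogates
            }
        }
        return len;
    }
}
```

With three bytes per char in the worst case, a string of `Integer.MAX_VALUE` chars needs 3 * 2147483647 = 6442450941 bytes, which does not fit in a `jint`; this is the case a `jlong` result is meant to cover.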
Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/20784#issuecomment-2325518928 From rehn at openjdk.org Tue Sep 3 05:44:19 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 3 Sep 2024 05:44:19 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 02:14:05 GMT, Fei Yang wrote: >> Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 >> >> For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit >> auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. >> This will help save unnecessary check in MacroAssembler::la and make the code more consistent. >> Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call >> >> Testing: >> - [x] release & fastdebug builds >> - [x] Tiered 1-3 tests > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Fix assertion message > - Merge remote-tracking branch 'upstream/master' into JDK-8339359 > - 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines Thanks, yes! ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20805#pullrequestreview-2276446008 From alanb at openjdk.org Tue Sep 3 06:04:19 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 3 Sep 2024 06:04:19 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v3] In-Reply-To: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> References: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> Message-ID: On Tue, 3 Sep 2024 03:00:56 GMT, David Holmes wrote: >> This is the implementation of a new method added to the JNI specification. >> >> From the CSR request: >> >> The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. >> >> **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. >> >> Solution >> >> Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. 
>> >> --- >> >> We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni >> >> There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. >> >> Testing: >> - new test added >> - tiers 1-3 sanity >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > The JNI version update was incomplete Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20784#pullrequestreview-2276469775 From dholmes at openjdk.org Tue Sep 3 06:22:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Sep 2024 06:22:19 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v3] In-Reply-To: References: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> Message-ID: On Tue, 3 Sep 2024 06:01:15 GMT, Alan Bateman wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> The JNI version update was incomplete > > Marked as reviewed by alanb (Reviewer). Thanks @AlanBateman! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20784#issuecomment-2325689077 From fyang at openjdk.org Tue Sep 3 06:58:20 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 3 Sep 2024 06:58:20 GMT Subject: RFR: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 02:14:05 GMT, Fei Yang wrote: >> Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 >> >> For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit >> auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache.
>> This will help save unnecessary check in MacroAssembler::la and make the code more consistent. >> Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call >> >> Testing: >> - [x] release & fastdebug builds >> - [x] Tiered 1-3 tests > > Fei Yang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Fix assertion message > - Merge remote-tracking branch 'upstream/master' into JDK-8339359 > - 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20805#issuecomment-2325739081 From fyang at openjdk.org Tue Sep 3 07:01:25 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 3 Sep 2024 07:01:25 GMT Subject: Integrated: 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines In-Reply-To: References: Message-ID: <_tEKNTpc8eIoS281AP4h6t3r2j4HpCp_ACSvGUeliGM=.e80622e0-c05a-4b3e-8f37-b54e77363342@github.com> On Mon, 2 Sep 2024 01:15:36 GMT, Fei Yang wrote: > Previous discussion: https://github.com/openjdk/jdk/pull/18942#issuecomment-2109162337 > > For MacroAssembler::far_call and MacroAssembler::far_jump, I would suggest we use explicit > auipc instead of MacroAssembler::la for them as the destination is ensured to be in code cache. > This will help save unnecessary check in MacroAssembler::la and make the code more consistent. > Also this will help distinguish these two macro assembler routines from MacroAssembler::rt_call > > Testing: > - [x] release & fastdebug builds > - [x] Tiered 1-3 tests This pull request has now been integrated. 
Changeset: dc4fd896 Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/dc4fd896289db1d2f6f7bbf5795fec533448a48c Stats: 12 lines in 1 file changed: 7 ins; 0 del; 5 mod 8339359: RISC-V: Use auipc explicitly in far_jump and far_call macro assembler routines Reviewed-by: rehn, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/20805 From amitkumar at openjdk.org Tue Sep 3 07:07:49 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Sep 2024 07:07:49 GMT Subject: RFR: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java Message-ID: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> TestIfMinMax.java is failing on s390x, so for now I want to disable this test on the s390x platform. Once the failure is fixed by [JDK-8339220](https://bugs.openjdk.org/browse/JDK-8339220), the changes made by this PR will be reverted. I guess this is a trivial patch and one review will be required. ------------- Commit messages: - problemlist Changes: https://git.openjdk.org/jdk/pull/20827/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20827&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339419 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20827/head:pull/20827 PR: https://git.openjdk.org/jdk/pull/20827 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> On Fri, 30 Aug 2024 13:49:10 GMT, Roberto Castañeda Lozano wrote: > Thanks for your suggestions and comments,
Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next few days), please let me know if there are any further questions. @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2325782979 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
> > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: - Increase test coverage of new-object stores with different type information - Refactor the two post-barrier removal cases into a single expression - Remove unnecessary early null-based post-barrier elision - Make store capturability test G1-specific and more precise ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/4ee450ad..1ea2862f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=11-12 Stats: 88 lines in 5 files changed: 66 ins; 7 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 3 07:26:00 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:00 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> Message-ID: On Fri, 30 Aug 2024 08:23:44 GMT, Roberto Castañeda Lozano wrote: > I will study if the check in get_store_barrier is superseded by that in refine_barrier_by_new_val_type. If I can convince myself that this is the case I will consider removing the former. This was indeed the case, so I have removed the compile-time null check from `G1BarrierSetC2::get_store_barrier` (commit deac05d7) and simplified the code around it (commit 6f4027bf). I also added a few extra test cases to exercise stores on newly-allocated objects with different nullness information (commit 1ea2862f).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741555725 From rcastanedalo at openjdk.org Tue Sep 3 07:26:01 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Sep 2024 07:26:01 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v9] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 13:40:24 GMT, Roberto Castañeda Lozano wrote: > A perhaps more principled solution might be extending store-capturing analysis to reject stores with late-expanded barriers. I will give it a try. This option proved to be infeasible because other GCs (ZGC) rely on store capturing for barrier elision. Furthermore, this would prevent eliding G1 barriers that are found to be elidable only after the program is simplified by C2's intermediate optimizations, even if `ReduceInitialCardMarks` is enabled (I found a few such cases, e.g. where range check elimination is the enabling simplification). Instead, I have opted to remove the `ReduceInitialCardMarks` condition from `StoreNode::Ideal` and introduce a GC-specific test to determine whether a store can be captured and used for object initialization (commit 6b9954979). For G1, this is true iff the store does not have any barrier or it does have barriers but `ReduceInitialCardMarks` is enabled. For all other GCs the test is always true, which preserves the original mainline behavior. To summarize, this option makes the logic clearer, improves analysis precision, and isolates the changes to G1.
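The GC-specific capturability test described in the message above can be modeled with a small Java sketch. The real code is C++ inside HotSpot and is not reproduced here; the names `StoreCaptureSketch`, `Gc`, and `canInitializeObject` are hypothetical and only restate the rule as worded in the message.

```java
// A sketch only: a hypothetical Java model of the rule, not HotSpot code.
final class StoreCaptureSketch {

    enum Gc { G1, OTHER }

    // Returns whether a store may be captured and used for object initialization.
    // G1: only if the store has no barrier, or ReduceInitialCardMarks is enabled.
    // Any other GC: always true, as stated for the original mainline behavior.
    static boolean canInitializeObject(Gc gc, boolean storeHasBarrier,
                                       boolean reduceInitialCardMarks) {
        if (gc != Gc.G1) {
            return true;
        }
        return !storeHasBarrier || reduceInitialCardMarks;
    }

    public static void main(String[] args) {
        System.out.println(canInitializeObject(Gc.G1, true, false));
        System.out.println(canInitializeObject(Gc.G1, true, true));
        System.out.println(canInitializeObject(Gc.OTHER, true, false));
    }
}
```

Keeping the G1 condition behind a per-GC test is what leaves other collectors, such as ZGC, free to keep relying on store capturing.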
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741554994 From duke at openjdk.org Tue Sep 3 07:40:29 2024 From: duke at openjdk.org (Francesco Nigro) Date: Tue, 3 Sep 2024 07:40:29 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Tue, 27 Aug 2024 17:09:18 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the Java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Working on it @galderz in the benchmark, did you collect the mispredicts/branches? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2325808756 From thartmann at openjdk.org Tue Sep 3 08:06:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 08:06:29 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add parameters and rename generate_klass_flags_guard.
The JIT changes look good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2276697701 From fgao at openjdk.org Tue Sep 3 08:36:23 2024 From: fgao at openjdk.org (Fei Gao) Date: Tue, 3 Sep 2024 08:36:23 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v3] In-Reply-To: <7JRzzIvH26CZPYCX76eWBbQSYUhMDnOqRufDtWaIXq8=.d3270022-4933-4fa7-828a-f57dbc5b8a46@github.com> References: <7JRzzIvH26CZPYCX76eWBbQSYUhMDnOqRufDtWaIXq8=.d3270022-4933-4fa7-828a-f57dbc5b8a46@github.com> Message-ID: On Thu, 15 Aug 2024 15:32:28 GMT, Fei Gao wrote: >> This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. >> >> However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: >> >> >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. 
>> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: >> >> >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S >> >> >> Task-2: add `.note.gnu.property` section for these assembly files >> >> As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. >> >> In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update i... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Fix indentation > - Merge branch 'master' into enable-bti-runtime > - Clean up makefile > - Merge branch 'master' into enable-bti-runtime > - 8337536: AArch64: Enable BTI branch protection for runtime part > > This patch enables BTI branch protection for runtime part on > Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. > User-level packages can gain additional hardening by compiling with the > GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. 
In JDK-8277204 [2], `--enable-branch-protection` was introduced as > one VM configure flag, which would pass `-mbranch-protection=standard` > compilation flags to all c/c++ files. Note that `standard` turns on both > `pac-ret` and `bti` branch protections. For more details about code > reuse attacks and hardware-assisted branch protections on AArch64, see > [3]. > > However, we checked the `.note.gnu.property` section of all the shared > libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so > didn't set these two target feature bits: > > ``` > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > ``` > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, > libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect > `.got.plt` table. It's independent of whether the relocatable objects > use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the > `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set > the feature bit in the output object or image only if all the input > objects have the corresponding feature bit set." Hence we suspect that > the root cause is probably that the PAC/BTI feature bits are not set > only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag > [4] in my local test. This linker flag would warn if any input object > does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following > list: > > ... Can I have a review please? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2325917997 From yzheng at openjdk.org Tue Sep 3 08:47:22 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Sep 2024 08:47:22 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5] In-Reply-To: References: Message-ID: <4a2yapux5-GcnB8oTuHmp2_rsqfLCsexWN9ifNzEjwg=.7594c3dd-6a33-434f-b57a-2105cb9f5c65@github.com> On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add parameters and rename generate_klass_flags_guard. JVMCI changes look good to me! ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2276790447 From yzheng at openjdk.org Tue Sep 3 08:47:23 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 3 Sep 2024 08:47:23 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:23:54 GMT, Coleen Phillimore wrote: >> I don't think the JVMCI knows about the type KlassFlags - I used the same code that I used for InstanceKlass::_misc_flags._flags (see above this). > > I made the change to refactor the getMiscFlags function, but if you want to add knowledge of the KlassFlags class (and InstanceKlassFlags also), you could do that separately from this PR. I think JVMCI already knows these types via the objArrayKlass import, as it knows about KlassFlags.
I will open another PR for refactoring these and other things unrelated to this PR in `HotSpotVMConfig` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741669853 From thartmann at openjdk.org Tue Sep 3 08:55:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 08:55:18 GMT Subject: RFR: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java In-Reply-To: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> References: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> Message-ID: On Tue, 3 Sep 2024 06:50:26 GMT, Amit Kumar wrote: > TestIfMinMax.java is failing on s390x, so for now I want to disable this test on the s390x platform. Once the failure is fixed by [JDK-8339220](https://bugs.openjdk.org/browse/JDK-8339220), the changes made by this PR will be reverted. > > I guess this is a trivial patch and one review will be required. The bug number is missing. ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20827#pullrequestreview-2276816261 From amitkumar at openjdk.org Tue Sep 3 09:17:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Sep 2024 09:17:35 GMT Subject: RFR: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java [v2] In-Reply-To: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> References: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> Message-ID: > TestIfMinMax.java is failing on s390x, so for now I want to disable this test on the s390x platform. Once the failure is fixed by [JDK-8339220](https://bugs.openjdk.org/browse/JDK-8339220), the changes made by this PR will be reverted. > > I guess this is a trivial patch and one review will be required.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: adds bug id ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20827/files - new: https://git.openjdk.org/jdk/pull/20827/files/842ee200..b50da67c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20827&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20827&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20827.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20827/head:pull/20827 PR: https://git.openjdk.org/jdk/pull/20827 From amitkumar at openjdk.org Tue Sep 3 09:17:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Sep 2024 09:17:35 GMT Subject: RFR: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java [v2] In-Reply-To: References: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> Message-ID: On Tue, 3 Sep 2024 08:52:55 GMT, Tobias Hartmann wrote: > The bug number is missing. Oops, Sorry I missed it. I have added it now. Thanks for pointing it out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20827#issuecomment-2326010893 From mbaesken at openjdk.org Tue Sep 3 09:23:18 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 3 Sep 2024 09:23:18 GMT Subject: RFR: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label In-Reply-To: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> References: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> Message-ID: On Mon, 2 Sep 2024 20:48:59 GMT, Martin Doerr wrote: > I had forgotten to copy one line from the cmpxchgd in [JDK-8338814](https://bugs.openjdk.org/browse/JDK-8338814) (https://github.com/openjdk/jdk/commit/2edf574f62837678e621e1dfdd8d8a77dbe17ad6). Marked as reviewed by mbaesken (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20826#pullrequestreview-2276882355 From aph at openjdk.org Tue Sep 3 09:28:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Sep 2024 09:28:22 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v3] In-Reply-To: <7JRzzIvH26CZPYCX76eWBbQSYUhMDnOqRufDtWaIXq8=.d3270022-4933-4fa7-828a-f57dbc5b8a46@github.com> References: <7JRzzIvH26CZPYCX76eWBbQSYUhMDnOqRufDtWaIXq8=.d3270022-4933-4fa7-828a-f57dbc5b8a46@github.com> Message-ID: On Thu, 15 Aug 2024 15:32:28 GMT, Fei Gao wrote: >> This patch enables BTI branch protection for runtime part on Linux/aarch64 platform. >> >> Motivation >> >> 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. User-level packages can gain additional hardening by compiling with the GCC/Clang flag `-mbranch-protection=flag`. See [1]. >> >> 2. In JDK-8277204 [2], `--enable-branch-protection` was introduced as one VM configure flag, which would pass `-mbranch-protection=standard` compilation flags to all c/c++ files. Note that `standard` turns on both `pac-ret` and `bti` branch protections. For more details about code reuse attacks and hardware-assisted branch protections on AArch64, see [3]. >> >> However, we checked the `.note.gnu.property` section of all the shared libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so didn't set these two target feature bits: >> >> >> GNU_PROPERTY_AARCH64_FEATURE_1_BTI >> GNU_PROPERTY_AARCH64_FEATURE_1_PAC >> >> >> Note-1: BTI is an all or nothing property for a link unit [4]. That is, libjvm.so is not BTI-enabled. >> >> Note-2: PAC bit in `.note.gnu.property` section is used to protect `.got.plt` table. It's independent of whether the relocatable objects use PAC or not. >> >> Goal >> >> Hence, this patch aims to set PAC/BTI feature bits of the `.note.gnu.property` section for libjvm.so. 
>> >> Implementation >> >> Task-1: find out the problematic input objects >> >> From [5], "Static linkers processing ELF relocatable objects must set the feature bit in the output object or image only if all the input objects have the corresponding feature bit set." Hence we suspect that the root cause is probably that the PAC/BTI feature bits are not set only for some input objects of libjvm.so. >> >> In order to find out these inputs, we passed `--force-bti` linker flag [4] in my local test. This linker flag would warn if any input object does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following list: >> >> >> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/copy_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/safefetch_linux_aarch64.S >> src/hotspot/os_cpu/linux_aarch64/threadLS_linux_aarch64.S >> >> >> Task-2: add `.note.gnu.property` section for these assembly files >> >> As mentioned in Motivation-2 part, `-mbranch-protection=standard` is passed to compile c/c++ files but these assembly files are missed. >> >> In this patch, we also pass `-mbranch-protection=standard` flag to assembler (See the update i... > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Fix indentation > - Merge branch 'master' into enable-bti-runtime > - Clean up makefile > - Merge branch 'master' into enable-bti-runtime > - 8337536: AArch64: Enable BTI branch protection for runtime part > > This patch enables BTI branch protection for runtime part on > Linux/aarch64 platform. > > Motivation > > 1. Since Fedora 33, glibc+kernel are PAC/BTI enabled by default. > User-level packages can gain additional hardening by compiling with the > GCC/Clang flag `-mbranch-protection=flag`. See [1]. > > 2. 
In JDK-8277204 [2], `--enable-branch-protection` was introduced as > one VM configure flag, which would pass `-mbranch-protection=standard` > compilation flags to all c/c++ files. Note that `standard` turns on both > `pac-ret` and `bti` branch protections. For more details about code > reuse attacks and hardware-assisted branch protections on AArch64, see > [3]. > > However, we checked the `.note.gnu.property` section of all the shared > libraries under `jdk/lib` on Fedora 40, and found that only libjvm.so > didn't set these two target feature bits: > > ``` > GNU_PROPERTY_AARCH64_FEATURE_1_BTI > GNU_PROPERTY_AARCH64_FEATURE_1_PAC > ``` > > Note-1: BTI is an all or nothing property for a link unit [4]. That is, > libjvm.so is not BTI-enabled. > > Note-2: PAC bit in `.note.gnu.property` section is used to protect > `.got.plt` table. It's independent of whether the relocatable objects > use PAC or not. > > Goal > > Hence, this patch aims to set PAC/BTI feature bits of the > `.note.gnu.property` section for libjvm.so. > > Implementation > > Task-1: find out the problematic input objects > > From [5], "Static linkers processing ELF relocatable objects must set > the feature bit in the output object or image only if all the input > objects have the corresponding feature bit set." Hence we suspect that > the root cause is probably that the PAC/BTI feature bits are not set > only for some input objects of libjvm.so. > > In order to find out these inputs, we passed `--force-bti` linker flag > [4] in my local test. This linker flag would warn if any input object > does not have GNU_PROPERTY_AARCH64_FEATURE_1_BTI. We got the following > list: > > ... What is the effect on JNI? Is there full interworking with non-branch-protected libraries? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2326037813 From mdoerr at openjdk.org Tue Sep 3 09:31:26 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 09:31:26 GMT Subject: RFR: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label In-Reply-To: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> References: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> Message-ID: <6qRBMGhs_UaSnfa16yoQ7fFwNkTNPPqVRvIRVdfVl4U=.2225937a-54b1-4759-88d7-f8133e2efbbc@github.com> On Mon, 2 Sep 2024 20:48:59 GMT, Martin Doerr wrote: > I had forgotten to copy one line from the cmpxchgd in [JDK-8338814](https://bugs.openjdk.org/browse/JDK-8338814) (https://github.com/openjdk/jdk/commit/2edf574f62837678e621e1dfdd8d8a77dbe17ad6). Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20826#issuecomment-2326039726 From mdoerr at openjdk.org Tue Sep 3 09:31:26 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 09:31:26 GMT Subject: Integrated: 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label In-Reply-To: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> References: <3iI0-mL3n5TuWz7FHHo57N6RoKGX1ZIX9Jag5P9GPtc=.290f8349-7049-4a56-853e-db3f1bce0b1a@github.com> Message-ID: On Mon, 2 Sep 2024 20:48:59 GMT, Martin Doerr wrote: > I had forgotten to copy one line from the cmpxchgd in [JDK-8338814](https://bugs.openjdk.org/browse/JDK-8338814) (https://github.com/openjdk/jdk/commit/2edf574f62837678e621e1dfdd8d8a77dbe17ad6). This pull request has now been integrated. 
Changeset: 6f3e3fd0 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/6f3e3fd0d4f5e80e3fdbd26be6483c672479802a Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8339411: [PPC64] cmpxchgw/h/b doesn't handle external Label Reviewed-by: lucy, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/20826 From thartmann at openjdk.org Tue Sep 3 10:05:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 3 Sep 2024 10:05:18 GMT Subject: RFR: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java [v2] In-Reply-To: References: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> Message-ID: On Tue, 3 Sep 2024 09:17:35 GMT, Amit Kumar wrote: >> TestIfMinMax.java is failing on s390x, for now I want to disable this test for s390x-platform. In future whenever the failure will be fixed with [JDK-8339220](https://bugs.openjdk.org/browse/JDK-8339220), changes done by this PR will be reverted. >> >> I guess this is trivial patch and one review will be required. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds bug id Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20827#pullrequestreview-2276979419 From adinn at openjdk.org Tue Sep 3 10:16:56 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 3 Sep 2024 10:16:56 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations Message-ID: Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. 
------------- Commit messages: - typo - fix error in x86_32 generator - fix whitespace and error in x86 generator - 8339466: Enumerate shared stubs and define static fields and names via declarations Changes: https://git.openjdk.org/jdk/pull/20832/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339466 Stats: 458 lines in 11 files changed: 319 ins; 34 del; 105 mod Patch: https://git.openjdk.org/jdk/pull/20832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20832/head:pull/20832 PR: https://git.openjdk.org/jdk/pull/20832 From adinn at openjdk.org Tue Sep 3 10:51:53 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 3 Sep 2024 10:51:53 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: References: Message-ID: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> > Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. 
Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision:

  fix errors in ppc generator

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20832/files
  - new: https://git.openjdk.org/jdk/pull/20832/files/84778be3..b5220093

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=00-01

  Stats: 8 lines in 1 file changed: 6 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20832.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20832/head:pull/20832

PR: https://git.openjdk.org/jdk/pull/20832

From mdoerr at openjdk.org Tue Sep 3 12:06:27 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 3 Sep 2024 12:06:27 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Sep 2024 07:26:00 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
>
> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision:
>
>  - Increase test coverage of new-object stores with different type information
>  - Refactor the two post-barrier removal cases into a single expression
>  - Remove unnecessary early null-based post-barrier elision
>  - Make store capturability test G1-specific and more precise

src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646:

> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr)
> 645: %{
> 646:   predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0);

Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157
Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time. Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741937425 From mdoerr at openjdk.org Tue Sep 3 12:15:27 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 3 Sep 2024 12:15:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v2] In-Reply-To: References: <4c-MLXwKcNcSnloSkYkuk3gnv3ux5i5beS51Fd9Z8MQ=.cd0a7eba-ff26-4855-a01c-d1ae5182100b@github.com> <8fuUEkswt05x0IuT4PrNQuYgLd49g4EpZWOPPQog4PQ=.70b5edb6-98d0-4276-8578-f7a496b7f2a7@github.com> <3H3rBSKDnpg5fmYqcZ5hT9yH2EAxCocycRompQJJCOo=.1b30fd89-09e9-4708-bd20-cdea00e809a7@github.com> <7Bjcf6MF4aTBuk4DmTnGzP0WwCWqJx_sv5k2sGMt9No=.ca426f04-4fa5-4ab1-a414-8c5e6a4e0dce@github.com> <6PF-kgezzOb9Ed7j-BbrwaURnLJH5aFgOouFwTYiFrE=.670092b8-6980-43f3-a091-25312cfa0f1b@github.com> <1vQH6zpEgjhIO_mq9DCnpwxgDXmnqdx0owlvjJq4Fcw=.78e60c6e-2b23-4e94-a998-e7ba9eafcb6a@github.com> <6rvTU-KY2DpLx3sK7zmkaZCBaNELr3FLGItGwUJzNUM=.0e75c7a7-4309-49c5-b48b-5aa642bcbe43@github.com> Message-ID: On Thu, 29 Aug 2024 08:37:24 GMT, Albert Mingkun Yang wrote: >> Thanks, I prototyped the refactored version for both x64 and aarch64 [here](https://github.com/robcasloz/jdk/commit/c1ae871eadac0d44981b7892ac8f7b64e8734283). I do not have a strong opinion for or against this refactoring. @albertnetymk @kimbarrett what do you think about it? (asking since you have recently looked at this code). > > I find the use of default-arg-value `bool decode_new_val = false` a bit confusing. (I tend to think default-arg-value makes the code less readable in general.) > > If not using default-arg-value, I suspect the diff will be larger, and I don't see the immediate benefit of this refactoring. Maybe this can be deferred to its own PR if it's really desirable? @albertnetymk: FYI: The basic idea was to make compressed Oops optimizations easier. It allows using shorter decoding sequences and removing redundant null checks in the fast path. 
I've implemented it on PPC64: https://github.com/TheRealMDoerr/jdk/blob/ed9c0232f53a15d768804348e1d8a111fed9a19e/src/hotspot/cpu/ppc/gc/g1/g1BarrierSetAssembler_ppc.cpp#L471 But, I'm ok with postponing it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1741950634 From epeter at openjdk.org Tue Sep 3 12:15:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:15:31 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Adding descriptive comments

Ok, I left a few more comments. Generally, this looks like a nice feature, thanks for implementing it @jatin-bhateja!

A few issues with code style (camelCase vs snake_case).

I'm also wondering about good naming. Why did we/you choose "select" for this? Why not "shuffle"? Does "select" not often get used as a synonym of "blend", which has different semantics?

Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in `RearrangeNode::Ideal`. It looks a little "hacky", especially in conjunction with the `vector_indexes_needs_massaging` method. Can you give a clear definition of the semantics of `RearrangeNode` and `vector_indexes_needs_massaging`, please?

I also added some control questions for testing.
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6446: > 6444: } > 6445: > 6446: void C2_MacroAssembler::select_from_two_vector_evex(BasicType elem_bt, XMMRegister dst, XMMRegister src1, I also wonder if you could use the plural in these cases? You are selecting from two vectors, with the plural "s". Of course it is a bit annoying if you would have to name the IR node `SelectFromTwoVectors`, because we usually name the vector nodes `...Vector`, without the plural "s". src/hotspot/share/opto/library_call.cpp line 749: > 747: return inline_vector_compress_expand(); > 748: case vmIntrinsics::_VectorSelectFromTwoVectorOp: > 749: return inline_vector_select_from_two_vectors(); Interesting, here you use the correct plural "vectors". src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > 542: byte[] vpayload1 = ((ByteVector)v1).vec(); > 543: byte[] vpayload2 = ((ByteVector)v2).vec(); > 544: byte[] vpayload3 = ((ByteVector)v3).vec(); Is there a reason you are not using more descriptive names here instead of `vpayload1`? I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2595: > 2593: @ForceInline > 2594: final ByteVector selectFromTemplate(ByteVector v1, ByteVector v2) { > 2595: int twovectorlen = length() * 2; `twovectorlen` -> `twoVectorLen` I think in Java we are supposed to use camelCase src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > 2768: > 2769: /** > 2770: * Rearranges the lane elements of two vectors, selecting lanes I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? 
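
To make the naming question above concrete, here is a minimal scalar sketch of the two semantics being contrasted. It is based only on the PR description quoted earlier in this thread (index values in [0, 2*VLEN) pick from the table formed by the first and second vector, and out-of-range indices throw). The class and method names are hypothetical, plain `int` arrays stand in for vectors, and this is not the Vector API implementation.

```java
import java.util.Arrays;

// Hypothetical scalar model, for illustration only.
final class SelectVsBlendSketch {

    // Index-driven: lane i of the result takes element idx[i] from the
    // 2*VLEN-entry table made of v1 followed by v2.
    static int[] selectFromTwo(int[] idx, int[] v1, int[] v2) {
        int vlen = idx.length;
        int[] r = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = idx[i];
            if (j < 0 || j >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + j + " outside [0, " + (2 * vlen) + ")");
            }
            r[i] = (j < vlen) ? v1[j] : v2[j - vlen];
        }
        return r;
    }

    // Mask-driven: every element stays in its own lane; the mask only
    // decides which of the two inputs supplies that lane.
    static int[] blend(boolean[] mask, int[] v1, int[] v2) {
        int[] r = new int[v1.length];
        for (int i = 0; i < r.length; i++) {
            r[i] = mask[i] ? v2[i] : v1[i];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        // Lanes can move and cross between inputs: [23, 10, 21, 12]
        System.out.println(Arrays.toString(
                selectFromTwo(new int[] {7, 0, 5, 2}, v1, v2)));
        // Lanes never move: [20, 11, 22, 13]
        System.out.println(Arrays.toString(
                blend(new boolean[] {true, false, true, false}, v1, v2)));
    }
}
```

In this model the index-driven form can move an element to any lane, while the mask-driven form cannot, which is the difference the naming concern is about.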
test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > 322: boolean is_exceptional_idx = (int)order[idx] >= vector_len; > 323: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; > 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx])); I thought general Java style is camelCase? Is that not followed in the VectorAPI code? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). > 1047: toArray(Object[][]::new); > 1048: } Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); > 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); > 5812: idxv.selectFrom(av, bv).intoArray(r, i); Would this test catch a bug where the backend would generate vectors that are too long or too short? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2276944129 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741766060 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741773766 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741914524 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741911809 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741919025 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741920940 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741947885 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741949290 From epeter at openjdk.org Tue Sep 3 12:15:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:15:32 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:02:46 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/cpu/x86/x86.ad line 10490: > >> 10488: >> 10489: >> 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2) > > You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to more quickly know that it is an input-mask and output-dst. 
Also, for consistency, I would write out the name `selectFromTwoVector(s)_evex`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1741772354

From mdoerr at openjdk.org Tue Sep 3 12:20:25 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 3 Sep 2024 12:20:25 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10]
In-Reply-To: <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com>
Message-ID:

On Tue, 3 Sep 2024 07:22:32 GMT, Roberto Castañeda Lozano wrote:

>>> I've only looked at the changes in gc directories (shared and cpu-specific).
>>
>> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question.
>
>> Thanks for your suggestions and comments, Kim! I have addressed them now (modulo a couple of follow-up experiments that I will try out in the next days), please let me know if there is any further question.
>
> @kimbarrett I have addressed all your comments now (including follow-up enhancements), please re-review.

@robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e
Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2326378191 From coleenp at openjdk.org Tue Sep 3 12:33:47 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:33:47 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v6] In-Reply-To: References: Message-ID: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - Remove unused function declaration. - Add parameters and rename generate_klass_flags_guard. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20719/files - new: https://git.openjdk.org/jdk/pull/20719/files/4c3a04dc..79c35f7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20719&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20719.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20719/head:pull/20719 PR: https://git.openjdk.org/jdk/pull/20719 From coleenp at openjdk.org Tue Sep 3 12:33:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:33:48 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: Message-ID: <7zEKCE09_TD7dLWKVs0aB_8i7Dbh7T6Gqhx5TT2z628=.76734232-80a1-4d2b-af5b-6850351c8af3@github.com> On Mon, 2 Sep 2024 10:30:25 GMT, Amit Kumar wrote: >> Thanks Chris and Matias for reviewing parts of this. > > Hi @coleenp, > > I got this error while build on my Mac-M1, did you see something like this ? : > > ERROR: Failed to generate link optimization data. This is likely a problem with the newly built JVM/JDK. 
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (/Users/amitkumar/jdk/src/hotspot/share/oops/klassFlags.hpp:72), pid=57968, tid=10499
> # assert(!is_value_based_class()) failed: set once
> #
> # JRE version: (24.0) (fastdebug build )
> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.amitkumar.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64)
> # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /Users/amitkumar/jdk/make/hs_err_pid57968.log
> #
> #
>
>
> These are the changes with which I am building the JVM. I can reproduce it on my s390x-machine as well.
>
> diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp
> index d442894798b..fbd1a2b3281 100644
> --- a/src/hotspot/share/runtime/globals.hpp
> +++ b/src/hotspot/share/runtime/globals.hpp
> @@ -170,7 +170,7 @@ const int ObjectAlignmentInBytes = 8;
>    product(bool, AlwaysSafeConstructors, false, EXPERIMENTAL, \
>            "Force safe construction, as if all fields are final.") \
>    \
> -  product(bool, UnlockDiagnosticVMOptions, trueInDebug, DIAGNOSTIC, \
> +  product(bool, UnlockDiagnosticVMOptions, true, DIAGNOSTIC, \
>            "Enable normal processing of flags relating to field diagnostics")\
>    \
>    product(bool, UnlockExperimentalVMOptions, false, EXPERIMENTAL, \
> @@ -819,7 +819,7 @@ const int ObjectAlignmentInBytes = 8;
>    product(bool, RestrictContended, true, \
>            "Restrict @Contended to trusted classes") \
>    \
> -  product(int, DiagnoseSyncOnValueBasedClasses, 0, DIAGNOSTIC, \
> +  product(int, DiagnoseSyncOnValueBasedClasses, 1, DIAGNOSTIC, \
>            "Detect and take action upon identifying synchronization on " \
>            "value based classes. Modes: " ...

@offamitkumar Thank you for finding this bug.
These flags have asserts that they're only set once, but CDS restore sets the value for this flag. Since it was set when dumping the archive, it resets it, which is okay in this case. I have a fix for this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2326395437

From coleenp at openjdk.org Tue Sep 3 12:33:48 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 3 Sep 2024 12:33:48 GMT
Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5]
In-Reply-To:
References:
Message-ID: <5-nmLoyCJm-LswMCIwx8ieuePekAgdTJbq1mCuDCpS8=.6a6ace3b-24e3-4dbb-ab76-f4d02f3802bf@github.com>

On Sat, 31 Aug 2024 10:22:29 GMT, ExE Boss wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add parameters and rename generate_klass_flags_guard.
>
> src/hotspot/share/opto/library_call.hpp line 161:
>
>> 159:   Node* generate_mods_flags_guard(Node* kls,
>> 160:                                   int modifier_mask, int modifier_bits,
>> 161:                                   RegionNode* region);
>
> This method was removed.
>
> Suggestion:

Thank you for noticing this.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741975860

From coleenp at openjdk.org Tue Sep 3 12:43:23 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Tue, 3 Sep 2024 12:43:23 GMT
Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 3 Sep 2024 08:42:23 GMT, Yudi Zheng wrote:

>> I made the change to refactor the getMiscFlags function, but if you want to add knowledge of the KlassFlags class (and InstanceKlassFlags also), you could do that separately from this PR.
>
> I think JVMCI already knows these types via the objArrayKlass import, as it knows about KlassFlags. I will open another PR for refactoring these and other things unrelated to this PR in `HotSpotVMConfig`

Ok, yes, thanks for opening a new issue for this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741993231 From coleenp at openjdk.org Tue Sep 3 12:43:24 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:43:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3] In-Reply-To: References: <-dyvOdrDMU8UERNjLmg8NhFNta6ukiqRXrM1oJvyzc4=.f9f80021-8b7a-425f-807a-89b7dab293dc@github.com> Message-ID: On Fri, 30 Aug 2024 22:25:54 GMT, Dean Long wrote: >> Really, this is better? it adds three parameters. I made this change. > > It reduces duplicate code, which is usually good. Yes, I like it better. Ok! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1741992407 From coleenp at openjdk.org Tue Sep 3 12:51:23 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 12:51:23 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 15:38:24 GMT, Thomas Stuefe wrote: >>> I don't think the costs for two address comparisons matter, not with the comparatively few deallocations that happen (few hundreds or few thousand). If deallocate is hot, we are using metaspace wrong. >> >> MethodData does a lot of deallocations from metaspace because it's allocated racily. It might be using Metaspace wrong. > >> > I don't think the costs for two address comparisons matter, not with the comparatively few deallocations that happen (few hundreds or few thousand). If deallocate is hot, we are using metaspace wrong. >> >> MethodData does a lot of deallocations from metaspace because it's allocated racily. It might be using Metaspace wrong. > > I think that should be okay. This should still be an exception. I have never seen that many deallocations happening in customer cases. @tstuefe Do you have more comments on this PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2326439851 From epeter at openjdk.org Tue Sep 3 12:54:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 12:54:24 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. 
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved Ok. This is a huge change. And you do not just introduce changes to the VectorAPI and add Vector instructions. But you also add the scalar instructions. Can you split this into at least 2 PR's that are smaller please? - Scalar saturating instructions: they could even be made available to the user via `Integer.saturatingAdd` etc. Would that not be desired? - Vector saturating instructions I'm afraid that now you are not using the scalar ops individually at all, and they are only used as fallback when the vector-api code is not intrinsified. But how can we test this properly? I'm just not very happy having to review 9K+ PR's ? src/hotspot/cpu/x86/assembler_x86.cpp line 560: > 558: } > 559: > 560: bool Assembler::needs_evex(XMMRegister reg1, XMMRegister reg2, XMMRegister reg3) { This is an ASSERT / DEBUG only method, correct? Do you want to `#ifdef ASSERT` it accordingly? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 914: > 912: case T_SHORT: vpminuw(dst, src1, src2, vlen_enc); break; > 913: case T_INT: vpminud(dst, src1, src2, vlen_enc); break; > 914: case T_LONG: evpminuq(dst, k0, src1, src2, false, vlen_enc); break; Can you explain to me what the `k0` is and where it comes from? 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 929: > 927: } > 928: > 929: void C2_MacroAssembler::vpuminmaxq(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2, XMMRegister xtmp1, XMMRegister xtmp2, int vlen_enc) { Either wrap all inputs or none ;) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4705: > 4703: default: > 4704: fatal("Unsupported operation %s", NodeClassNames[ideal_opc]); > 4705: break; Did you mean to explicitly mention these cases as unsupported? If yes, please add a comment to the code why. src/hotspot/cpu/x86/x86.ad line 6527: > 6525: %} > 6526: ins_pipe( pipe_slow ); > 6527: %} Should change the `uminmax_reg` to indicate that it is a `vector` operation? The `format` already says `vector_uminmax_reg`... Because what if we one day want to use the name `uminmax_reg` for a scalar operation? src/hotspot/share/opto/addnode.hpp line 194: > 192: class SaturatingAddINode : public Node { > 193: public: > 194: SaturatingAddINode(Node* in1, Node* in2) : Node(in1,in2) {} Suggestion: SaturatingAddINode(Node* in1, Node* in2) : Node(in1, in2) {} In other places below as well. src/hotspot/share/opto/addnode.hpp line 198: > 196: virtual const Type* bottom_type() const { return TypeInt::INT; } > 197: virtual uint ideal_reg() const { return Op_RegI; } > 198: }; Are these not supposed to inherit from the `AddNode`, and then override the corresponding methods? Or are you making them separate for a good reason? src/hotspot/share/opto/addnode.hpp line 462: > 460: //------------------------------UMaxINode--------------------------------------- > 461: // Maximum of 2 unsigned integers. > 462: class UMaxLNode : public Node { Here you comment it with `UMaxINode`, but below it is the `UMaxLNode`. The `-------xyz------` comments are really useless. But the semantics description is useful (though you again say integer instead of long here...). 
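For reference, the "maximum of 2 unsigned" semantics that the comment above asks to be described correctly for the long variant can be sketched in scalar Java. This is only a minimal sketch and not the code from the PR: the class `UnsignedMinMaxSketch` and the methods `umax` and `umin` are hypothetical names chosen for illustration; the only JDK method relied on is the existing `Long.compareUnsigned`.

```java
public class UnsignedMinMaxSketch {
    // Hypothetical helper: maximum of two longs interpreted as unsigned 64-bit values.
    static long umax(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Hypothetical helper: minimum of two longs interpreted as unsigned 64-bit values.
    static long umin(long a, long b) {
        return Long.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        // -1L has all bits set, so it is the largest unsigned 64-bit value.
        System.out.println(Long.toUnsignedString(umax(-1L, 1L)));
        System.out.println(Long.toUnsignedString(umin(-1L, 1L)));
        // The signed maximum picks the other operand for the same inputs.
        System.out.println(Math.max(-1L, 1L));
    }
}
```

The point of the sketch is that the signed and unsigned results differ exactly when the operands have different sign bits, which is why the long and int variants each need their own accurate description.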
src/hotspot/share/opto/matcher.hpp line 380: > 378: static BasicType vector_element_basic_type(const MachNode* use, const MachOper* opnd); > 379: static const Type* vector_element_type(const Node* n); > 380: static const Type* vector_element_type(const MachNode* use, const MachOper* opnd); You should probably create your own section for this, since this is not about the **basic** type. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277262281 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741956515 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741964463 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741961089 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741971197 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741976975 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741990855 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741984047 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741987722 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1741997411 From ihse at openjdk.org Tue Sep 3 13:02:55 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 3 Sep 2024 13:02:55 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. 
> > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). @jianglizhou Can you please check if there are any other contributors that should be acknowledged? The `static-jdk` image is enough to run a simple HelloWorld java program; however it cannot yet run JTReg tests. The reason for this is, afaict, that the javac launcher is missing. I am planning to examine if it is possible to add a small shim `javac` launcher that calls the static `java` with the proper main class. (And similarly for the other launchers.) The goal here must, after all, be that we should be able to run the normal jtreg tests on the `static-jdk` image. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2326445877 PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2326452510 From ihse at openjdk.org Tue Sep 3 13:02:55 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 3 Sep 2024 13:02:55 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher Message-ID: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. This patch is the first step towards this goal. 
It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). ------------- Commit messages: - Makefile changes needed for static-launcher and static-jdk-image targets - Incorporate changes from leyden/hermetic-java-runtime that allows running a static launcher Changes: https://git.openjdk.org/jdk/pull/20837/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20837&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339480 Stats: 429 lines in 22 files changed: 351 ins; 5 del; 73 mod Patch: https://git.openjdk.org/jdk/pull/20837.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20837/head:pull/20837 PR: https://git.openjdk.org/jdk/pull/20837 From epeter at openjdk.org Tue Sep 3 13:06:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 13:06:27 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . 
UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved Ok, I left a few more comments. I think this PR could definitely be split. It would make it more reviewable for me.
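The saturating semantics described in the quoted PR text (results capped at the bounds of the result type instead of wrapping around) can be illustrated with a scalar Java sketch. This is an illustration under assumptions, not the API added by the PR: the class `SaturatingSketch` and its method names are hypothetical, and only `int` is covered. The only JDK method relied on is the existing `Integer.compareUnsigned`.

```java
public class SaturatingSketch {
    // Hypothetical helper: signed saturating add.
    // Computes in 64 bits, then clamps to the int range instead of wrapping.
    static int saturatingAdd(int a, int b) {
        long r = (long) a + (long) b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // Hypothetical helper: signed saturating subtract, same clamping approach.
    static int saturatingSub(int a, int b) {
        long r = (long) a - (long) b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // Hypothetical helper: unsigned saturating add.
    // If the wrapped sum is unsigned-less-than an operand, the add overflowed,
    // so the result is capped at 0xFFFFFFFF (which is -1 as a Java int).
    static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Hypothetical helper: unsigned saturating subtract.
    // If a is unsigned-less-than b the true result would be negative, so cap at 0.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        System.out.println(saturatingSub(Integer.MIN_VALUE, 1));
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));
        System.out.println(saturatingUnsignedSub(1, 2));
    }
}
```

A scalar reference of this shape is also the kind of thing a vector test could compare lane results against, which is one way to exercise the fallback path when the vector code is not intrinsified.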
src/hotspot/share/opto/vectornode.hpp line 148: > 146: > 147: //===========================Vector=ALU=Operations============================= > 148: class SaturatingVectorNode : public VectorNode { Semantics description of Saturation would be appreciated :) src/hotspot/share/opto/vectornode.hpp line 634: > 632: virtual int Opcode() const; > 633: }; > 634: This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? src/hotspot/share/prims/vectorSupport.hpp line 129: > 127: VECTOR_OP_SUSUB = 122, > 128: VECTOR_OP_UMIN = 123, > 129: VECTOR_OP_UMAX = 124, Please keep the alignment consistent. src/java.base/share/classes/java/lang/Integer.java line 1994: > 1992: * @return the greater of {@code a} and {@code b} > 1993: * @see java.util.function.BinaryOperator > 1994: * @since 1.8 Is this a copy error or did this already exist since `1.8`? src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 395: > 393: > 394: /* ============================================================================ */ > 395: These comment lines seem redundant... test/jdk/jdk/incubator/vector/gen-template.sh line 317: > 315: function gen_saturating_binary_op { > 316: echo "Generating binary op $1 ($2)..." > 317: # gen_op_tmpl $binary_scalar "$@" Is this commented on purpose? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277361678 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742016482 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742019985 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742021810 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742024534 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742026062 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742028394 From epeter at openjdk.org Tue Sep 3 13:12:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Sep 2024 13:12:22 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. 
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved I really like the additions here. More scalar ops and vector ops are fantastic! But I'd like you to split it into scalar and vector changes. Because on both sides we'll have to do some review work to get it all right. You did in fact add `java/lang` methods. I think you need to add tests for all of those. As well. That's going to be even more code to review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2326480778 PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2326486187 From adinn at openjdk.org Tue Sep 3 13:52:24 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 3 Sep 2024 13:52:24 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 10:51:53 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix errors in ppc generator src/hotspot/share/runtime/stubDeclarations.hpp line 28: > 26: #ifndef SHARE_RUNTIME_SHAREDRUNTIME_ID_HPP > 27: #define SHARE_RUNTIME_SHAREDRUNTIME_ID_HPP > 28: This should be SHARE_RUNTIME_STUBDECLARATIONS_HPP ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1742101540 From coleenp at openjdk.org Tue Sep 3 14:57:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 3 Sep 2024 14:57:19 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 00:40:54 GMT, David Holmes wrote: > In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. 
In some cases the length was explicitly cast to `int` and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value which then became an exceedingly large `size_t` value and we then tried to do utf8 validation on random bytes. > > Testing: > - failing test > - tiers 1-4 > > Thanks src/hotspot/share/classfile/systemDictionary.cpp line 288: > 286: } > 287: // Callers should ensure that the name is never an illegal UTF8 string. > 288: assert(UTF8::is_legal_utf8((const unsigned char*)name, name_len, false), Is this where the > INT_MAX length got in from JNI? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20804#discussion_r1742218782 From amitkumar at openjdk.org Tue Sep 3 15:35:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Sep 2024 15:35:23 GMT Subject: RFR: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java [v2] In-Reply-To: References: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> Message-ID: On Tue, 3 Sep 2024 09:17:35 GMT, Amit Kumar wrote: >> TestIfMinMax.java is failing on s390x, for now I want to disable this test for s390x-platform. In future whenever the failure will be fixed with [JDK-8339220](https://bugs.openjdk.org/browse/JDK-8339220), changes done by this PR will be reverted. >> >> I guess this is trivial patch and one review will be required. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds bug id Thanks for approval, I think it's trivial.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20827#issuecomment-2326826424 From amitkumar at openjdk.org Tue Sep 3 15:35:23 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Sep 2024 15:35:23 GMT Subject: Integrated: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java In-Reply-To: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> References: <5AmsYDvJtlYrlmC-DeT3Qps3jC_iTU7eDvayX5VJbHA=.393e7a5e-3bc8-44f6-9ae6-0ffc8d6c06b9@github.com> Message-ID: On Tue, 3 Sep 2024 06:50:26 GMT, Amit Kumar wrote: > TestIfMinMax.java is failing on s390x, for now I want to disable this test for s390x-platform. In future whenever the failure will be fixed with [JDK-8339220](https://bugs.openjdk.org/browse/JDK-8339220), changes done by this PR will be reverted. > > I guess this is trivial patch and one review will be required. This pull request has now been integrated. Changeset: 0d593cd1 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/0d593cd1945e93a7d3c33ad270a81403b6fbeb3f Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/20827 From lucy at openjdk.org Tue Sep 3 15:46:20 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 3 Sep 2024 15:46:20 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 04:26:23 GMT, Amit Kumar wrote: > s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; > > Testing: > - tier1-test (fastdebug) > - tier1-test with UseObjectMonitorTable (fastdebug) > - tier1-test with UseObjectMonitorTable (release) I made a few comments you may want to consider. 
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6026: > 6024: z_cghi(top, 0); > 6025: z_stg(top, Address(basic_lock, BasicObjectLock::lock_offset() + in_ByteSize((BasicLock::object_monitor_cache_offset_in_bytes())))); > 6026: } What is this block good for? Looking at x86, it should store '0' into cache. Instead, it stores whatever top contains. The cghi has no effect. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6165: > 6163: z_stg(tmp1, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); > 6164: } > 6165: Why not use mvghi here to directly write zero to memory? Prerequisites: displacement must be uimm12. src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6255: > 6253: // check for match. > 6254: z_cg(obj, Address(tmp1)); > 6255: z_bre(monitor_found); Are we sure there are at least three (this one and two unrolled) non-null cache entries? src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6261: > 6259: z_lg(tmp2, Address(tmp1)); // TODO: top is killed here!!!! > 6260: z_ltgr(tmp2, tmp2); > 6261: z_brne(loop); // if not EQ to 0, go for another loop Why not use cghsi (does not kill a register) or ltg at least to check for null? ------------- Changes requested by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20740#pullrequestreview-2276038484 PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1741163393 PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1741171834 PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1741185191 PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1741183341 From stuefe at openjdk.org Tue Sep 3 15:53:27 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 3 Sep 2024 15:53:27 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Tue, 27 Aug 2024 15:38:24 GMT, Thomas Stuefe wrote: >>> I don't think the costs for two address comparisons matter, not with the comparatively few deallocations that happen (few hundreds or few thousand). If deallocate is hot, we are using metaspace wrong. >> >> MethodData does a lot of deallocations from metaspace because it's allocated racily. It might be using Metaspace wrong. > >> > I don't think the costs for two address comparisons matter, not with the comparatively few deallocations that happen (few hundreds or few thousand). If deallocate is hot, we are using metaspace wrong. >> >> MethodData does a lot of deallocations from metaspace because it's allocated racily. It might be using Metaspace wrong. > > I think that should be okay. This should still be an exception. I have never seen that many deallocations happening in customer cases. > @tstuefe Do you have more comments on this PR? Sorry, I was swamped the past days. I'll take a look tomorrow. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2326867932 From sgehwolf at openjdk.org Tue Sep 3 16:09:01 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 3 Sep 2024 16:09:01 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v8] In-Reply-To: References: Message-ID: > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 15 additional commits since the last revision: - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comment of WB::host_cpus() - Handle non-root + CGv2 - Add nested hierarchy to test framework - Revert "Add root check for SystemdMemoryAwarenessTest.java" This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. - Add root check for SystemdMemoryAwarenessTest.java - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Add Whitebox check for host cpu - ... 
and 5 more: https://git.openjdk.org/jdk/compare/fc4604c0...cf49a96e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/a98fd7d6..cf49a96e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=06-07 Stats: 10055 lines in 354 files changed: 6004 ins; 1681 del; 2370 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From gziemski at openjdk.org Tue Sep 3 17:05:19 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 3 Sep 2024 17:05:19 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <5TUXq3-MyXcwxS30Wf2DQKXzADQY-W5sSipZTW4P7zg=.3824be94-4d6b-4e2a-a5ea-a5f25d71afb7@github.com> On Mon, 2 Sep 2024 04:03:28 GMT, David Holmes wrote: > FWIW as I recall the suggestion to include NMT in the name in some form was to make it clear that these kinds of parameter, which appear all over the place, are needed because of NMT and are not inherently part of whatever API they appear in. Whether that happens via a namespace, a nested enum, or a simple prefix, I don't really care except to say that anything that can then result in dropping the NMT in the source code (e.g. via a using directive) completely defeats the purpose of having it in the first place. So if there is no good answer here than I guess we just drop NMT from the name. Kim and Stefan said that any consideration of using namespace would require additional discussion. And almost everyone dislikes adding the `NMT_` prefix. I feel like we achieved (imperfect) agreement that allows us to proceed with a cleanup using `MemTag` as the new type name to replace `MEMFLAGS`, which I hope everyone agrees is an improvement. 
I am almost done with that name change and I see it as a significant improvement worthy of this effort; anything else can be handled in follow-ups. However, if you feel strongly that we should discuss the full topic right now, before proceeding, please let it be known here. Personally, I just wanted to clean up `MEMFLAGS` and the related `flag(s)` names, which we used in a very inconsistent manner. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2327019788 From erikj at openjdk.org Tue Sep 3 18:12:22 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 3 Sep 2024 18:12:22 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). I tried to take this for a spin on my m1 mac laptop. I ran configure and then `make static-jdk-image`.
The build failed with the following: chmod: /Users/erik/dev/jdk/build/macosx-aarch64/jdk/lib/server/libjsig.dylib: No such file or directory ModuleWrapper.gmk:81: recipe for target '/Users/erik/dev/jdk/build/macosx-aarch64/jdk/lib/server/libjsig.dylib' failed make[3]: *** [/Users/erik/dev/jdk/build/macosx-aarch64/jdk/lib/server/libjsig.dylib] Error 1 make[3]: *** Waiting for unfinished jobs.... make/Main.gmk:191: recipe for target 'java.base-libs' failed make[2]: *** [java.base-libs] Error 2 I'm guessing this would work if I built the regular image first, or at least at the same time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2327132241 From kvn at openjdk.org Tue Sep 3 18:33:26 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Sep 2024 18:33:26 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 10:51:53 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix errors in ppc generator src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2571: > 2569: SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address call_ptr) { > 2570: assert((id >= sharedStubId::polling_page_vectors_safepoint_handler_id || > 2571: id <= sharedStubId::polling_page_return_handler_id), This and all other similar assert checks depends on the order of stubs ID. 
I think such checks should be in `sharedStubId` where the order is defined: `sharedStubId::is_polling_page_id(id)` src/hotspot/share/runtime/stubDeclarations.hpp line 2: > 1: /* > 2: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. You need only one 2024 year ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1742492107 PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1742486529 From iklam at openjdk.org Tue Sep 3 19:11:00 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 3 Sep 2024 19:11:00 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver [v2] In-Reply-To: References: Message-ID: > This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). > > ----- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - 8338018: Rename ClassPrelinker to AOTConstantPoolResolver ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20517/files - new: https://git.openjdk.org/jdk/pull/20517/files/eed58795..fed4dfed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=00-01 Stats: 22614 lines in 714 files changed: 15522 ins; 4063 del; 3029 mod Patch: https://git.openjdk.org/jdk/pull/20517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20517/head:pull/20517 PR: https://git.openjdk.org/jdk/pull/20517 From cjplummer at openjdk.org Tue Sep 3 19:50:26 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 3 Sep 2024 19:50:26 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 10:29:40 GMT, Yasumasa Suenaga wrote: >> I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. 
>> >> >> Error occurred during stack walking: >> java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) >> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 
(nearest symbol is _ZTV10Upcall... > > Yasumasa Suenaga has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov test/hotspot/jtreg/serviceability/sa/TestJhsdbJstackUpcall.java line 49: > 47: * Test should focus JNI call (caller of upcall) because the frame > 48: * prior to the upcall cannot be obtained if some exception happens > 49: * in during to process upcall. Two questions here. First, shouldn't we also check for the Java frame that the upcall was made to? Second, why are we worried about an exception here? test/hotspot/jtreg/serviceability/sa/TestJhsdbJstackUpcall.java line 51: > 49: * in during to process upcall. > 50: */ > 51: private static boolean isJNIFrame(List lines) { I think this method could use a better name like hasFFMUpcall(). test/hotspot/jtreg/serviceability/sa/TestJhsdbJstackUpcall.java line 57: > 55: > 56: private static void runJstackInLoop(LingeredApp app) throws Exception { > 57: for (int i = 0; i < MAX_ITERATIONS; i++) { What is the reason for doing 20 iterations? Is it because you are waiting for THREAD_NAME to enter the sleep() call? If so, we've addressed this in the past for the general case of wanting to do a "stable" stack trace by using the LingeredApp's SteadyStateThread. LingeredApp.startApp() will not return until this thread has become stable (blocked). Maybe you can do something similar with THREAD_NAME.
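To illustrate the "wait until the thread is stable" idea from the comment above without relying on LingeredApp internals (which are not shown in this thread), here is a minimal plain-Java sketch. It assumes the target thread parks itself in `Thread.sleep()`, and the class and method names (`SteadyThreadSketch`, `awaitTimedWaiting`) are hypothetical, not part of the test library.

```java
import java.util.concurrent.TimeUnit;

public class SteadyThreadSketch {
    // Hypothetical helper: poll until `t` reports TIMED_WAITING (for example,
    // while it is inside Thread.sleep) or until the timeout elapses.
    // Returns true if the steady state was observed.
    static boolean awaitTimedWaiting(Thread t, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (t.getState() == Thread.State.TIMED_WAITING) {
                return true;
            }
            try {
                Thread.sleep(10); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and give up
                return false;
            }
        }
        return t.getState() == Thread.State.TIMED_WAITING;
    }

    public static void main(String[] args) {
        // Stand-in for the thread of interest; it simply sleeps.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // exit quietly when interrupted
            }
        }, "sleeping-worker");
        worker.setDaemon(true);
        worker.start();

        boolean steady = awaitTimedWaiting(worker, 5_000);
        System.out.println("steady=" + steady);
        worker.interrupt();
    }
}
```

With a check like this, a single stack dump taken after the helper returns true could replace a fixed number of retries, assuming the thread stays blocked while the dump is taken.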
test/hotspot/jtreg/serviceability/sa/TestJhsdbJstackUpcall.java line 76: > 74: out.shouldContain(LingeredAppWithFFMUpcall.THREAD_NAME); > 75: if (isJNIFrame(out.asLines())) { > 76: System.out.println("DEBUG: Test triggered interesting condition."); I'm not sure what is meant by "interesting condition". Perhaps you mean "detected FFM upcall" or something like that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1742592058 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1742590621 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1742584621 PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1742588694 From ihse at openjdk.org Tue Sep 3 19:53:18 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 3 Sep 2024 19:53:18 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: <1nXZ6wL05zwQD3alCwKAo7P0XbkXkJbX_p5TFTXobvI=.d119ebf5-f5b0-495b-a16b-259110b13357@github.com> On Tue, 3 Sep 2024 18:10:06 GMT, Erik Joelsson wrote: > I'm guessing this would work if I built the regular image first, or at least at the same time. No, I don't think that should matter. `static-jdk-image` depends on `exploded-image`, and the files in your error message reside in `jdk`, not `images/jdk`. I have never seen this problem before. I'm not even sure why or when we try to chmod libjsig?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2327308045 From ihse at openjdk.org Tue Sep 3 19:53:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 3 Sep 2024 19:53:19 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). I wonder if it is related to JDK-8336498? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2327311982 From erikj at openjdk.org Tue Sep 3 20:01:21 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Tue, 3 Sep 2024 20:01:21 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). make/ModuleWrapper.gmk line 59: > 57: endif > 58: endif > 59: This part looks a bit convoluted. It would be nice with a comment explaining what it does, where `$($(MODULE)_JDK_LIBS)` is coming from and what consumes the module-libs.txt. make/StaticLibs.gmk line 171: > 169: > 170: $(eval $(call SetupCopyFiles, copy-static-launcher, \ > 171: SRC := $(dir $(JAVA_LAUNCHER)), \ If only copying a single file, this becomes the default value for SRC, so no need to specify it. 
make/common/JdkNativeCompilation.gmk line 310: > 308: $$(MODULE)_JDK_LIBS += $$($1_NAME) > 309: endif > 310: endif Same, here as in ModuleWrapper.gmk, I think we need a comment explaining how this is consumed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1742601500 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1742500522 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1742605313 From matsaave at openjdk.org Tue Sep 3 20:12:31 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Tue, 3 Sep 2024 20:12:31 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL Message-ID: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a warning where a dynamically generated class, java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived. 
Verified with tier x-y tests ------------- Commit messages: - Removed assert change - Cleanup - Reverted test - Removed leftover prototyping code - Silently exclude class - Merge branch 'master' into cds_warning_8338530 - Merge branch 'master' into cds_warning_8338530 - Only checks for BoundMethodHandle - 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle Changes: https://git.openjdk.org/jdk/pull/20799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20799&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338530 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20799/head:pull/20799 PR: https://git.openjdk.org/jdk/pull/20799 From liach at openjdk.org Tue Sep 3 20:12:33 2024 From: liach at openjdk.org (Chen Liang) Date: Tue, 3 Sep 2024 20:12:33 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL In-Reply-To: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Message-ID: On Fri, 30 Aug 2024 18:05:24 GMT, Matias Saavedra Silva wrote: > Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a warning where a dynamically generated class, java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived. Verified with tier x-y tests Is it possible for us at java.lang.invoke to enhance the `GenerateJliClassesHelper` to generate this class? I can look into it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2322130581 From duke at openjdk.org Tue Sep 3 20:12:35 2024 From: duke at openjdk.org (ExE Boss) Date: Tue, 3 Sep 2024 20:12:35 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL In-Reply-To: References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Message-ID: On Fri, 30 Aug 2024 18:39:11 GMT, Chen Liang wrote: > Is it possible for us at java.lang.invoke to enhance the `GenerateJliClassesHelper` to generate this class? I can look into it. `GenerateJLIClassesPlugin`/`Helper` is fine, all that needs to be updated is the [`HelloClasslist`] script to hit the code path which generates `BoundMethodHandle$Species_LLLL`. [`HelloClasslist`]: https://github.com/openjdk/jdk/blob/master/make/jdk/src/classes/build/tools/classlist/HelloClasslist.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2322857366 From cjplummer at openjdk.org Tue Sep 3 20:52:21 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 3 Sep 2024 20:52:21 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v3] In-Reply-To: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> References: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> Message-ID: On Tue, 3 Sep 2024 03:00:56 GMT, David Holmes wrote: >> This is the implementation of a new method added to the JNI specification. >> >> From the CSR request: >> >> The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`.
But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. >> >> **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. >> >> Solution >> >> Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. >> >> --- >> >> We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni >> >> There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. >> >> Testing: >> - new test added >> - tiers 1-3 sanity >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > The JNI version update was incompete Marked as reviewed by cjplummer (Reviewer). 
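The length arithmetic in the quoted CSR text can be sketched in plain Java. This is only an illustration of the modified UTF-8 sizing rules (1, 2 or 3 bytes per UTF-16 code unit, with U+0000 taking 2 bytes), not the VM's implementation; the class and method names are hypothetical.

```java
public class ModifiedUtf8LengthSketch {
    // Hypothetical helper: number of bytes needed to encode `s` in modified UTF-8,
    // accumulated in a long so that the total cannot overflow an int.
    static long modifiedUtf8Length(String s) {
        long len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                len += 1;           // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                len += 2;           // U+0000 and U+0080..U+07FF
            } else {
                len += 3;           // everything else, including each surrogate half
            }
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(modifiedUtf8Length("abc"));

        // Worst case from the description: 3 bytes for each of Integer.MAX_VALUE chars.
        long worstCase = 3L * Integer.MAX_VALUE;
        System.out.println("fits in int: " + (worstCase <= Integer.MAX_VALUE));
    }
}
```

The worst-case value only fits in a 64-bit result, which is the motivation given above for returning a `jlong` from the new function.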
------------- PR Review: https://git.openjdk.org/jdk/pull/20784#pullrequestreview-2278475330 From dholmes at openjdk.org Tue Sep 3 21:47:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Sep 2024 21:47:19 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 In-Reply-To: References: Message-ID: <1bdJKzE4Lpy7HTyn07lhsFUHNI73hPgd96LLDuo-I5s=.5f459765-bb6e-40d6-a158-dd964ad7e5c1@github.com> On Tue, 3 Sep 2024 14:54:51 GMT, Coleen Phillimore wrote: >> In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. In some cases the length was explicitly cast to `int` and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value which then became an exceedingly large `size_t` value and we then tried to do utf8 validation on random bytes. >> >> Testing: >> - failing test >> - tiers 1-4 >> >> Thanks > > src/hotspot/share/classfile/systemDictionary.cpp line 288: > >> 286: } >> 287: // Callers should ensure that the name is never an illegal UTF8 string. >> 288: assert(UTF8::is_legal_utf8((const unsigned char*)name, name_len, false), > > Is there where the > INT_MAX length got in from JNI? No, the failing path was in jniCheck.cpp. In this code we have already rejected strings > `Symbol::max_length`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20804#discussion_r1742730181 From dholmes at openjdk.org Tue Sep 3 21:57:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 3 Sep 2024 21:57:19 GMT Subject: RFR: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths [v3] In-Reply-To: References: <0gGFOgqa6b1PoY1u-9WKnJ3UFeyz-w2qsipYoeqsoIA=.9beb9dd6-a875-4c83-85e0-d066077c5b96@github.com> Message-ID: On Tue, 3 Sep 2024 20:50:09 GMT, Chris Plummer wrote: >> David Holmes has updated the pull request incrementally with one additional commit since the last revision: >> >> The JNI version update was incompete > > Marked as reviewed by cjplummer (Reviewer). Thanks @plummercj ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20784#issuecomment-2327507194 From sviswanathan at openjdk.org Tue Sep 3 22:17:34 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:17:34 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved Thanks for considering the review comments. I have some minor follow ups. src/hotspot/cpu/x86/assembler_x86.cpp line 8470: > 8468: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8469: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 8470: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); avx512bw check here seems wrong. 
src/hotspot/cpu/x86/assembler_x86.cpp line 8479: > 8477: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { > 8478: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : > 8479: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); avx512bw check here seems wrong. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 945: > 943: } else { > 944: vpblendvb(dst, src2, src1, xtmp1, vlen_enc); > 945: } The comment needs to move inside if and else block as the code in these blocks is reverse of each other. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 6691: > 6689: // Res = INP1 - INP2 (non-commutative and non-associative) > 6690: // Res = Mask ? Zero : Res > 6691: evmasked_op(etype == T_INT ? Op_SubVI : Op_SubVL, etype, ktmp, dst, src1, src2, false, vlen_enc, false); Do the comments need update here? e.g. 6688 is setting mask bits to true for src2 6713: int vlen_enc) { > 6714: // Unsigned values ranges comprise of only +ve numbers, thus there exist only an upper bound saturation. 
> 6715: // overflow_mask = (SRC1 + SRC2) 1985: public static long addSaturating(long a, long b) { > 1986: long res = a + b; > 1987: // HD 2-12 Overflow iff both arguments have the opposite sign of the result HD -> Hacker's Delight src/java.base/share/classes/java/lang/Long.java line 2008: > 2006: public static long subSaturating(long a, long b) { > 2007: long res = a - b; > 2008: // HD 2-12 Overflow iff the arguments have different signs and HD -> Hacker's Delight ------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2277917757 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742347879 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742348218 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742725069 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742733746 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742751114 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742452009 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742452802 From sviswanathan at openjdk.org Tue Sep 3 22:23:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:23:21 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 12:17:08 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 10656: >> >>> 10654: match(Set dst (SaturatingSubVI src1 src2)); >>> 10655: match(Set dst (SaturatingSubVL src1 src2)); >>> 10656: effect(TEMP ktmp); >> >> This needs TEMP dst as well. > > There is no use of either of the source operands after assignment to dst in the macro assembly routine. Sorry, I meant this for another instruct. 
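For readers following the saturating-operator discussion, the scalar semantics can be sketched in a few lines of Java. This is an illustrative sketch only, not the code from the PR: the method names are hypothetical, the signed overflow checks follow the sign-comparison idea attributed to Hacker's Delight in the quoted comments, and the unsigned variants treat a `long` as an unsigned 64-bit value.

```java
public class SaturatingSketch {
    // Signed saturating add: overflow iff both operands have the sign opposite to the result.
    static long saturatingAdd(long a, long b) {
        long r = a + b;
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    // Signed saturating subtract: overflow iff the operands have different signs
    // and the result's sign differs from the sign of the first operand.
    static long saturatingSub(long a, long b) {
        long r = a - b;
        if (((a ^ b) & (a ^ r)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    // Unsigned saturating add: only an upper bound exists; a wrapped sum is
    // smaller (unsigned) than either operand, so clamp to all ones.
    static long saturatingUnsignedAdd(long a, long b) {
        long r = a + b;
        return Long.compareUnsigned(r, a) < 0 ? -1L : r;
    }

    // Unsigned saturating subtract: clamp to zero when b > a (unsigned).
    static long saturatingUnsignedSub(long a, long b) {
        return Long.compareUnsigned(a, b) < 0 ? 0L : a - b;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(Long.MAX_VALUE, 1));
        System.out.println(saturatingSub(Long.MIN_VALUE, 1));
        System.out.println(Long.toUnsignedString(saturatingUnsignedAdd(-1L, 1L)));
        System.out.println(saturatingUnsignedSub(1L, 2L));
    }
}
```

The vector operators named in the PR description (SADD, SSUB, SUADD, SUSUB) would apply the same per-lane clamping; how the x86 backend achieves that is not reproduced here.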
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742767372 From sviswanathan at openjdk.org Tue Sep 3 22:23:22 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:23:22 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Mon, 2 Sep 2024 12:20:59 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. 
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolved src/hotspot/cpu/x86/x86.ad line 10684: > 10682: match(Set dst (SaturatingSubVI src1 src2)); > 10683: match(Set dst (SaturatingSubVL src1 src2)); > 10684: effect(TEMP xtmp1, TEMP xtmp2); Here we need TEMP dst in effect, the saturating_unsigned_sub_dq_avx defines and uses dst across xtmp1. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742768246 From sviswanathan at openjdk.org Tue Sep 3 22:28:20 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 3 Sep 2024 22:28:20 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v2] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 12:15:10 GMT, Jatin Bhateja wrote: >> If the aim is to reduce the number of nodes, we could merge the Op_SaturatingAddVB, Op_SaturatingAddVS, Op_SaturatingAddVI, and Op_SaturatingAddVL into one Op_SaturatingAddV. Likewise for unsigned saturating add into Op_SaturatingUnsignedAddV. > > Hey @sviswa7, our concern was around value ranges of new unsigned scalar type, which as mentioned will be addressed when I support intrinsification of new core lib APIs and associated range constraining / folding optimization in a follow up patch. Reiterating, we are not adding unsigned scalar types with this patch, we are only supporting unsigned (saturating) operations on existing signed integral types. 
So in my view, we could avoid this change as I mentioned above, but I will leave this one to other reviewers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1742772702 From darcy at openjdk.org Tue Sep 3 22:58:22 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 3 Sep 2024 22:58:22 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:26:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Add stub initialization and extra tanh tests test/jdk/java/lang/Math/HyperbolicTests.java line 984: > 982: double b1 = 0.02; > 983: double b2 = 5.1; > 984: double b3 = 55 * Math.log(2)/2; // ~19.062 Probably better to use StrictMath.log here or, better yet, precompute the value as a constant and document its conceptual origin.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1742790463 From darcy at openjdk.org Wed Sep 4 00:03:20 2024 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 4 Sep 2024 00:03:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 20:26:05 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Add stub initialization and extra tanh tests test/jdk/java/lang/Math/HyperbolicTests.java line 1009: > 1007: for(int i = 0; i < testCases.length; i++) { > 1008: double testCase = testCases[i]; > 1009: failures += testTanhWithReferenceUlpDiff(testCase, StrictMath.tanh(testCase), 2.5); The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errs in one direction and the Math method errs in the other direction). If the test is going to use randomness, then its jtreg tags should include `@key randomness` and it is preferable to use jdk.test.lib.RandomFactory to get a Random object, since that handles printing out a key so the random sequence can be replicated if the test fails.
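The tolerance argument above can be made concrete with a small standalone sketch. It assumes a simple definition of "ulp difference" (absolute difference divided by the ulp of the reference value) and uses a fixed-seed `java.util.Random` only to stay self-contained; a real jtreg test would follow the `RandomFactory` advice in the comment. The names `UlpDiffSketch` and `ulpDiff` are hypothetical and do not correspond to helpers in HyperbolicTests.java.

```java
import java.util.Random;

public class UlpDiffSketch {
    // Hypothetical helper: distance between `actual` and `reference`,
    // measured in units of the ulp of the reference value.
    static double ulpDiff(double actual, double reference) {
        if (Double.compare(actual, reference) == 0) {
            return 0.0;
        }
        return Math.abs(actual - reference) / Math.ulp(reference);
    }

    public static void main(String[] args) {
        // If each implementation may be off by 2.5 ulp from the exact result,
        // and they may err in opposite directions, allow about twice that
        // when comparing them against each other.
        double allowed = 2.0 * 2.5;

        Random rnd = new Random(42L); // fixed seed so a failure can be replayed
        int failures = 0;
        for (int i = 0; i < 1_000; i++) {
            double x = (rnd.nextDouble() - 0.5) * 40.0; // roughly [-20, 20)
            double diff = ulpDiff(Math.tanh(x), StrictMath.tanh(x));
            if (diff > allowed) {
                failures++;
                System.out.println("x=" + x + " differs by " + diff + " ulp");
            }
        }
        System.out.println("failures=" + failures);
    }
}
```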
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1742826418 From ysuenaga at openjdk.org Wed Sep 4 00:35:21 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Wed, 4 Sep 2024 00:35:21 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 19:39:59 GMT, Chris Plummer wrote: >> Yasumasa Suenaga has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java >> >> Co-authored-by: Andrey Turbanov >> - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java >> >> Co-authored-by: Andrey Turbanov >> - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java >> >> Co-authored-by: Andrey Turbanov > > test/hotspot/jtreg/serviceability/sa/TestJhsdbJstackUpcall.java line 57: > >> 55: >> 56: private static void runJstackInLoop(LingeredApp app) throws Exception { >> 57: for (int i = 0; i < MAX_ITERATIONS; i++) { > > What is the reason for doing 20 iterations. Is it because you are waiting for THREAD_NAME to enter the sleep() call? If so, we've addressed this in the past for the general case of wanting to do a "stable"stack trace by using the LingeredApp's SteadyStateThread. LingeredApp.startApp() will not return until this thread has become stable (blocked). Maybe you can do something similar with THREAD_NAME. TBH this testcase is based on TestJhsdbJstackMixed.java , so I'm not stick this code. I will fix to use SteadyStateThread. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1742842412 From ysuenaga at openjdk.org Wed Sep 4 00:42:25 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Wed, 4 Sep 2024 00:42:25 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 15:13:44 GMT, Jorn Vernee wrote: >>> I understand that adding the UpcallStub type to the SA agent code makes the WrongTypeException go away, and then we run into an assertion failure because the frame size is zero? >> >> Yes. >> >>> Note how there is also special handling for (JNI) entry frames in the SA. >> >> Do you mean `JavaCallWrapper` (`X86Frame::senderForEntryFrame` in SA) ? >> >>> I'm guessing because we end up walking the native frames until we get back to Java, and the native frames are simply ignored. I'm not sure if that will always work for arbitrary native code though. >>> >>> I think the right fix here is to implement handling for upcall stub frames in the SA agent, since that's also how entry frames are handled. I don't think setting the frame size in hotspot is actually needed if we do that. >> >> If we add some frame info (return address and FP) like `JavaCallWrapper` to `UpcallStub` and process it in SA, we do not need frame size of `UpcallStub` as you said. But I think it should be fixed in all of upcall implementation. >> `UpcallStub` is "Stub", so it compliant native calling convention. Thus I believe native frame unwinder like `X86Frame` should always work if frame size is set in `UpcallStub`. >> >> We need to fix all of upcall implementation in both case, and zero frame size is not nature. In addition adding frame size is simpler than add special handling for `UpcallStub` and SA. Thus I give +1 to add frame size to `UpcallStub`. > >> > Note how there is also special handling for (JNI) entry frames in the SA. >> >> Do you mean `JavaCallWrapper` (`X86Frame::senderForEntryFrame` in SA) ? 
> > Yes. Internally it loads the fields of `JavaFrameAnchor`, which points at the previous Java frame. `UpcallStub` frames also have a `JavaFrameAnchor`. The value can be retrieved through `upcall_stub` -> `frame_data` -> `jfa`. The byte offset of the frame data is stored in the `UpcallStub::_frame_data_offset` field. It can be added to the unextended SP. > >> > I'm guessing because we end up walking the native frames until we get back to Java, and the native frames are simply ignored. I'm not sure if that will always work for arbitrary native code though. >> > I think the right fix here is to implement handling for upcall stub frames in the SA agent, since that's also how entry frames are handled. I don't think setting the frame size in hotspot is actually needed if we do that. >> >> If we add some frame info (return address and FP) like `JavaCallWrapper` to `UpcallStub` and process it in SA, we do not need frame size of `UpcallStub` as you said. But I think it should be fixed in all of upcall implementation. `UpcallStub` is "Stub", so it compliant native calling convention. Thus I believe native frame unwinder like `X86Frame` should always work if frame size is set in `UpcallStub`. > > The problem is not the upcall stub frame itself. We know which ABI that is using. The problems is any intermediate frames between the upcall stub frame and last Java frame before that. These intermediate native frames can have any ABI. There is no single 'native calling convention'. We know that we are interfacing with something that follows the C ABI, but that code may switch to a different ABI (e.g. Rust, C++, or some other language) which may have a different stack frame layout. Stack walking those frames might break. The frame anchor used by entry/upcall frames helps to avoid this by letting the stack walker jump over all the native frames, and continue walking at the last java frame before the upcall stub instead. 
That means it doesn't have to deal with the foreign stack layout of frames in between. > >> We need to fix all of upcall implementation in both case, and zero frame size is not nature. In addition adding frame size is simpler than add special handling for `UpcallStub` and SA. Thus I give +1 to add frame size to `UpcallStub`. > > I'm not necessarily opposed to adding a frame size to upcall stubs, but as a fix for SA stack... @JornVernee @plummercj Thanks for your comment! I will try to fix SA to refer `JavaFrameAnchor`, and also to fix test case in weekend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2327691917 From jbhateja at openjdk.org Wed Sep 4 02:00:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Sep 2024 02:00:25 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> Message-ID: <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> On Tue, 27 Aug 2024 22:23:44 GMT, Srinivas Vamsi Parasa wrote: >> I agree, this is all rather obscure. Ideally the same names that are used in wherever this comes from. >> >> Where does the algorithm come from? What are its accuracy guarantees? >> >> In addition, given the rarity of hyperbolic tangents in Java applications, do we need this? > > @theRealAph, this implementation is based on Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments. @vamsi-parasa don't hesitate in adding as much and explicit information about the original source from where the algorithm has been picked up, even though the PR explicitly mentions libm. Adding the link to source references is a good practice. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1742887385 From fyang at openjdk.org Wed Sep 4 02:47:24 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 4 Sep 2024 02:47:24 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 10:51:53 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix errors in ppc generator Hi Andrew: I just checked riscv-specific changes and seems that we are lacking following small change. Could you please add it? I did a release build and ran tier1 test on linux-riscv64 platform. diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp index 2e8362814de..0481bc3483b 100644 --- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp +++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp @@ -2447,7 +2447,8 @@ SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address cal OopMap* map = nullptr; // Allocate space for the code. Setup code generation tools. 
- CodeBuffer buffer("handler_blob", 2048, 1024); + const char *name = SharedRuntime::stub_name(id); + CodeBuffer buffer(name, 2048, 1024); MacroAssembler* masm = new MacroAssembler(&buffer); assert_cond(masm != nullptr); @@ -2455,7 +2456,7 @@ SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address cal address call_pc = nullptr; int frame_size_in_words = -1; bool cause_return = (id == sharedStubId::polling_page_return_handler_id); - RegisterSaver reg_save(id == sharedStubId::polling_page_vectors_safepoint_handler_id /* save_vectors */); + RegisterSaver reg_saver(id == sharedStubId::polling_page_vectors_safepoint_handler_id /* save_vectors */); // Save Integer and Float registers. map = reg_saver.save_live_registers(masm, 0, &frame_size_in_words); ------------- PR Comment: https://git.openjdk.org/jdk/pull/20832#issuecomment-2327806460 From dholmes at openjdk.org Wed Sep 4 03:44:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 4 Sep 2024 03:44:30 GMT Subject: Integrated: 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 02:07:54 GMT, David Holmes wrote: > This is the implementation of a new method added to the JNI specification. > > From the CSR request: > > The `GetStringUTFLength` function returns the length as a `jint` (`jsize`) value and so is limited to returning at most `Integer.MAX_VALUE`. But a Java string can itself consist of `Integer.MAX_VALUE` characters, each of which may require more than one byte to represent them in modified UTF-8 format.** It follows then that this function cannot return the correct answer for all String values and yet the specification makes no mention of this, nor of any possible error to report if this situation is encountered. 
> > **The modified UTF-8 format used by the VM can require up to six bytes to represent one unicode character, but six byte characters are stored as UTF16 surrogate pairs. Hence the most bytes per character is 3, and so the maximum length is 3*`Integer.MAX_VALUE`. With compact strings this reduces to 2*`Integer.MAX_VALUE`. > > Solution > > Deprecate the existing JNI `GetStringUTFLength` method noting that it may return a truncated length, and add a new method, JNI `GetStringUTFLengthAsLong` that returns the string length as a `jlong` value. > > --- > > We also add a truncation warning to `GetStringUTFLength` under -Xcheck:jni > > There are some incidental whitespace changes in `src/hotspot/os/posix/dtrace/hotspot_jni.d` along with the new method entries. > > Testing: > - new test added > - tiers 1-3 sanity > > Thanks This pull request has now been integrated. Changeset: 90f3f432 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/90f3f4325772773f1dc1814c56d7326d5389e2c7 Stats: 209 lines in 9 files changed: 182 ins; 1 del; 26 mod 8328877: [JNI] The JNI Specification needs to address the limitations of integer UTF-8 String lengths Reviewed-by: cjplummer, alanb ------------- PR: https://git.openjdk.org/jdk/pull/20784 From dholmes at openjdk.org Wed Sep 4 04:26:58 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 4 Sep 2024 04:26:58 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 [v2] In-Reply-To: References: Message-ID: > In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. In some cases the length was explicitly cast to `int` and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value which then became an exceedingly large `size_t` value and we then tried to do utf8 validation on random bytes. 
> > Testing: > - failing test > - tiers 1-4 > > Thanks David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8339316-verify-utf8 - 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20804/files - new: https://git.openjdk.org/jdk/pull/20804/files/920b4da3..4263a4c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20804&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20804&range=00-01 Stats: 4157 lines in 181 files changed: 2263 ins; 982 del; 912 mod Patch: https://git.openjdk.org/jdk/pull/20804.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20804/head:pull/20804 PR: https://git.openjdk.org/jdk/pull/20804 From cjplummer at openjdk.org Wed Sep 4 06:14:34 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 4 Sep 2024 06:14:34 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 00:33:06 GMT, Yasumasa Suenaga wrote: >> test/hotspot/jtreg/serviceability/sa/TestJhsdbJstackUpcall.java line 57: >> >>> 55: >>> 56: private static void runJstackInLoop(LingeredApp app) throws Exception { >>> 57: for (int i = 0; i < MAX_ITERATIONS; i++) { >> >> What is the reason for doing 20 iterations. Is it because you are waiting for THREAD_NAME to enter the sleep() call? If so, we've addressed this in the past for the general case of wanting to do a "stable"stack trace by using the LingeredApp's SteadyStateThread. LingeredApp.startApp() will not return until this thread has become stable (blocked). Maybe you can do something similar with THREAD_NAME. 
> > TBH this testcase is based on TestJhsdbJstackMixed.java , so I'm not stick this code. I will fix to use SteadyStateThread. Ok. I think TestJhsdbJstackMixed has a loop because it's trying to get a stack trace at a point when the thread is executing at a PC that used to result in an SA exception (which is now fixed). SteadyStateThread might be a bit tricky to integrate into this test since the test has no control over what the thread does. Possibly with a new LingeredApp subclass you could control SteadyStateThread to call your FFM code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20789#discussion_r1743127430 From amitkumar at openjdk.org Wed Sep 4 06:37:25 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 06:37:25 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation In-Reply-To: References: Message-ID: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> On Mon, 2 Sep 2024 18:27:11 GMT, Lutz Schmidt wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >> Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6255: > >> 6253: // check for match. >> 6254: z_cg(obj, Address(tmp1)); >> 6255: z_bre(monitor_found); > > Are we sure there are at least three (this one and two unrolled) non-null cache entries? I couldn't find an answer for that. Maybe @xmas92 can tell us about that.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1743149363 From jsjolen at openjdk.org Wed Sep 4 06:57:19 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 4 Sep 2024 06:57:19 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 04:26:58 GMT, David Holmes wrote: >> In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. In some cases the length was explicitly cast to `int` and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value which then became an exceedingly large `size_t` value and we then tried to do utf8 validation on random bytes. >> >> Testing: >> - failing test >> - tiers 1-4 >> >> Thanks > > David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into 8339316-verify-utf8 > - 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 LGTM, thank you. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20804#pullrequestreview-2279150888 From dholmes at openjdk.org Wed Sep 4 07:03:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 4 Sep 2024 07:03:21 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 06:54:22 GMT, Johan Sjölen wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8339316-verify-utf8 >> - 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 > > LGTM, thank you. Thanks for the review @jdksjolen ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20804#issuecomment-2328069295 From amitkumar at openjdk.org Wed Sep 4 07:20:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 07:20:35 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: Message-ID: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> > s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; > > Testing: > - tier1-test (fastdebug) > - tier1-test with UseObjectMonitorTable (fastdebug) > - tier1-test with UseObjectMonitorTable (release) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20740/files - new: https://git.openjdk.org/jdk/pull/20740/files/6612c429..ed9edb65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20740&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20740&range=00-01 Stats: 16 lines in 1 file changed: 10 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20740/head:pull/20740 PR: https://git.openjdk.org/jdk/pull/20740 From duke at openjdk.org Wed Sep 4 08:09:34 2024 From: duke at openjdk.org (duke) Date: Wed, 4 Sep 2024 08:09:34 GMT Subject: Withdrawn: 8334513: New test gc/TestAlwaysPreTouchBehavior.java is failing In-Reply-To: References: Message-ID: On Thu, 20 Jun 2024 11:35:06 
GMT, Thomas Stuefe wrote: > See JBS issue. > > It is not completely obvious what the problem is in Oracle's CI, but the current assumption is that RSS of the testee VM gets reduced after it started and before we measured due to memory pressure. > > The patch: > - exposes os::available_memory via Whitebox > - For the test to count as failed, we require a certain minimum size of available memory both before and during the start of the testee JVM. Otherwise, we throw a `SkippedException` > > I have some misgivings about this solution, though: > 1) obviously, it is not bullet-proof either, since it is vulnerable to fast changes in machine memory load. > 2) On MacOS, we have the problem that 'os::available_memory()' totally underreports how much memory is available. Therefore, as an estimate of whether the test is valid, it is too conservative. I opened https://bugs.openjdk.org/browse/JDK-8334767 to track that issue. As long as it is not fixed, the tests will likely fall below the threshold on MacOS and, therefore, be skipped. Still, this is somewhat better than outright excluding the test for MacOS (or is it? Open to opinions) > 3) `SkippedException` leads to the test counting as "passed", not "skipped". I think that is a usability issue with jtreg. I cannot easily see which tests had been skipped due to SkippedException. > > Despite my doubts, I think this is the best we can come up with if we want to have such a test. > > Note: One way to go about (3) would be to make "minimum available memory" a `@requires` tag, similar to os.maxMemory. However, I fear that this may be easily misused and cause many tests to be excluded without notice. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/19803 From epeter at openjdk.org Wed Sep 4 08:21:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Sep 2024 08:21:26 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: On Tue, 3 Sep 2024 16:23:56 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolved > > src/hotspot/cpu/x86/assembler_x86.cpp line 8470: > >> 8468: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { >> 8469: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : >> 8470: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); > > avx512bw check here seems wrong. If this is indeed wrong, then we are missing tests, and you should add some more. > src/hotspot/cpu/x86/assembler_x86.cpp line 8479: > >> 8477: void Assembler::vpmaxud(XMMRegister dst, XMMRegister nds, Address src, int vector_len) { >> 8478: assert(vector_len == AVX_128bit ? VM_Version::supports_avx() : >> 8479: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : VM_Version::supports_avx512bw()), ""); > > avx512bw check here seems wrong. If this is indeed wrong, then we are missing tests, and you should add some more. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1743283892 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1743284116 From rcastanedalo at openjdk.org Wed Sep 4 09:06:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 4 Sep 2024 09:06:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v14] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c...
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: 8334111: Implementation of Late Barrier Expansion for G1: ppc port ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/1ea2862f..ed9c0232 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=12-13 Stats: 1036 lines in 5 files changed: 947 ins; 64 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From aboldtch at openjdk.org Wed Sep 4 09:09:19 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 4 Sep 2024 09:09:19 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> References: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> Message-ID: <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> On Wed, 4 Sep 2024 06:34:51 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6255: >> >>> 6253: // check for match. >>> 6254: z_cg(obj, Address(tmp1)); >>> 6255: z_bre(monitor_found); >> >> Are we sure there are at least three (this one and two unrolled) non-null cache entries? > > I couldn't find answer for that. Maybe @xmas92 can tell us about that. There are 8 cache entries, and a null sentinel at the end. All entries can be null. So the answer is no, we cannot be sure about that, as one, two or three of the first three entries may be null. But I am not sure what the reason is for this question.
The non-empty/non-null entries always come first, followed by the null entries if any, followed by a null sentinel. The unrolled entries do not check for null, only for a match. The loop will check the rest of the entries and go to a slow path when a null entry (or the null sentinel) is encountered. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1743355125 From rcastanedalo at openjdk.org Wed Sep 4 09:10:27 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 4 Sep 2024 09:10:27 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Tue, 3 Sep 2024 12:17:58 GMT, Martin Doerr wrote: > I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e Do you prefer integrating it soon? That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2328319555 From adinn at openjdk.org Wed Sep 4 09:57:18 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 09:57:18 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 10:51:53 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids.
> > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix errors in ppc generator src/hotspot/share/runtime/sharedRuntime.hpp line 50: > 48: #define SHARED_STUB_ID_ENUM_DECLARE(name, type) STUB_ID_NAME(name), > 49: enum class sharedStubId :int { > 50: NO_STUBID = -1, This ought to be SharedStubId ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1743427746 From adinn at openjdk.org Wed Sep 4 10:27:19 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 10:27:19 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 18:24:16 GMT, Vladimir Kozlov wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix errors in ppc generator > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2571: > >> 2569: SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address call_ptr) { >> 2570: assert((id >= sharedStubId::polling_page_vectors_safepoint_handler_id || >> 2571: id <= sharedStubId::polling_page_return_handler_id), > > This and all other similar assert checks depends on the order of stubs ID. > I think such checks should be in `sharedStubId` where the order is defined: > `sharedStubId::is_polling_page_id(id)` Yes, defining a (non-PRODUCT) validation method for this is a good idea. However, I cannot create it in sharedStubId as it is an enum class. I will put it into class SharedRuntime. I'll also create methods `is_handle_id()` and `is_throw_id()`. I was unsure about relying on order/grouping when I wrote this. On reflection, I think it would be safer to make `is_polling_page_id()` explicitly enumerate the relevant cases. It is ok to rely on order here with SharedRuntime stubs. 
We are just using the declarations in stubDeclarations.hpp to define enum tags and static fields so we can define an order that groups related stubs without any further constraint. However, the current C1 stubs declaration order defines the order of generation of the stubs. Likewise, when it comes to systematizing the OptoRuntime stubs we will also want to use the declarations to drive the calls to `generate_xxx_blob()` and `generate_stub()`. So, in those cases the declaration order primarily needs to respect stub-stub dependencies. > src/hotspot/share/runtime/stubDeclarations.hpp line 2: > >> 1: /* >> 2: * Copyright (c) 2024, 2024, Oracle and/or its affiliates. All rights reserved. > > You need only one 2024 year Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1743477126 PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1743480713 From adinn at openjdk.org Wed Sep 4 10:32:20 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 10:32:20 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 10:51:53 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. 
> > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix errors in ppc generator src/hotspot/share/runtime/stubDeclarations.hpp line 75: > 73: // C1 stubs are always generated in a generic CodeBlob > 74: > 75: #ifdef COMPILER1 c1 and Opto macros are not really needed in this PR so I'll remove them for now and re-add them in the follow-up PRs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1743490014 From adinn at openjdk.org Wed Sep 4 10:58:43 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 10:58:43 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v3] In-Reply-To: References: Message-ID: > Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. Andrew Dinn has updated the pull request incrementally with three additional commits since the last revision: - riscv review feedback - fix enum class name - initial review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20832/files - new: https://git.openjdk.org/jdk/pull/20832/files/b5220093..681f8cb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=01-02 Stats: 256 lines in 11 files changed: 25 ins; 84 del; 147 mod Patch: https://git.openjdk.org/jdk/pull/20832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20832/head:pull/20832 PR: https://git.openjdk.org/jdk/pull/20832 From adinn at openjdk.org Wed Sep 4 10:58:43 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 10:58:43 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: References: 
<58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID:

On Wed, 4 Sep 2024 02:43:44 GMT, Fei Yang wrote:

>> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision:
>>
>> fix errors in ppc generator
>
> Hi Andrew:
> I just checked riscv-specific changes and seems that we are lacking following small change.
> Could you please add it? I did a release build and ran tier1 test on linux-riscv64 platform.
>
>
> diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
> index 2e8362814de..0481bc3483b 100644
> --- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
> +++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp
> @@ -2447,7 +2447,8 @@ SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address cal
>    OopMap* map = nullptr;
>
>    // Allocate space for the code. Setup code generation tools.
> -  CodeBuffer buffer("handler_blob", 2048, 1024);
> +  const char *name = SharedRuntime::stub_name(id);
> +  CodeBuffer buffer(name, 2048, 1024);
>    MacroAssembler* masm = new MacroAssembler(&buffer);
>    assert_cond(masm != nullptr);
>
> @@ -2455,7 +2456,7 @@ SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address cal
>    address call_pc = nullptr;
>    int frame_size_in_words = -1;
>    bool cause_return = (id == sharedStubId::polling_page_return_handler_id);
> -  RegisterSaver reg_save(id == sharedStubId::polling_page_vectors_safepoint_handler_id /* save_vectors */);
> +  RegisterSaver reg_saver(id == sharedStubId::polling_page_vectors_safepoint_handler_id /* save_vectors */);
>
>    // Save Integer and Float registers.
>    map = reg_saver.save_live_registers(masm, 0, &frame_size_in_words);

@RealFYang Thanks for checking the riscv code. I have pushed a new version that should fix the errors.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20832#issuecomment-2328575392 From adinn at openjdk.org Wed Sep 4 11:10:36 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 11:10:36 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v4] In-Reply-To: References: Message-ID: > Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: clean up asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20832/files - new: https://git.openjdk.org/jdk/pull/20832/files/681f8cb7..8d477b10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=02-03 Stats: 63 lines in 7 files changed: 0 ins; 42 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20832/head:pull/20832 PR: https://git.openjdk.org/jdk/pull/20832 From adinn at openjdk.org Wed Sep 4 11:10:36 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 4 Sep 2024 11:10:36 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Wed, 4 Sep 2024 10:22:54 GMT, Andrew Dinn wrote: >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2571: >> >>> 2569: SafepointBlob* SharedRuntime::generate_handler_blob(sharedStubId id, address call_ptr) { >>> 2570: assert((id >= sharedStubId::polling_page_vectors_safepoint_handler_id || >>> 2571: id <= sharedStubId::polling_page_return_handler_id), >> >> This and all other similar assert checks depends on the order of stubs ID. 
>> I think such checks should be in `sharedStubId` where the order is defined: >> `sharedStubId::is_polling_page_id(id)` > > Yes, defining a (non-PRODUCT) validation method for this is a good idea. However, I cannot create it in sharedStubId as it is an enum class. I will put it into class SharedRuntime. I'll also create methods `is_resolve_id()` and `is_throw_id()`. > > I was unsure about relying on order/grouping when I wrote this. On reflection, I think it would be safer to make `is_polling_page_id()` explicitly enumerate the relevant cases. It is ok to rely on order here with SharedRuntime stubs. We are just using the declarations in stubDeclarations.hpp to define enum tags and static fields so we can define an order that groups related stubs without any further constraint. However, the current C1 stubs declaration order defines the order of generation of the stubs. Likewise, when it comes to systematizing the OptoRuntime stubs we will also want to use the declarations to drive the calls to `generate_xxx_blob()` and `generate_stub()`. So, in those cases the declaration order primarily needs to respect stub-stub dependencies. Latest version has non-product methods in SharedRuntime and calls them from the asserts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1743556125 From duke at openjdk.org Wed Sep 4 11:27:44 2024 From: duke at openjdk.org (Casper Norrbin) Date: Wed, 4 Sep 2024 11:27:44 GMT Subject: RFR: 8318127: align_up has potential overflow Message-ID: Hi everyone, The `align_up` function contained code which could potentially overflow and produce an incorrect result. This PR adds an assert to check for such. Additionally, a test case that previously caused an overflow has been updated to use the highest possible non-aligned value that does not trigger an overflow. 
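For illustration, the overflow described above can be sketched in standalone C++. This is a minimal sketch and not the HotSpot code: it assumes the usual add-then-mask formulation of rounding up, 64-bit unsigned values and a power-of-two alignment greater than 1, and the names `align_up_checked`, `align_up_would_overflow` and `max_unaligned_value` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical standalone helpers; the names are illustrative, not HotSpot's.
constexpr bool is_power_of_2(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// True if adding (alignment - 1) to value would wrap around 2^64.
constexpr bool align_up_would_overflow(uint64_t value, uint64_t alignment) {
  return value > std::numeric_limits<uint64_t>::max() - (alignment - 1);
}

// Round value up to a multiple of alignment, asserting that no wrap occurs.
inline uint64_t align_up_checked(uint64_t value, uint64_t alignment) {
  assert(is_power_of_2(alignment) && "alignment must be a power of 2");
  assert(!align_up_would_overflow(value, alignment) && "align_up would overflow");
  const uint64_t mask = alignment - 1;
  return (value + mask) & ~mask;
}

// Largest non-aligned value that can still be rounded up without wrapping:
// one below the largest aligned value (assumes alignment > 1).
constexpr uint64_t max_unaligned_value(uint64_t alignment) {
  return (std::numeric_limits<uint64_t>::max() & ~(alignment - 1)) - 1;
}
```

Under these assumptions, any value within `alignment - 1` of the maximum wraps to a small number when rounded up, which is the incorrect result the assert is meant to catch; `max_unaligned_value` corresponds to the kind of boundary input the updated test is described as using.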
------------- Commit messages: - change reserve_memory test - align overflow check Changes: https://git.openjdk.org/jdk/pull/20808/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20808&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318127 Stats: 6 lines in 3 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20808.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20808/head:pull/20808 PR: https://git.openjdk.org/jdk/pull/20808 From lucy at openjdk.org Wed Sep 4 12:30:20 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 4 Sep 2024 12:30:20 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> References: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> Message-ID: On Wed, 4 Sep 2024 09:06:19 GMT, Axel Boldt-Christmas wrote: >> I couldn't find answer for that. Maybe @xmas92 can tell us about that. > > There are 8 cache entries, and a null sentinel at the end. All entries can be null. > > So the answer is no we can not be sure about that as one, two or three of the first three entries may be null. But I am not sure what the reason is for this question. > > The non-empty/non-null entries always comes first, followed by the null entries if any, followed by a null sentinel. The unrolled entries do not check for null, only for a match. The loop will check the rest of the entries and go to a slow path when a null entry (or the null sentinel) is encountered. Thank you , @xmas92 ! With that, the code looks correct. The unrolled iterations may find a match. All other cases are safe, no out-of-bounds. The loop iterations check for null and are thus safe as well. 
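The cache layout described above can be sketched as follows. This is a hypothetical standalone model, not the s390x assembly or the HotSpot cache code: the types `CacheEntry` and `MonitorCache`, the function `cache_lookup` and the choice of three unrolled probes are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

struct Object {};   // stand-in for a Java object
struct Monitor {};  // stand-in for an ObjectMonitor

struct CacheEntry {
  const Object* oop;
  Monitor* monitor;
};

constexpr std::size_t kCapacity = 8;  // eight real entries, per the thread
constexpr std::size_t kUnrolled = 3;  // assumed number of unrolled probes

struct MonitorCache {
  // Non-null entries come first, then null entries, then a null sentinel
  // in the extra last slot, which is never written.
  CacheEntry entries[kCapacity + 1] = {};
};

// Returns the cached monitor, or nullptr to signal "take the slow path".
inline Monitor* cache_lookup(const MonitorCache& cache, const Object* obj) {
  assert(obj != nullptr);
  // Unrolled probes: match check only. A null entry can never equal a
  // non-null obj, so skipping the null check is safe here.
  if (cache.entries[0].oop == obj) return cache.entries[0].monitor;
  if (cache.entries[1].oop == obj) return cache.entries[1].monitor;
  if (cache.entries[2].oop == obj) return cache.entries[2].monitor;
  // Remaining probes: stop at the first null entry. The sentinel guarantees
  // termination before the index leaves the array.
  for (std::size_t i = kUnrolled; ; ++i) {
    const Object* candidate = cache.entries[i].oop;
    if (candidate == obj) return cache.entries[i].monitor;
    if (candidate == nullptr) return nullptr;
  }
}
```

In this model the unrolled probes stay in bounds because they use fixed indices below the capacity, and the loop stays in bounds because the sentinel slot is always null, which mirrors the reasoning given in the review comment.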
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1743689971 From amitkumar at openjdk.org Wed Sep 4 12:42:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Sep 2024 12:42:19 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> Message-ID: <2yeb4_jKO9U7D1zHyLgi0GTKUym2iesw8lSMSl9tvIo=.c59ab670-31d2-48f3-aa2c-032ca2890c66@github.com> On Wed, 4 Sep 2024 07:20:35 GMT, Amit Kumar wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >> Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > review comments src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6030: > 6028: z_lghi(top, 0); // tmp1 is free at this point > 6029: z_stg(top, om_cache_addr); > 6030: } @RealLucy should I remove the else Part ? I tested on tier1 and it is not being executed, So maybe safe to remove ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1743710586 From fgao at openjdk.org Wed Sep 4 13:05:20 2024 From: fgao at openjdk.org (Fei Gao) Date: Wed, 4 Sep 2024 13:05:20 GMT Subject: RFR: 8337536: AArch64: Enable BTI branch protection for runtime part [v3] In-Reply-To: References: <7JRzzIvH26CZPYCX76eWBbQSYUhMDnOqRufDtWaIXq8=.d3270022-4933-4fa7-828a-f57dbc5b8a46@github.com> Message-ID: On Tue, 3 Sep 2024 09:25:55 GMT, Andrew Haley wrote: > What is the effect on JNI? Is there full interworking with non-branch-protected libraries? @theRealAph, thanks for your review! 
It should be no problem to have libjvm.so built with BTI and a JNI library built without BTI. BTI marks code pages as "Guarded". For executable pages that have been guarded, all indirect branches must have a destination that is a BTI instruction of the appropriate type. But for unguarded pages, we don't do this check. This allows BTI to be incrementally turned on for a specific codebase. BTI would then protect the branches within the libraries with BTI but not those without BTI. When we're jumping from JNI to libjvm, it's OK because BTI is enabled for libjvm.so and all the entry points have landing pads. When we're jumping from libjvm to JNI, it's also OK because the code cache pages have BTI disabled so it doesn't need landing pads. To verify it, after patching this PR, I disabled the `-mbranch-protection=standard` flag for all other libraries and enabled it only for jvm, so that libjvm.so is built with BTI and all other libraries are built without BTI. Jtreg tests passed. Before the patch, on mainline, all other libraries are built with BTI but libjvm.so is built without BTI, and we also have no BTI failures.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20491#issuecomment-2328972872 From lucy at openjdk.org Wed Sep 4 13:10:25 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Wed, 4 Sep 2024 13:10:25 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: <2yeb4_jKO9U7D1zHyLgi0GTKUym2iesw8lSMSl9tvIo=.c59ab670-31d2-48f3-aa2c-032ca2890c66@github.com> References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> <2yeb4_jKO9U7D1zHyLgi0GTKUym2iesw8lSMSl9tvIo=.c59ab670-31d2-48f3-aa2c-032ca2890c66@github.com> Message-ID: On Wed, 4 Sep 2024 12:39:52 GMT, Amit Kumar wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6030: > >> 6028: z_lghi(top, 0); // tmp1 is free at this point >> 6029: z_stg(top, om_cache_addr); >> 6030: } > > @RealLucy should I remove the else Part ? I tested on tier1 and it is not being executed, So maybe safe to remove ? I am often undecided myself. The code as it is now is correct for all displacements. If you omit the else part, you introduce a hidden dependency on the layout of BasicObjectLock and BasicLock. On the other hand, how likely is it that anybody will fundamentally change the layout and thus break the disp12 requirement? Compact or generally valid? 
**Your choice.** ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1743759824 From matsaave at openjdk.org Wed Sep 4 15:37:24 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 4 Sep 2024 15:37:24 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v6] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 12:33:47 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unused function declaration. > - Add parameters and rename generate_klass_flags_guard. Updates look good! ------------- Marked as reviewed by matsaave (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2280590336 From coleenp at openjdk.org Wed Sep 4 15:51:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Sep 2024 15:51:32 GMT Subject: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v6] In-Reply-To: References: Message-ID: <4l8wz-JmihCu8GfhNpa9n9zmFL8kgUohfFIiiFDzRdA=.d6e6bed5-6080-4584-863f-6846b1a65f3c@github.com> On Tue, 3 Sep 2024 12:33:47 GMT, Coleen Phillimore wrote: >> Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. >> >> Tested with tier1-7. >> >> NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unused function declaration. > - Add parameters and rename generate_klass_flags_guard. 
Thank you for reviewing, Dean, Tobias, Amit, Exe-Boss and reviewing the updates also Matias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20719#issuecomment-2329419116 From coleenp at openjdk.org Wed Sep 4 15:51:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Sep 2024 15:51:33 GMT Subject: Integrated: 8339112: Move JVM Klass flags out of AccessFlags In-Reply-To: References: Message-ID: <65Xo_ExETiiaZqM_SqeQ-ZOd2A6tyYUDvq5x15xLQhs=.cd87cc4a-e5c7-451c-bf06-f0ba00ad4326@github.com> On Mon, 26 Aug 2024 23:54:22 GMT, Coleen Phillimore wrote: > Move JVM implementation access flags that are not specified by the classfile format into Klass so we can shrink AccessFlags to u2 in a future change. > > Tested with tier1-7. > > NOTE: there are arm, ppc and s390 changes to this that are just a guess. Also, graal changes. This pull request has now been integrated. Changeset: 0cfd08f5 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/0cfd08f55aa166dc3f027887c886fa0b40a2ca21 Stats: 329 lines in 53 files changed: 163 ins; 51 del; 115 mod 8339112: Move JVM Klass flags out of AccessFlags Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/20719 From matsaave at openjdk.org Wed Sep 4 16:29:51 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 4 Sep 2024 16:29:51 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL [v2] In-Reply-To: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Message-ID: <6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com> > Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a warning where a dynamically generated class, 
java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived. Verified with tier x-y tests Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: - Updated test copyright - Added to test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20799/files - new: https://git.openjdk.org/jdk/pull/20799/files/41917d77..df490051 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20799&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20799&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20799/head:pull/20799 PR: https://git.openjdk.org/jdk/pull/20799 From redestad at openjdk.org Wed Sep 4 16:29:51 2024 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 4 Sep 2024 16:29:51 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL In-Reply-To: References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Message-ID: On Sat, 31 Aug 2024 10:40:40 GMT, ExE Boss wrote: >> Is it possible for us at java.lang.invoke to enhance the `GenerateJliClassesHelper` to generate this class? I can look into it. > >> Is it possible for us at java.lang.invoke to enhance the `GenerateJliClassesHelper` to generate this class? I can look into it. > > `GenerateJLIClassesPlugin`/`Helper` is fine, all that needs to be updated is the [`HelloClasslist`] script to hit the code path which generates `BoundMethodHandle$Species_LLLL`.
> > [`HelloClasslist`]: https://github.com/openjdk/jdk/blob/master/make/jdk/src/classes/build/tools/classlist/HelloClasslist.java Yes, `Species_*` classes can be statically generated by jlink - using a build-time "training run" of the `HelloClasslist` program @ExE-Boss refers to to generate a list of classes to generate statically into the JDK image. What happened in [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856) is likely that the `String` concat expressions touched by `HelloClasslist` no longer needs `Species_LLLL`, but something touched when running `java -Xshare:dump` does, so we get a warning. A dynamically generated `Species_LLLL` wouldn't exist in the JDK image, so it makes sense to exclude it from a CDS dump. Random thoughts: 1. It would probably work to actually make these dumpable into the CDS archive - though we might need to also dump the full bytecode and patch that up when we load them. 2. Looking at what `-Xshare:dump` does there's some `linkMethodHandleConstant` upcall which ends up creating the `Species_LLLL` class seen here - via `MethodHandleImpl.bindCaller`. Anyone who can point me to what's making such upcalls during dump? The provided `Lookup` might need to have other privileges to avoid generating forms with caller sensitive bindings. @iklam ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2327633728 From iklam at openjdk.org Wed Sep 4 16:29:51 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 4 Sep 2024 16:29:51 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL In-Reply-To: References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Message-ID: On Sat, 31 Aug 2024 10:40:40 GMT, ExE Boss wrote: >> Is it possible for us at java.lang.invoke to enhance the `GenerateJliClassesHelper` to generate this class? I can look into it. 
> >> Is it possible for us at java.lang.invoke to enhance the `GenerateJliClassesHelper` to generate this class? I can look into it. > > `GenerateJLIClassesPlugin`/`Helper` is fine, all that needs to be updated is the [`HelloClasslist`] script to hit the code path which generates `BoundMethodHandle$Species_LLLL`. > > [`HelloClasslist`]: https://github.com/openjdk/jdk/blob/master/make/jdk/src/classes/build/tools/classlist/HelloClasslist.java > Yes, `Species_*` classes can be statically generated by jlink - using a build-time "training run" of the `HelloClasslist` program @ExE-Boss refers to to generate a list of classes to generate statically into the JDK image. > > What happened in [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856) is likely that the `String` concat expressions touched by `HelloClasslist` no longer needs `Species_LLLL`, but something touched when running `java -Xshare:dump` does, so we get a warning. A dynamically generated `Species_LLLL` wouldn't exist in the JDK image, so it makes sense to exclude it from a CDS dump. > > Random thoughts: > > 1. It would probably work to actually make these dumpable into the CDS archive - though we might need to also dump the full bytecode and patch that up when we load them. > 2. Looking at what `-Xshare:dump` does there's some `linkMethodHandleConstant` upcall which ends up creating the `Species_LLLL` class seen here - via `MethodHandleImpl.bindCaller`. Anyone who can point me to what's making such upcalls during dump? The provided `Lookup` might need to have other privileges to avoid generating forms with caller sensitive bindings. @iklam ?
I added this inside `KlassFactory::create_from_stream()` if (name->ends_with("BoundMethodHandle$Species_LLLL")) { fatal("here"); } The hs_err file shows V [libjvm.so+0x10e084b] KlassFactory::create_from_stream(ClassFileStream*, Symbol*, ClassLoaderData*, ClassLoadInfo const&, JavaThread*)+0x15b (klassFactory.cpp:185) V [libjvm.so+0x1547651] SystemDictionary::resolve_class_from_stream(ClassFileStream*, Symbol*, Handle, ClassLoadInfo const&, JavaThread*)+0x13f (systemDictionary.cpp:908) V [libjvm.so+0x15478f0] SystemDictionary::resolve_from_stream(ClassFileStream*, Symbol*, Handle, ClassLoadInfo const&, JavaThread*)+0x6c (systemDictionary.cpp:946) V [libjvm.so+0xf2510d] jvm_lookup_define_class(_jclass*, char const*, signed char const*, int, _jobject*, unsigned char, int, _jobject*, JavaThread*)+0x5b0 (jvm.cpp:1011) V [libjvm.so+0xf256ab] JVM_LookupDefineClass+0x133 (jvm.cpp:1086) C [libjava.so+0xec1b] Java_java_lang_ClassLoader_defineClass0+0x14b j java.lang.ClassLoader.defineClass0(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BIILjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;+0 java.base at 24-internal j java.lang.System$2.defineClass(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BLjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;+17 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup$ClassDefiner.defineClass(ZLjava/lang/Object;)Ljava/lang/Class;+57 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup$ClassDefiner.defineClass(Z)Ljava/lang/Class;+3 java.base at 24-internal j java.lang.invoke.ClassSpecializer$Factory.generateConcreteSpeciesCode(Ljava/lang/String;Ljava/lang/invoke/ClassSpecializer$SpeciesData;)Ljava/lang/Class;+37 java.base at 24-internal j java.lang.invoke.ClassSpecializer$Factory.loadSpecies(Ljava/lang/invoke/ClassSpecializer$SpeciesData;)Ljava/lang/invoke/ClassSpecializer$SpeciesData;+102 java.base at 24-internal j 
java.lang.invoke.ClassSpecializer.findSpecies(Ljava/lang/Object;)Ljava/lang/invoke/ClassSpecializer$SpeciesData;+53 java.base at 24-internal j java.lang.invoke.BoundMethodHandle$SpeciesData.extendWith(Ljava/lang/invoke/LambdaForm$BasicType;)Ljava/lang/invoke/BoundMethodHandle$SpeciesData;+48 java.base at 24-internal j java.lang.invoke.LambdaFormEditor.newSpeciesData(Ljava/lang/invoke/LambdaForm$BasicType;)Ljava/lang/invoke/BoundMethodHandle$SpeciesData;+5 java.base at 24-internal j java.lang.invoke.LambdaFormEditor.filterReturnForm(Ljava/lang/invoke/LambdaForm$BasicType;Z)Ljava/lang/invoke/LambdaForm;+150 java.base at 24-internal j java.lang.invoke.MethodHandleImpl.makePairwiseConvertByEditor(Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;ZZ)Ljava/lang/invoke/MethodHandle;+521 java.base at 24-internal j java.lang.invoke.MethodHandleImpl.makePairwiseConvert(Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;ZZ)Ljava/lang/invoke/MethodHandle;+18 java.base at 24-internal j java.lang.invoke.MethodHandleImpl.makePairwiseConvert(Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;Z)Ljava/lang/invoke/MethodHandle;+4 java.base at 24-internal j java.lang.invoke.MethodHandle.asTypeUncached(Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle;+50 java.base at 24-internal j java.lang.invoke.MethodHandle.asType(Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/MethodHandle;+25 java.base at 24-internal j java.lang.invoke.MethodHandleImpl$BindCaller.restoreToType(Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodHandle;Ljava/lang/Class;)Ljava/lang/invoke/MethodHandle;+26 java.base at 24-internal j java.lang.invoke.MethodHandleImpl$BindCaller.bindCallerWithInjectedInvoker(Ljava/lang/invoke/MethodHandle;Ljava/lang/Class;)Ljava/lang/invoke/MethodHandle;+26 java.base at 24-internal j java.lang.invoke.MethodHandleImpl$BindCaller.bindCaller(Ljava/lang/invoke/MethodHandle;Ljava/lang/Class;)Ljava/lang/invoke/MethodHandle;+175 java.base at 
24-internal j java.lang.invoke.MethodHandleImpl.bindCaller(Ljava/lang/invoke/MethodHandle;Ljava/lang/Class;)Ljava/lang/invoke/MethodHandle;+2 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup.maybeBindCaller(Ljava/lang/invoke/MemberName;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandle;+64 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup.getDirectMethodCommon(BLjava/lang/Class;Ljava/lang/invoke/MemberName;ZZLjava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandle;+284 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup.getDirectMethodNoSecurityManager(BLjava/lang/Class;Ljava/lang/invoke/MemberName;Ljava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandle;+14 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup.getDirectMethodForConstant(BLjava/lang/Class;Ljava/lang/invoke/MemberName;)Ljava/lang/invoke/MethodHandle;+31 java.base at 24-internal j java.lang.invoke.MethodHandles$Lookup.linkMethodHandleConstant(BLjava/lang/Class;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/invoke/MethodHandle;+153 java.base at 24-internal j java.lang.invoke.MethodHandleNatives.linkMethodHandleConstant(Ljava/lang/Class;ILjava/lang/Class;Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/invoke/MethodHandle;+38 java.base at 24-internal v ~StubRoutines::call_stub 0x00007f5f6bbbfd01 V [libjvm.so+0xdf7fc4] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x62a (javaCalls.cpp:421) V [libjvm.so+0x13225b6] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*), JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x36 (os_linux.cpp:4975) V [libjvm.so+0xdf7996] JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x3a (javaCalls.cpp:329) V [libjvm.so+0xdf72ac] JavaCalls::call_static(JavaValue*, Klass*, Symbol*, Symbol*, 
JavaCallArguments*, JavaThread*)+0x154 (javaCalls.cpp:256) V [libjvm.so+0x154c706] SystemDictionary::link_method_handle_constant(Klass*, int, Klass*, Symbol*, Symbol*, JavaThread*)+0x40a (systemDictionary.cpp:2325) V [libjvm.so+0x9fa820] ConstantPool::resolve_constant_at_impl(constantPoolHandle const&, int, int, bool*, JavaThread*)+0xd88 (constantPool.cpp:1192) V [libjvm.so+0x735099] ConstantPool::resolve_possibly_cached_constant_at(int, JavaThread*)+0x4b (constantPool.hpp:711) V [libjvm.so+0x9fb225] ConstantPool::copy_bootstrap_arguments_at_impl(constantPoolHandle const&, int, int, int, objArrayHandle, int, bool, Handle, JavaThread*)+0x1c7 (constantPool.cpp:1312) V [libjvm.so+0x735182] ConstantPool::copy_bootstrap_arguments_at(int, int, int, objArrayHandle, int, bool, Handle, JavaThread*)+0x66 (constantPool.hpp:724) V [libjvm.so+0x733a29] BootstrapInfo::resolve_args(JavaThread*)+0x3a5 (bootstrapInfo.cpp:193) V [libjvm.so+0x7334e4] BootstrapInfo::resolve_bsm(JavaThread*)+0x23e (bootstrapInfo.cpp:110) V [libjvm.so+0x949b93] ClassListParser::resolve_indy_impl(Symbol*, JavaThread*)+0x1db (classListParser.cpp:613) V [libjvm.so+0x9498ce] ClassListParser::resolve_indy(JavaThread*, Symbol*)+0x50 (classListParser.cpp:578) V [libjvm.so+0x9485d0] ClassListParser::parse_at_tags(JavaThread*)+0x22e (classListParser.cpp:302) V [libjvm.so+0x947af9] ClassListParser::parse(JavaThread*)+0x8d (classListParser.cpp:120) V [libjvm.so+0x125b889] ClassListParser::parse_classlist(char const*, ClassListParser::ParseMode, JavaThread*)+0x66 (classListParser.hpp:144) V [libjvm.so+0x1258192] MetaspaceShared::preload_classes(JavaThread*)+0xa2 (metaspaceShared.cpp:737) V [libjvm.so+0x1258307] MetaspaceShared::preload_and_dump_impl(StaticArchiveBuilder&, JavaThread*)+0x29 (metaspaceShared.cpp:761) V [libjvm.so+0x1257d6e] MetaspaceShared::preload_and_dump(JavaThread*)+0x54 (metaspaceShared.cpp:660) V [libjvm.so+0x159e76f] Threads::create_vm(JavaVMInitArgs*, bool*)+0xafb (threads.cpp:821) V 
[libjvm.so+0xef6538] JNI_CreateJavaVM_inner(JavaVM_**, void**, void*)+0x93 (jni.cpp:3594)
V [libjvm.so+0xef696e] JNI_CreateJavaVM+0x32 (jni.cpp:3685)
C [libjli.so+0x45ff] JavaMain+0x8f
C [libjli.so+0x7f49] ThreadJavaMain+0x9

From looking at `*ClassListParser::_instance` inside gdb, the line in the classlist being processed is this one:

@lambda-proxy java/util/logging/Level$KnownLevel apply ()Ljava/util/function/Function; (Ljava/lang/Object;)Ljava/lang/Object; REF_invokeStatic java/util/logging/Level$KnownLevel lambda$add$3 (Ljava/lang/String;)Ljava/util/List; (Ljava/lang/String;)Ljava/util/List;

This is a lambda proxy class that was generated when executing `HelloClasslist`. Perhaps it took a different path and didn't end up needing the `Species_LLLL`?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2329504434

From iklam at openjdk.org Wed Sep 4 16:42:19 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 4 Sep 2024 16:42:19 GMT
Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL
In-Reply-To:
References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com>
Message-ID:

On Wed, 4 Sep 2024 16:25:26 GMT, Ioi Lam wrote:

> 1. It would probably work to actually make these dumpable into the CDS archive - though we might need to also dump the full bytecode and patch that up when we load them.

With [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737), we can archive these classes when `-XX:+AOTClassLinking` is enabled:

      if (k->name()->starts_with("java/lang/invoke/BoundMethodHandle$Species_")) {
        // This class is dynamically generated by the JDK
    +   if (CDSConfig::is_dumping_aot_linked_classes()) {
    +     // This class is archived without recording the original bytecodes. That's OK because the original
    +     // bytecodes are needed only for JVMTI ClassFileLoadHook. A CDS archive generated
    +     // with -XX:+AOTClassLinking will not be loadable if ClassFileLoadHooks are enabled.
    +     k->set_shared_classpath_index(0);
    +   } else {
          ResourceMark rm;
          log_info(cds)("Skipping %s because it is dynamically generated", k->name()->as_C_string());
          return true; // exclude without warning
    +   }
      } else {
        // These are classes loaded from unsupported locations (such as those loaded by JVMTI native
        // agent during dump time).
        return warn_excluded(k, "Unsupported location");
      }

We can't do this when `AOTClassLinking` is disabled, or else when CFLH asks for the original bytecodes, we will get an assert in [`FileMapInfo::open_stream_for_jvmti()`](https://github.com/openjdk/jdk/blob/12d060a255b9b783488714c6c2cb73a899d3f708/src/hotspot/share/cds/filemap.cpp#L2547):

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2329533301

From sgehwolf at openjdk.org Wed Sep 4 17:46:00 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Wed, 4 Sep 2024 17:46:00 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v9]
In-Reply-To:
References:
Message-ID:

> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest`, currently passes on cgroups v1 and fails on cgroups v2 due to the way [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. It is therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>
> I'm adding those tests in order to not regress another time.
>
> Testing:
> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420)
> - [x] GHA

Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: - Adapt JDK-8339148 - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comment of WB::host_cpus() - Handle non-root + CGv2 - Add nested hierarchy to test framework - Revert "Add root check for SystemdMemoryAwarenessTest.java" This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. - Add root check for SystemdMemoryAwarenessTest.java - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - ... and 7 more: https://git.openjdk.org/jdk/compare/29f3dd39...30f32d22 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/cf49a96e..30f32d22 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=07-08 Stats: 2593 lines in 119 files changed: 1994 ins; 169 del; 430 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From iklam at openjdk.org Wed Sep 4 19:31:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 4 Sep 2024 19:31:19 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL [v2] In-Reply-To: <6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com> References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> <6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com> Message-ID: On Wed, 4 Sep 2024 16:29:51 GMT, Matias Saavedra Silva wrote: >> Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a 
warning where a dynamically generated class, java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived. Verified with tier x-y tests > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Updated test copyright > - Added to test LGTM ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20799#pullrequestreview-2281070412 From stooke at openjdk.org Wed Sep 4 20:10:54 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 4 Sep 2024 20:10:54 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. 
> > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: - simplify windwos realpath() implementation - get rid of os::posix::realpath() and os::win32::realpath() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/e59cad60..b7f495b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=00-01 Stats: 108 lines in 4 files changed: 36 ins; 68 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From stooke at openjdk.org Wed Sep 4 20:10:54 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 4 Sep 2024 20:10:54 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: <94HlSEbw0xhT9I7utkByuFl3tyQ-budbDywKy7uv_rY=.71183944-310d-47a6-bec1-05300fc50650@github.com> On Thu, 29 Aug 2024 05:45:35 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: >> >> - simplify windwos realpath() implementation >> - get rid of os::posix::realpath() and os::win32::realpath() > > This is okay in principle but a few changes can be made. > > Also it seems that none of the callers of `realpath` ever check `errno` so I think that can be removed. > > Thanks Hello, @dholmes-ora , and thank you for your review! I am in the process of attempting to address your concerns. Your review has resulted in simpler code for my implementation - so thanks for that too. Please let me know if you have more concerns or questions. 
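As a rough illustration of the idea (not the code in this PR), a portable wrapper could take the shape below. `portable_realpath` is a hypothetical name chosen for this sketch, and the error handling is an assumption rather than the behavior of `os::realpath()`. As far as I am aware, `_fullpath()` only builds an absolute path and, unlike POSIX `realpath()`, does not require the file to exist, so the two branches are not strictly equivalent.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: resolve `path` into the caller's buffer `out` of
// capacity `out_len`. Returns `out` on success, nullptr on failure.
char* portable_realpath(const char* path, char* out, std::size_t out_len) {
  if (path == nullptr || out == nullptr || out_len == 0) {
    errno = EINVAL;
    return nullptr;
  }
#if defined(_WIN32)
  // _fullpath() returns nullptr on error (for example, result too long).
  return _fullpath(out, path, out_len);
#else
  // Let realpath() allocate the result, then copy it only if it fits.
  char* resolved = ::realpath(path, nullptr);
  if (resolved == nullptr) {
    return nullptr;  // errno was set by realpath()
  }
  const bool fits = std::strlen(resolved) < out_len;
  if (fits) {
    std::strcpy(out, resolved);
  }
  std::free(resolved);
  if (!fits) {
    errno = ENAMETOOLONG;  // set after free() so it cannot be clobbered
    return nullptr;
  }
  return out;
#endif
}
```

Callers pass their own buffer and check only for a null return, which matches the observation in the review that none of the existing callers inspect `errno`.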
> src/hotspot/os/windows/os_windows.cpp line 5344:
>
>> 5342: // In this case, use the user provided buffer but at least check whether _fullpath caused
>> 5343: // a memory overwrite.
>> 5344: if (errno == EINVAL) {
>
> There is nothing to indicate that `_fullpath` can ever set `EINVAL`; it is only specified to return null on error. This code should not check errno but can just re-try with the user-supplied buffer.

According to [the documentation](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fullpath-wfullpath?view=msvc-170), EINVAL can be set in some circumstances. Those circumstances are checked earlier in the function, so I have simplified this code. Setting EINVAL (or ENAMETOOLONG) is part of the existing documentation for os::realpath(); I am loath to change it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20683#issuecomment-2329878249
PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1744350217

From ccheung at openjdk.org Wed Sep 4 20:17:20 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Wed, 4 Sep 2024 20:17:20 GMT
Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL [v2]
In-Reply-To: <6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com>
References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> <6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com>
Message-ID:

On Wed, 4 Sep 2024 16:29:51 GMT, Matias Saavedra Silva wrote:

>> Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a warning where a dynamically generated class, java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived.
Verified with tier x-y tests > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Updated test copyright > - Added to test Looks good. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20799#pullrequestreview-2281153917 From matsaave at openjdk.org Wed Sep 4 20:52:24 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 4 Sep 2024 20:52:24 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL In-Reply-To: References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> Message-ID: On Wed, 4 Sep 2024 16:39:43 GMT, Ioi Lam wrote: >>> Yes, `Species_*` classes can be statically generated by jlink - using a build-time "training run" of the `HelloClasslist` program @ExE-Boss refers to to generate a list of classes to generate statically into the JDK image. >>> >>> What happened in [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856) is likely that the `String` concat expressions touched by `HelloClasslist` no longer needs `Species_LLLL`, but something touched when running `java -Xshare:dump` does, so we get a warning. A dynamically generated `Species_LLLL` wouldn't exist in the JDK image, so it makes sense to exclude it from a CDS dump. >>> >>> Random thoughts: >>> >>> 1. It would probably work to actually make these dumpable into the CDS archive - though we might need to also dump the full bytecode and patch that up when we load them. >>> 2. Looking at what `-Xshare:dump` does there's some `linkMethodHandleConstant` upcall which ends up creating the `Species_LLLL` class seen here - via `MethodHandleImpl.bindCaller`. Anyone who can point me to what's making such upcalls during dump? The provided `Lookup` might need to have other privileges to avoid generating forms with caller sensitive bindings. @iklam ? 
>>
>> I added this inside `KlassFactory::create_from_stream()`
>>
>>
>>     if (name->ends_with("BoundMethodHandle$Species_LLLL")) {
>>       fatal("here");
>>     }
>>
>>
>> The hs_err file shows
>>
>>
>> V [libjvm.so+0x10e084b] KlassFactory::create_from_stream(ClassFileStream*, Symbol*, ClassLoaderData*, ClassLoadInfo const&, JavaThread*)+0x15b (klassFactory.cpp:185)
>> V [libjvm.so+0x1547651] SystemDictionary::resolve_class_from_stream(ClassFileStream*, Symbol*, Handle, ClassLoadInfo const&, JavaThread*)+0x13f (systemDictionary.cpp:908)
>> V [libjvm.so+0x15478f0] SystemDictionary::resolve_from_stream(ClassFileStream*, Symbol*, Handle, ClassLoadInfo const&, JavaThread*)+0x6c (systemDictionary.cpp:946)
>> V [libjvm.so+0xf2510d] jvm_lookup_define_class(_jclass*, char const*, signed char const*, int, _jobject*, unsigned char, int, _jobject*, JavaThread*)+0x5b0 (jvm.cpp:1011)
>> V [libjvm.so+0xf256ab] JVM_LookupDefineClass+0x133 (jvm.cpp:1086)
>> C [libjava.so+0xec1b] Java_java_lang_ClassLoader_defineClass0+0x14b
>> j java.lang.ClassLoader.defineClass0(Ljava/lang/ClassLoader;Ljava/lang/Class;Ljava/lang/String;[BIILjava/security/ProtectionDomain;ZILjava/lang/Object;)Ljava/lang/Class;+0 java.base@24-internal
>> j java.lang.System$2.defineClass(Ljava/lang/ClassLoader;Ljava/lang/Class;Lja...
>
>> 1. It would probably work to actually make these dumpable into the CDS archive - though we might need to also dump the full bytecode and patch that up when we load them.
>
> With [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737), we can archive these classes when `-XX:+AOTClassLinking` is enabled:
>
>
>       if (k->name()->starts_with("java/lang/invoke/BoundMethodHandle$Species_")) {
>         // This class is dynamically generated by the JDK
>     +   if (CDSConfig::is_dumping_aot_linked_classes()) {
>     +     // This class is archived without recording the original bytecodes. That's OK because the original
>     +     // bytecodes are needed only for JVMTI ClassFileLoadHook. A CDS archive generated
>     +     // with -XX:+AOTClassLinking will not be loadable if ClassFileLoadHooks are enabled.
>     +     k->set_shared_classpath_index(0);
>     +   } else {
>           ResourceMark rm;
>           log_info(cds)("Skipping %s because it is dynamically generated", k->name()->as_C_string());
>           return true; // exclude without warning
>     +   }
>       } else {
>         // These are classes loaded from unsupported locations (such as those loaded by JVMTI native
>         // agent during dump time).
>         return warn_excluded(k, "Unsupported location");
>       }
>
>
> We can't do this when `AOTClassLinking` is disabled, or else when CFLH asks for the original bytecodes, we will get an assert in [`FileMapInfo::open_stream_for_jvmti()`](https://github.com/openjdk/jdk/blob/12d060a255b9b783488714c6c2cb73a899d3f708/src/hotspot/share/cds/filemap.cpp#L2547):

Thanks for the reviews and discussion @iklam and @calvinccheung!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2329988639

From matsaave at openjdk.org Wed Sep 4 20:52:25 2024
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Wed, 4 Sep 2024 20:52:25 GMT
Subject: Integrated: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL
In-Reply-To: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com>
References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com>
Message-ID:

On Fri, 30 Aug 2024 18:05:24 GMT, Matias Saavedra Silva wrote:

> Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a warning where a dynamically generated class, java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived. Verified with tier 1-5 tests

This pull request has now been integrated.
Changeset: d4dfa012 Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/d4dfa0127f4d51c8127c5d4dfe3b58c09500e80f Stats: 12 lines in 2 files changed: 8 ins; 0 del; 4 mod 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL Reviewed-by: iklam, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/20799 From gziemski at openjdk.org Wed Sep 4 21:17:28 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 4 Sep 2024 21:17:28 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag [v2] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) Gerard Ziemski has updated the pull request incrementally with 308 additional commits since the last revision: - undo MEMFLAGS to MemType - 8339233: Test javax/swing/JButton/SwingButtonResizeTestWithOpenGL.java#id failed: Button renderings are different after window resize Reviewed-by: honkar - 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 Co-authored-by: Dean Long Reviewed-by: kvn, thartmann - 8339492: StackMapDecoder::writeFrames makes lots of allocations Reviewed-by: liach, redestad, jwaters, asotona - 8332901: Select{Current,New}ItemTest.java for Choice don't open popup on macOS Move SelectCurrentItemTest.java to java/awt/Choice/SelectItem/. Move SelectNewItemTest.java to java/awt/Choice/SelectItem/. Use latches to control test flow instead of delays. Encapsulate the common logic in SelectCurrentItemTest. Provide overridable checkXXX() methods to modify conditions. 
Provide an overridable method which defines where to click in the choice popup to select an item. Reviewed-by: honkar, prr, dnguyen - 8339148: Make os::Linux::active_processor_count() public Reviewed-by: dholmes, jwaters - 8339112: Move JVM Klass flags out of AccessFlags Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng - 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja - 8325679: Optimize ArrayList subList sort Reviewed-by: liach - 8339131: Remove rarely-used accessor methods from Opcode Reviewed-by: asotona - ... and 298 more: https://git.openjdk.org/jdk/compare/9665d7f7...6d6d70e9 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20497/files - new: https://git.openjdk.org/jdk/pull/20497/files/9665d7f7..6d6d70e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20497&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20497&range=00-01 Stats: 40412 lines in 1372 files changed: 23818 ins; 9766 del; 6828 mod Patch: https://git.openjdk.org/jdk/pull/20497.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20497/head:pull/20497 PR: https://git.openjdk.org/jdk/pull/20497 From gziemski at openjdk.org Wed Sep 4 21:17:38 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 4 Sep 2024 21:17:38 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
>
> There is a bunch of other related cleanup that we can do, but I will leave that for follow-up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836)

I tried undoing my MEMFLAGS to MemType change in preparation for the MEMFLAGS to MemTag rename (which I have working), but that caused a massive rebase, so I will close this PR and move it to a clean branch if no one objects?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2330132485

From pchilanomate at openjdk.org Wed Sep 4 21:53:25 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 4 Sep 2024 21:53:25 GMT
Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005
Message-ID:

Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided into 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages (~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack consists of reserved pages, and if we try to access them directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region.
When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/20862/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335362 Stats: 125 lines in 3 files changed: 125 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20862/head:pull/20862 PR: https://git.openjdk.org/jdk/pull/20862 From coleenp at openjdk.org Wed Sep 4 22:06:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Sep 2024 22:06:56 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 04:26:58 GMT, David Holmes wrote: >> In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. 
In some cases the length was explicitly cast to `int`, and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value, which then became an exceedingly large `size_t` value, and we then tried to do utf8 validation on random bytes.
>>
>> Testing:
>> - failing test
>> - tiers 1-4
>>
>> Thanks
>
> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge branch 'master' into 8339316-verify-utf8
> - 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257

Looks good.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20804#pullrequestreview-2281494703

From iklam at openjdk.org Wed Sep 4 22:07:30 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 4 Sep 2024 22:07:30 GMT
Subject: RFR: 8329706: Implement -XX:+AOTClassLinking
Message-ID: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com>

This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).

**Overview**

- A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility.
- When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`.
- When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`.
  - The boot classes are loaded as part of `vmClasses::resolve_all()`
  - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`).
- Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`.

**All-or-nothing Loading**

- Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures.
- During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example:
  - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions.
  - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM.
- When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`.
- For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules is visible when the AOT cache was created, and when the AOT cache is used.
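
The all-or-nothing rule above can be pictured with a minimal, self-contained sketch. This is not the HotSpot code: `StartupOptions`, `AotCache` and `reason_to_reject` are hypothetical names invented for this illustration, and the set of checks is only an assumption based on the examples listed; the real logic is in `FileMapInfo::validate_aot_class_linking()`.

```cpp
#include <string>

// Hypothetical summary of the start-up state relevant to AOT-linked classes.
struct StartupOptions {
  bool agent_with_class_file_load_hook = false;  // a JVMTI agent could replace classes
  bool patch_module_used = false;                // --patch-module could substitute classes
  bool full_module_graph_usable = true;          // same module set as at cache creation
};

// Hypothetical view of an AOT cache (CDS archive).
struct AotCache {
  bool has_aot_linked_classes = false;
};

// Returns an empty string if the cache may be used, otherwise the reason it is
// rejected. The cache is accepted or rejected as a whole: AOT-linked classes
// point at each other directly, so loading only some of them is not an option.
std::string reason_to_reject(const AotCache& cache, const StartupOptions& opts) {
  if (!cache.has_aot_linked_classes) {
    return "";  // no all-or-nothing constraint applies
  }
  if (opts.agent_with_class_file_load_hook) {
    return "JVMTI ClassFileLoadHook is in use";
  }
  if (opts.patch_module_used) {
    return "--patch-module is specified";
  }
  if (!opts.full_module_graph_usable) {
    return "full module graph cannot be used";
  }
  return "";
}
```

The point of the sketch is that the decision is made once, at start-up, for the cache as a unit; there is no per-class fallback path that could leave dangling pointers behind.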
**Testing** - New test cases are added to test specific behaviors of the `-XX:+AOTClassLinking` flag - A new test group `hotspot_aot_classlinking` is created for running existing CDS tests with the `-XX:+AOTClassLinking` flag. Note that this group filters out test cases that are incompatible with `-XX:+AOTClassLinking`. E.g., classes that use JVMTI ClassFileLoadHook or class redefinition. - We will also modify some of our internal test cases (e.g., with large applications) to also run with the `-XX:+AOTClassLinking` flag. --- See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. ------------- Depends on: https://git.openjdk.org/jdk/pull/20517 Commit messages: - More clean up - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - 8329706: Implement -XX:+AOTClassLinking Changes: https://git.openjdk.org/jdk/pull/20843/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8329706 Stats: 1734 lines in 46 files changed: 1577 ins; 52 del; 105 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From redestad at openjdk.org Wed Sep 4 22:19:59 2024 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 4 Sep 2024 22:19:59 GMT Subject: RFR: 8338530: CDS warning Skipping java/lang/invoke/BoundMethodHandle$Species_LLLL [v2] In-Reply-To: <6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com> References: <7gA4o-mcaRw6Qz_MYfmnV5kyvyaJlsQNXP1HiscQJx4=.7bd68e07-5e9b-4ccd-a6b8-e7c6dc67e855@github.com> 
<6oZ1_6CnRsDu73RQapRjaIRf_6111l8of3dXgFI-quw=.a0865dc8-144b-48b5-a6d4-bba2cc1ce7c7@github.com> Message-ID: On Wed, 4 Sep 2024 16:29:51 GMT, Matias Saavedra Silva wrote: >> Since [JDK-8336856](https://bugs.openjdk.org/browse/JDK-8336856), `java -Xshare:dump` reports a warning where a dynamically generated class, java/lang/invoke/BoundMethodHandle$Species_LLLL, is excluded. This patch silently excludes the class as it cannot be archived. Verified with tier 1-5 tests > > Matias Saavedra Silva has updated the pull request incrementally with two additional commits since the last revision: > > - Updated test copyright > - Added to test So there is a method reference to `Class::getClassLoader()` (which is `@CallerSensitive`) in `KnownLevel`, and the bootstrapping and linking for that leads to spinning up of a set of MHs and the `Species_LLLL` class. Desugaring that lambda gets rid of the warning. It seems we take slightly different paths when running `-Xshare:dump` than when we run the `HelloClasslist` since it appears this is a lambda the build time thing captures. Different privileges of the `Lookup` object could explain this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20799#issuecomment-2330248731 From coleenp at openjdk.org Wed Sep 4 22:22:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 4 Sep 2024 22:22:56 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag [v2] In-Reply-To: References: Message-ID: <2cvG8JIT4qtVeu75lpMeQDhGOZCPieLvO8T1bu4Cs8c=.56bf10ee-4699-49cc-8349-060a3d40a159@github.com> On Wed, 4 Sep 2024 21:17:28 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
>> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > Gerard Ziemski has updated the pull request incrementally with 308 additional commits since the last revision: > > - undo MEMFLAGS to MemType > - 8339233: Test javax/swing/JButton/SwingButtonResizeTestWithOpenGL.java#id failed: Button renderings are different after window resize > > Reviewed-by: honkar > - 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 > > Co-authored-by: Dean Long > Reviewed-by: kvn, thartmann > - 8339492: StackMapDecoder::writeFrames makes lots of allocations > > Reviewed-by: liach, redestad, jwaters, asotona > - 8332901: Select{Current,New}ItemTest.java for Choice don't open popup on macOS > > Move SelectCurrentItemTest.java to java/awt/Choice/SelectItem/. > Move SelectNewItemTest.java to java/awt/Choice/SelectItem/. > Use latches to control test flow instead of delays. > Encapsulate the common logic in SelectCurrentItemTest. > Provide overridable checkXXX() methods to modify conditions. > Provide an overridable method which defines where to click > in the choice popup to select an item. > > Reviewed-by: honkar, prr, dnguyen > - 8339148: Make os::Linux::active_processor_count() public > > Reviewed-by: dholmes, jwaters > - 8339112: Move JVM Klass flags out of AccessFlags > > Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng > - 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long > > Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja > - 8325679: Optimize ArrayList subList sort > > Reviewed-by: liach > - 8339131: Remove rarely-used accessor methods from Opcode > > Reviewed-by: asotona > - ... and 298 more: https://git.openjdk.org/jdk/compare/9665d7f7...6d6d70e9 Yes, I think this PR is not right. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2330253062 From kvn at openjdk.org Wed Sep 4 22:26:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Sep 2024 22:26:54 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v4] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 11:10:36 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > clean up asserts Few comments. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2184: > 2182: } > 2183: #endif > 2184: const char *name = SharedRuntime::stub_name(SharedStubId::deopt_id); Code style: `const char* name` src/hotspot/share/runtime/sharedRuntime.hpp line 66: > 64: #undef SHARED_STUB_FIELD_DECLARE > 65: > 66: #ifndef PRODUCT Use `#ifdef ASSERT` here. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20832#pullrequestreview-2281512904 PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1744577365 PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1744579368 From jiangli at openjdk.org Wed Sep 4 23:39:54 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Wed, 4 Sep 2024 23:39:54 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). make/StaticLibs.gmk line 1: > 1: # Perhaps also consider adopting StaticLink.gmk file name from the https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch, as we are mostly doing the static linking here. 
Creating the static libs is handled elsewhere. make/StaticLibs.gmk line 71: > 69: # libsspi_bridge has name conflicts with sunmscapi > 70: BROKEN_STATIC_LIBS += sspi_bridge > 71: # These libs define DllMain which conflict with Hotspot I'm not aware of the DllMain issue with static linking these libs. Could you please explain? The libawt.a and libdt_socket.a are statically linked with `javastatic` in https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch. make/StaticLibs.gmk line 74: > 72: BROKEN_STATIC_LIBS += awt dt_shmem dt_socket javaaccessbridge > 73: # These libs are dependent on any of the above disabled libs > 74: BROKEN_STATIC_LIBS += fontmanager jawt lcms net nio Which specific dependent cause these libs being excluded? In https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch, these JDK libs (except `libjawt.a`) are statically linked into `javastatic`. make/StaticLibs.gmk line 118: > 116: OPTIMIZATION := HIGH, \ > 117: STATIC_LAUNCHER := true, \ > 118: LDFLAGS := $(JAVASTATIC_LINK_LDFLAGS), \ I could be missing something, but I don't see where is $JAVASTATIC_LINK_LDFLAGS defined. On a related notes, I think we need to include $JVM_LDFLAGS when linking the static "java". See https://bugs.openjdk.org/browse/JDK-8339522?focusedId=14702923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14702923. make/modules/java.desktop/lib/AwtLibraries.gmk line 176: > 174: > 175: ifneq ($(ENABLE_HEADLESS_ONLY), true) > 176: # We cannot link with both awt_headless and awt_xawt at the same time Just a note on that. It's doable to link with both awt_headless and awt_xawt with some work. I did some quick experiments on that during the initial investigation for hermetic/static Java. src/java.base/unix/native/libjli/java_md.c line 300: > 298: char jvmcfg[], jint so_jvmcfg) { > 299: /* Compute/set the name of the executable. This is needed for macOS. 
*/ > 300: SetExecname(*pargv); Why is `SetExecname()` needed for the static case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744614583 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744604685 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744603414 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744611776 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744616878 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744620611 From dholmes at openjdk.org Thu Sep 5 00:00:59 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 00:00:59 GMT Subject: RFR: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 22:04:11 GMT, Coleen Phillimore wrote: >> David Holmes has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8339316-verify-utf8 >> - 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 > > Looks good. Thanks @coleenp ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20804#issuecomment-2330342554 From dholmes at openjdk.org Thu Sep 5 00:00:59 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 00:00:59 GMT Subject: Integrated: 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 00:40:54 GMT, David Holmes wrote: > In JDK-8338257 I overlooked updating the callers of `UTF8::is_legal_utf8` to pass a `size_t` length parameter. 
In some cases the length was explicitly cast to `int` and in the test case in question (with `-Xcheck:jni`) this caused integer overflow to a negative value which then became an exceedingly large `size_t` value and we then tried to do utf8 validation on random bytes. > > Testing: > - failing test > - tiers 1-4 > > Thanks This pull request has now been integrated. Changeset: 96df5a6d Author: David Holmes URL: https://git.openjdk.org/jdk/commit/96df5a6d8f90c988b354dbe6bdc510aa4b8ee98b Stats: 11 lines in 7 files changed: 1 ins; 4 del; 6 mod 8339316: Test runtime/exceptionMsgs/NoClassDefFoundError/NoClassDefFoundErrorTest.java fails after JDK-8338257 Reviewed-by: jsjolen, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20804 From jiangli at openjdk.org Thu Sep 5 00:05:50 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 00:05:50 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. 
> > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). src/hotspot/share/classfile/classLoader.cpp line 953: > 951: assert(CanonicalizeEntry == nullptr, "should not load java library twice"); > 952: if (is_vm_statically_linked()) { > 953: CanonicalizeEntry = CAST_TO_FN_PTR(canonicalize_fn_t, os::lookup_function("JDK_Canonicalize")); Can you add an assert to make sure `CanonicalizeEntry` is not NULL? src/hotspot/share/classfile/classLoader.cpp line 969: > 967: > 968: if (is_vm_statically_linked()) { > 969: JImageOpen = CAST_TO_FN_PTR(JImageOpen_t, os::lookup_function("JIMAGE_Open")); It might be good to assert these are not NULL as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744635408 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744635916 From jiangli at openjdk.org Thu Sep 5 00:17:51 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 00:17:51 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. 
One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). src/java.base/unix/native/libjli/java_md.c line 509: > 507: > 508: if (GetApplicationHome(path, pathsize)) { > 509: if (JLI_IsStaticallyLinked()) { `GetJREPath()` does not need to be called for the static case. Any reason why this path is executed for static mode? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744642315 From jiangli at openjdk.org Thu Sep 5 00:27:50 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 00:27:50 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: <4ZyGLLJL7fVUDI0QXFBOZ1dK97dOjWKDuDCBhsPqEHs=.ac87ad4f-0839-4d5d-a59e-b273f37a6711@github.com> On Tue, 3 Sep 2024 12:51:13 GMT, Magnus Ihse Bursie wrote: > @jianglizhou Can you please check if there are any other contributors that should be acknowledged? Thanks for asking! I checked all the related changes in the https://github.com/openjdk/leyden/tree/hermetic-java-runtime branch. The following are the related commits from the branch corresponding to the extracted changes included in this PR. There are no other authors for those changes. However, all changes have gone through prior reviews by my teammates. I didn't ask about this for the earlier integration PR; is there a way to mention the reviewer contributions?
- https://github.com/openjdk/leyden/commit/bceb753f79b4bc767fdfb71d5f68a84430644df6 - https://github.com/openjdk/leyden/commit/a998a9d6ca44a93d4e9859a17de2dca60963de76 - https://github.com/openjdk/leyden/commit/53aa8f0cf418ab5f435a4b9996c7754fb8505d4b - https://github.com/openjdk/leyden/commit/63f84f5c0a98077c8f49a19f026f103b9e29d6e0 - https://github.com/openjdk/leyden/commit/afe9ca06dd86e8983768de80ba1a08f3c68589b4 - https://github.com/openjdk/leyden/commit/7d75a7f4d6aa020b7580fbbf660b2b3e3a41b27 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2330368791 From jwaters at openjdk.org Thu Sep 5 05:11:55 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 5 Sep 2024 05:11:55 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Wed, 4 Sep 2024 23:06:00 GMT, Jiangli Zhou wrote: >> As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. >> >> This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. 
>> >> All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > make/StaticLibs.gmk line 71: > >> 69: # libsspi_bridge has name conflicts with sunmscapi >> 70: BROKEN_STATIC_LIBS += sspi_bridge >> 71: # These libs define DllMain which conflict with Hotspot > > I'm not aware of the DllMain issue with static linking these libs. Could you please explain? The libawt.a and libdt_socket.a are statically linked with `javastatic` in https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch. DllMain is a Windows specific initialization method that is called when a Windows dll (Dynamic library) is loaded, among other things. Since DllMain is extern "C", it is not mangled and hence likely that having multiple static libraries that each define it will cause multiple symbol definition errors during linking. It might be that the reason hermetic Java hasn't encountered this problem yet is because it mainly tests its code on Linux, while this is a Windows specific issue, since the names you mention (libawt.a and libdt_socket.a) are the names of those libraries on Linux, not Windows. However, the issue likely deeper than that. DllMain is completely wrong to define when inside a static library, and should not be compiled at all when making the static versions of these libraries. Simply localizing the DllMain symbol when creating a static library would be wrong. 
We'll have to find out how to run the initialization code for each of these currently dynamic libraries without DllMain when compiling them as static libraries > src/hotspot/share/classfile/classLoader.cpp line 953: > >> 951: assert(CanonicalizeEntry == nullptr, "should not load java library twice"); >> 952: if (is_vm_statically_linked()) { >> 953: CanonicalizeEntry = CAST_TO_FN_PTR(canonicalize_fn_t, os::lookup_function("JDK_Canonicalize")); > > Can you add an assert to make sure `CanonicalizeEntry` is not NULL? Also please remember to use nullptr and not NULL! @kimbarrett would appreciate it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744808169 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1744810286 From amitkumar at openjdk.org Thu Sep 5 07:10:55 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 5 Sep 2024 07:10:55 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> <2yeb4_jKO9U7D1zHyLgi0GTKUym2iesw8lSMSl9tvIo=.c59ab670-31d2-48f3-aa2c-032ca2890c66@github.com> Message-ID: On Wed, 4 Sep 2024 13:08:04 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6030: >> >>> 6028: z_lghi(top, 0); // tmp1 is free at this point >>> 6029: z_stg(top, om_cache_addr); >>> 6030: } >> >> @RealLucy should I remove the else Part ? I tested on tier1 and it is not being executed, So maybe safe to remove ? > > I am often undecided myself. The code as it is now is correct for all displacements. If you omit the else part, you introduce a hidden dependency on the layout of BasicObjectLock and BasicLock. On the other hand, how likely is it that anybody will fundamentally change the layout and thus break the disp12 requirement? Compact or generally valid? **Your choice.** I guess let's keep it. 
I mean even if there is need to change the layout, then we have to remove `z_mvghi` and switch back to this implementation again. So maybe better we keep it here and hope for the best ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1744920346 From stuefe at openjdk.org Thu Sep 5 07:15:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 5 Sep 2024 07:15:51 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 20:10:54 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: > > - simplify windwos realpath() implementation > - get rid of os::posix::realpath() and os::win32::realpath() Looks good, thanks for the work. One question inline. src/hotspot/os/windows/os_windows.cpp line 5327: > 5325: > 5326: char* result = nullptr; > 5327: ALLOW_C_FUNCTION(::_fullpath, result = ::_fullpath(outbuf, filename, outbuflen);) Would this work for non-Latin-1 utf-8? ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2282041961 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1744925864 From jbhateja at openjdk.org Thu Sep 5 07:45:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 07:45:17 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v6] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorportated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/767aeef3..bec0f449 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=04-05 Stats: 1979 lines in 59 files changed: 670 ins; 809 del; 500 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Sep 5 07:45:17 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 07:45:17 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> On Tue, 3 Sep 2024 13:09:13 GMT, Emanuel Peter wrote: > You did in fact add `java/lang` methods. I think you need to add tests for all of those. As well. That's going to be even more code to review. Hi @eme64, as Paul suggested in the offline mail chain, let's restrict the changes in this patch to the Vector API only. Going forward we may need to add special unsigned value classes wrapping equivalently sized integers. For the time being I am moving all the helper APIs into VectorMathUtils.java; these automatically get exercised by the fallback implementation, and we already have tests for next APIs.
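For readers following the saturating semantics discussed in this thread, here is a minimal scalar sketch in Java of what such helper methods might look like. The class name `SaturatingSketch` and all method names are hypothetical and chosen for illustration only; they are not taken from the pull request. The signed overflow test is the same sign-comparison idiom that `Math.addExact` uses, and the unsigned variants rely on `Long.compareUnsigned`.

```java
// Illustrative sketch only: class and method names are hypothetical,
// not the helpers added by the pull request.
final class SaturatingSketch {
    private SaturatingSketch() {}

    // Signed saturating add: clamps to Long.MAX_VALUE / Long.MIN_VALUE on overflow.
    static long addSaturating(long a, long b) {
        long res = a + b;
        // Overflow iff both arguments have the opposite sign of the result.
        if (((a ^ res) & (b ^ res)) < 0) {
            return res < 0 ? Long.MAX_VALUE : Long.MIN_VALUE;
        }
        return res;
    }

    // Signed saturating subtract: clamps in the direction of the minuend's sign.
    static long subSaturating(long a, long b) {
        long res = a - b;
        // Overflow iff the arguments have different signs and
        // the sign of the result differs from the sign of a.
        if (((a ^ b) & (a ^ res)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return res;
    }

    // Unsigned saturating add: clamps to all ones (2^64 - 1) on carry-out.
    static long addSaturatingUnsigned(long a, long b) {
        long res = a + b;
        // The sum wrapped around iff it is smaller than an operand (unsigned compare).
        return Long.compareUnsigned(res, a) < 0 ? -1L : res;
    }

    // Unsigned saturating subtract: clamps to 0 on borrow.
    static long subSaturatingUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) < 0 ? 0L : a - b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Long.MAX_VALUE, 1L) == Long.MAX_VALUE);
        System.out.println(subSaturating(Long.MIN_VALUE, 1L) == Long.MIN_VALUE);
        System.out.println(Long.toUnsignedString(addSaturatingUnsigned(-1L, 1L)));
        System.out.println(subSaturatingUnsigned(1L, 2L));
    }
}
```

A lane-wise fallback for a vector operator could, under these assumptions, simply apply one of these methods to each pair of lanes; whether the actual patch is structured that way is not shown in this thread.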
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 914: > >> 912: case T_SHORT: vpminuw(dst, src1, src2, vlen_enc); break; >> 913: case T_INT: vpminud(dst, src1, src2, vlen_enc); break; >> 914: case T_LONG: evpminuq(dst, k0, src1, src2, false, vlen_enc); break; > > Can you explain to me what the `k0` is and where it comes from? k0 is an implicit mask register that signifies an all-true mask. It is not allocatable. Long min / max instructions are only available on AVX512 targets. > src/hotspot/share/opto/addnode.hpp line 194: > >> 192: class SaturatingAddINode : public Node { >> 193: public: >> 194: SaturatingAddINode(Node* in1, Node* in2) : Node(in1,in2) {} > > Suggestion: > > SaturatingAddINode(Node* in1, Node* in2) : Node(in1, in2) {} > > In other places below as well. Not applicable now. > src/hotspot/share/opto/addnode.hpp line 198: > >> 196: virtual const Type* bottom_type() const { return TypeInt::INT; } >> 197: virtual uint ideal_reg() const { return Op_RegI; } >> 198: }; > > Are these not supposed to inherit from the `AddNode`, and then override the corresponding methods? Or are you making them separate for a good reason? As per the offline discussion with Paul, we are planning to restrict this patch to the Vector API only; please refer to my earlier comments: https://github.com/openjdk/jdk/pull/20507#discussion_r1718044262 To reduce the noise, I am keeping only the required Vector IR nodes and planning to support scalar saturated operations in a subsequent patch. > src/hotspot/share/opto/addnode.hpp line 462: > >> 460: //------------------------------UMaxINode--------------------------------------- >> 461: // Maximum of 2 unsigned integers. >> 462: class UMaxLNode : public Node { > > Here you comment it with `UMaxINode`, but below it is the `UMaxLNode`. The `-------xyz------` comments are really useless. But the semantics description is useful (though you again say integer instead of long here...). Not applicable now.
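As a reference point for the "maximum of 2 unsigned integers" semantics mentioned in the quoted node comment, a scalar sketch in Java could look like the following. The class `UnsignedMinMaxSketch` and its methods are hypothetical names used for illustration; only `Long.compareUnsigned`, `Integer.compareUnsigned` and `Long.toUnsignedString` are existing JDK methods.

```java
// Illustrative sketch only: hypothetical scalar equivalents of the
// UMIN / UMAX lane operations; not code from the pull request.
final class UnsignedMinMaxSketch {
    private UnsignedMinMaxSketch() {}

    // Maximum of two 64-bit values interpreted as unsigned.
    static long umax(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Minimum of two 64-bit values interpreted as unsigned.
    static long umin(long a, long b) {
        return Long.compareUnsigned(a, b) <= 0 ? a : b;
    }

    // Maximum of two 32-bit values interpreted as unsigned.
    static int umax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    // Minimum of two 32-bit values interpreted as unsigned.
    static int umin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        // -1L has all bits set, so it is the largest unsigned 64-bit value.
        System.out.println(Long.toUnsignedString(umax(-1L, 1L)));
        System.out.println(umin(-1L, 1L));
        // The signed maximum of the same inputs differs.
        System.out.println(Math.max(-1L, 1L));
    }
}
```

The difference from the signed case shows up only when the operands have different sign bits: a value with the top bit set compares as larger than any value without it.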
> src/hotspot/share/opto/vectornode.hpp line 634: > >> 632: virtual int Opcode() const; >> 633: }; >> 634: > > This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? Not applicable now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2330830123 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744971176 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970961 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744971087 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744971023 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970833 From jbhateja at openjdk.org Thu Sep 5 07:45:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 07:45:18 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> Message-ID: <73QhgX2mQ9TBRQSq57MyimsFExG0tOKv6_id6EuCV_c=.03442b40-a24d-4623-8f1e-6050087c0e0d@github.com> On Tue, 3 Sep 2024 22:18:20 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolved > > src/hotspot/cpu/x86/x86.ad line 10684: > >> 10682: match(Set dst (SaturatingSubVI src1 src2)); >> 10683: match(Set dst (SaturatingSubVL src1 src2)); >> 10684: effect(TEMP xtmp1, TEMP xtmp2); > > Here we need TEMP dst in effect, the saturating_unsigned_sub_dq_avx defines and uses dst across xtmp1. Thanks, yes live range of MachNode corresponding to TEMP ends at its consumer instruction, they never make their way into liveout set of its block or survive beyond consumer, but back to back updates to DST and TMP may corrupt DST if both are assigned same registers by allocator. 
> src/java.base/share/classes/java/lang/Long.java line 1987: > >> 1985: public static long addSaturating(long a, long b) { >> 1986: long res = a + b; >> 1987: // HD 2-12 Overflow iff both arguments have the opposite sign of the result > > HD -> Hacker's Delight Thanks for elaborating, I replicated this logic from https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L930 Wanted to comply with rest of the codes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970392 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1744970574 From jbhateja at openjdk.org Thu Sep 5 08:34:36 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 5 Sep 2024 08:34:36 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. 
> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Some cleanups. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/bec0f449..7164783e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=05-06 Stats: 17 lines in 7 files changed: 2 ins; 10 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From ihse at openjdk.org Thu Sep 5 09:54:52 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 5 Sep 2024 09:54:52 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Thu, 5 Sep 2024 05:06:55 GMT, Julian Waters wrote: >> make/StaticLibs.gmk line 71: >> >>> 69: # libsspi_bridge has name conflicts with sunmscapi >>> 70: BROKEN_STATIC_LIBS += sspi_bridge >>> 71: # These libs define DllMain which conflict with Hotspot >> >> I'm 
not aware of the DllMain issue with static linking these libs. Could you please explain? The libawt.a and libdt_socket.a are statically linked with `javastatic` in https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch. > > DllMain is a Windows specific initialization method that is called when a Windows dll (Dynamic library) is loaded, among other things. Since DllMain is extern "C", it is not mangled and hence likely that having multiple static libraries that each define it will cause multiple symbol definition errors during linking. It might be that the reason hermetic Java hasn't encountered this problem yet is because it mainly tests its code on Linux, while this is a Windows specific issue, since the names you mention (libawt.a and libdt_socket.a) are the names of those libraries on Linux, not Windows. However, the issue likely deeper than that. DllMain is completely wrong to define when inside a static library, and should not be compiled at all when making the static versions of these libraries. Simply localizing the DllMain symbol when creating a static library would be wrong. We'll have to find out how to run the initialization code for each of these currently dynamic libraries without DllMain when compiling them as static libraries As Julian says, this is for Windows, and you have not even tried to compile that in your prototype.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745162496 From ihse at openjdk.org Thu Sep 5 09:54:53 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 5 Sep 2024 09:54:53 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: <7tmo9e9RcUi06DYLjvQEaEu_XCY4bUa4OcWByw7vCdc=.11672bb7-71ca-46f4-8ed1-48512ab59e15@github.com> On Wed, 4 Sep 2024 23:03:23 GMT, Jiangli Zhou wrote: >> As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. >> >> This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. >> >> All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > make/StaticLibs.gmk line 74: > >> 72: BROKEN_STATIC_LIBS += awt dt_shmem dt_socket javaaccessbridge >> 73: # These libs are dependent on any of the above disabled libs >> 74: BROKEN_STATIC_LIBS += fontmanager jawt lcms net nio > > Which specific dependent cause these libs being excluded? 
In https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch, these JDK libs (except `libjawt.a`) are statically linked into `javastatic`. Well, but your proof-of-concept only supports clang on linux, where you have enabled symbol hiding. Our conclusion in the zoom talks was that we should strive for getting a static launcher build pushed into mainline before we have full and proper support for symbol hiding on all platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745161079 From ihse at openjdk.org Thu Sep 5 10:02:58 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 5 Sep 2024 10:02:58 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). 
The `/contributor` is supposed to add attribution to whomever wrote the code. There is no way to document any prior reviewing for code; but they are of course welcome to review this PR, and then it will be documented. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20837#issuecomment-2331113024 From ihse at openjdk.org Thu Sep 5 10:02:59 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 5 Sep 2024 10:02:59 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Wed, 4 Sep 2024 23:28:10 GMT, Jiangli Zhou wrote: >> As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. >> >> This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. >> >> All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). 
> > make/modules/java.desktop/lib/AwtLibraries.gmk line 176: > >> 174: >> 175: ifneq ($(ENABLE_HEADLESS_ONLY), true) >> 176: # We cannot link with both awt_headless and awt_xawt at the same time > > Just a note on that. It's doable to link with both awt_headless and awt_xawt with some work. I did some quick experiments on that during the initial investigation for hermetic/static Java. That would require quite some work then..! The two libraries are meant as exclusive complements to each other -- they both implement the same "entry points", but in different ways -- one with X11 support, and one without. For other reasons (outside of static launcher reasons) I'd like to see some refactoring in how this is implemented, but that is completely outside this discussion. For the static launcher scenario, I can't even see the point of trying to include both? What would you accomplish by that? The entire point of having two libraries is that you want to be able to have full workstation capabilities, but then be dependent on the X11 libraries, or have limited capabilities, but skip the X11 dependency. > src/java.base/unix/native/libjli/java_md.c line 300: > >> 298: char jvmcfg[], jint so_jvmcfg) { >> 299: /* Compute/set the name of the executable. This is needed for macOS. */ >> 300: SetExecname(*pargv); > > Why is `SetExecname()` needed for the static case? Because of how macOS re-exec the launcher to reserve the main thread for GUI operations. Please check the rather extensive documentation elsewhere in this file. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745171016 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745172749 From rcastanedalo at openjdk.org Thu Sep 5 10:05:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Sep 2024 10:05:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary g1LoadXVolatile instructions in aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/ed9c0232..9821e795 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=13-14 Stats: 71 lines in 2 files changed: 4 ins; 51 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From ihse at openjdk.org Thu Sep 5 10:05:56 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 5 Sep 2024 10:05:56 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Wed, 4 Sep 2024 23:24:13 GMT, Jiangli Zhou wrote: >> As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. >> >> This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. 
>> >> All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). > > make/StaticLibs.gmk line 1: > >> 1: # > > Perhaps also consider adopting StaticLink.gmk file name from the https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch, as we are mostly doing the static linking here. Creating the static libs is handled elsewhere. My intention is to move all relevant handling of static linking into this file. > make/StaticLibs.gmk line 118: > >> 116: OPTIMIZATION := HIGH, \ >> 117: STATIC_LAUNCHER := true, \ >> 118: LDFLAGS := $(JAVASTATIC_LINK_LDFLAGS), \ > > I could be missing something, but I don't see where is $JAVASTATIC_LINK_LDFLAGS defined. > > On a related notes, I think we need to include $JVM_LDFLAGS when linking the static "java". See https://bugs.openjdk.org/browse/JDK-8339522?focusedId=14702923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14702923. You are right, this is dead code. Thanks for spotting this. During my experimentation, I tried passing along LDFLAGS from the individual libraries as well, but it turned out not to be a good idea -- the way we have used them were to modify some special properties on a single dynamic library, which did not apply to the static library as a whole. However, there is a risk that we in the future need to add LDFLAGS to a library that also needs to be carried over to the static launcher. If this happens, I guess we need to separate between LDFLAGS_ONLY_FOR_THIS_DLL and LDFLAGS_ALSO_FOR_STATIC_LINKING. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745180739 PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745180044 From jwaters at openjdk.org Thu Sep 5 10:06:55 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 5 Sep 2024 10:06:55 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 20:10:54 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: > > - simplify windwos realpath() implementation > - get rid of os::posix::realpath() and os::win32::realpath() Looks good, just 1 comment src/hotspot/os/posix/os_posix.cpp line 1027: > 1025: } > 1026: > 1027: char* os::Posix::realpath(const char* filename, char* outbuf, size_t outbuflen) { I'm looking at this from the GitHub UI so I might be missing something, but why was this moved up? ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2282462140 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1745181260 From rcastanedalo at openjdk.org Thu Sep 5 10:09:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Sep 2024 10:09:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 12:04:09 GMT, Martin Doerr wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with four additional commits since the last revision: >> >> - Increase test coverage of new-object stores with different type information >> - Refactor the two post-barrier removal cases into a single expression >> - Remove unnecessary early null-based post-barrier elision >> - Make store capturability test G1-specific and more precise > > src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646: > >> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr) >> 645: %{ >> 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0); > > Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157 > Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time. > Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64. Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745185394 From alanb at openjdk.org Thu Sep 5 10:19:51 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 5 Sep 2024 10:19:51 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> Message-ID: <1XTLQONTqkE6n6BTX2KzIngNeewtx5F-cHqvHw7Bk4U=.ded8a4e8-e3e4-414a-afc1-9c694bcb9182@github.com> On Tue, 3 Sep 2024 12:50:01 GMT, Magnus Ihse Bursie wrote: > As a prerequisite for Hermetic Java, we need a statically linked `java` launcher. It should behave like the normal, dynamically linked `java` launcher, except that all JDK native libraries should be statically, not dynamically, linked. > > This patch is the first step towards this goal. It will generate a `static-jdk` image with a statically linked launcher. This launcher is missing several native libs, however, and does therefore not behave like a proper dynamic java. One of the reasons for this is that local symbol hiding in static libraries are not implemented yet, which causes symbol clashes when linking all static libraries together. This will be addressed in an upcoming patch. > > All changes in the `src` directory are copied from, or inspired by, changes made in [the hermetic-java-runtime branch in Project Leyden](https://github.com/openjdk/leyden/tree/hermetic-java-runtime). src/java.base/unix/native/libjli/java_md.c line 509: > 507: > 508: if (GetApplicationHome(path, pathsize)) { > 509: if (JLI_IsStaticallyLinked()) { In passing, GetJREPath's function description includes "or registry settings" which is confusing to see now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1745198354 From adinn at openjdk.org Thu Sep 5 10:28:41 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 5 Sep 2024 10:28:41 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v4] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 22:20:31 GMT, Vladimir Kozlov wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> clean up asserts > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2184: > >> 2182: } >> 2183: #endif >> 2184: const char *name = SharedRuntime::stub_name(SharedStubId::deopt_id); > > Code style: `const char* name` Done here and at all other occurrences. > src/hotspot/share/runtime/sharedRuntime.hpp line 66: > >> 64: #undef SHARED_STUB_FIELD_DECLARE >> 65: >> 66: #ifndef PRODUCT > > Use `#ifdef ASSERT` here. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1745204621 PR Review Comment: https://git.openjdk.org/jdk/pull/20832#discussion_r1745205133 From adinn at openjdk.org Thu Sep 5 10:28:41 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 5 Sep 2024 10:28:41 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v2] In-Reply-To: References: <58VOUJfcrvWbeA2QQWGJuXIoSMHJ-L_DNF-HpEM2clg=.e7743483-3026-4fd6-a1d7-c06b6cd6a299@github.com> Message-ID: On Tue, 3 Sep 2024 13:50:01 GMT, Andrew Dinn wrote: >> Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: >> >> fix errors in ppc generator > > src/hotspot/share/runtime/stubDeclarations.hpp line 28: > >> 26: #ifndef SHARE_RUNTIME_SHAREDRUNTIME_ID_HPP >> 27: #define SHARE_RUNTIME_SHAREDRUNTIME_ID_HPP >> 28: > > This should be SHARE_RUNTIME_STUBDECLARATIONS_HPP Done ------------- PR Review Comment: 
https://git.openjdk.org/jdk/pull/20832#discussion_r1745205440 From adinn at openjdk.org Thu Sep 5 10:28:41 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 5 Sep 2024 10:28:41 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v5] In-Reply-To: References: Message-ID: > Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: further review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20832/files - new: https://git.openjdk.org/jdk/pull/20832/files/8d477b10..d6eb4b81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=03-04 Stats: 42 lines in 8 files changed: 0 ins; 0 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/20832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20832/head:pull/20832 PR: https://git.openjdk.org/jdk/pull/20832 From mdoerr at openjdk.org Thu Sep 5 10:45:55 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Sep 2024 10:45:55 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v13] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:07:14 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 646: >> >>> 644: instruct g1LoadPVolatile(iRegPNoSp dst, indirect mem, iRegPNoSp tmp1, iRegPNoSp tmp2, rFlagsReg cr) >>> 645: %{ >>> 646: predicate(UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0); >> >> Remark: This node should never match because the `referent` is never volatile (same for `g1LoadNVolatile`): https://github.com/openjdk/jdk/blob/7a418fc07464fe359a0b45b6d797c65c573770cb/src/java.base/share/classes/java/lang/ref/Reference.java#L157 >> 
Hence, `needs_acquiring_load(n)` and `n->as_Load()->barrier_data() != 0` are never true at the same time. >> Not sure if this should somehow be reflected in the code. I've inserted `ShouldNotReachHere` on PPC64. > > Good catch, thanks! I simply removed the `g1LoadXVolatile` patterns and added a comment explaining why they are not needed (commit 9821e795). The matcher should already fail if we ever end up with an erroneous `LoadX` node `n` for which `UseG1GC && needs_acquiring_load(n) && n->as_Load()->barrier_data() != 0` holds. Correct. Only the error message may be not so nice ("bad AD file"). PPC64 still has `g1LoadP_acq` and `g1LoadN_acq` which could also be replaced by a comment. But it's not important. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1745230285 From dholmes at openjdk.org Thu Sep 5 12:32:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 12:32:51 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 07:12:21 GMT, Thomas Stuefe wrote: >> Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: >> >> - simplify windwos realpath() implementation >> - get rid of os::posix::realpath() and os::win32::realpath() > > src/hotspot/os/windows/os_windows.cpp line 5327: > >> 5325: >> 5326: char* result = nullptr; >> 5327: ALLOW_C_FUNCTION(::_fullpath, result = ::_fullpath(outbuf, filename, outbuflen);) > > Would this work for non-Latin-1 utf-8? According to the docs: > _fullpath automatically handles multibyte-character string arguments as appropriate, recognizing multibyte-character sequences according to the multibyte code page currently in use. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1745389965 From dholmes at openjdk.org Thu Sep 5 12:32:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 12:32:52 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: <94HlSEbw0xhT9I7utkByuFl3tyQ-budbDywKy7uv_rY=.71183944-310d-47a6-bec1-05300fc50650@github.com> References: <94HlSEbw0xhT9I7utkByuFl3tyQ-budbDywKy7uv_rY=.71183944-310d-47a6-bec1-05300fc50650@github.com> Message-ID: <_FPRq6ozd9ZsvNyw8h-HbMQeZpk2XovORCNfv_9nd7Q=.93c2bae1-479f-4a3c-ab6d-07f079f932a2@github.com> On Wed, 4 Sep 2024 20:05:30 GMT, Simon Tooke wrote: >> src/hotspot/os/windows/os_windows.cpp line 5344: >> >>> 5342: // In this case, use the user provided buffer but at least check whether _fullpath caused >>> 5343: // a memory overwrite. >>> 5344: if (errno == EINVAL) { >> >> There is nothing to indicate that `_fullpath` can ever set `EINVAL`; it is only specified to return null on error. This code should not check errno but can just re-try with the user-supplied buffer. > > According to [the documentation](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fullpath-wfullpath?view=msvc-170), EINVAL can be set in some circumstances. Those circumstances are checked earlier in the function, so I have simplified this code. > > Setting EINVAL (or ENAMETOOLONG) is part of the existing documentation for os::realpath(); I am loath to change it. I missed the one case where it can return EINVAL (strange that it is buried in the remarks section). Given that none of the callers of realpath ever even look at errno it seems rather pointless to set it, but that cleanup should be a separate RFE. 
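For readers who have not opened the PR, the contract being discussed (resolve a path into a caller-supplied buffer, return null and set `EINVAL` or `ENAMETOOLONG` on failure) can be sketched roughly as below. This is a minimal stand-in written for this thread, not the code under review: `portable_realpath` is a hypothetical name, the Windows branch maps every `_fullpath` failure to `ENAMETOOLONG` as a simplifying assumption, and as far as I know `_fullpath` only builds an absolute path while POSIX `realpath()` also resolves symlinks and requires the file to exist.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdlib.h>   // realpath() on POSIX, _fullpath() on Windows

// Hypothetical stand-in for a platform-neutral realpath wrapper.
// Returns outbuf on success; returns nullptr with errno set on failure.
char* portable_realpath(const char* filename, char* outbuf, size_t outbuflen) {
    if (filename == nullptr || outbuf == nullptr || outbuflen < 1) {
        errno = EINVAL;
        return nullptr;
    }
#ifdef _WIN32
    // _fullpath() returns nullptr on error. Assumption for this sketch:
    // report every failure as "name too long".
    char* result = ::_fullpath(outbuf, filename, outbuflen);
    if (result == nullptr) {
        errno = ENAMETOOLONG;
    }
    return result;
#else
    // Let realpath() allocate the result, then copy it only if it fits.
    char* resolved = ::realpath(filename, nullptr);
    if (resolved == nullptr) {
        return nullptr;  // errno was set by realpath()
    }
    const bool fits = std::strlen(resolved) < outbuflen;
    if (fits) {
        std::strcpy(outbuf, resolved);
    }
    ::free(resolved);
    if (!fits) {
        errno = ENAMETOOLONG;  // set after free() so it cannot be clobbered
        return nullptr;
    }
    return outbuf;
#endif
}
```

A caller that ignores `errno`, as David notes the existing callers do, only needs to test the return value against null.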
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1745394594 From adinn at openjdk.org Thu Sep 5 13:19:29 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 5 Sep 2024 13:19:29 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v6] In-Reply-To: References: Message-ID: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> > Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: fix accidental typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20832/files - new: https://git.openjdk.org/jdk/pull/20832/files/d6eb4b81..9d67efe2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20832&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20832/head:pull/20832 PR: https://git.openjdk.org/jdk/pull/20832 From epeter at openjdk.org Thu Sep 5 14:28:58 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:28:58 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: <_aU7H3hhe2lA5cwndBuEFJ8U2rahyKwm60xqXaIANTQ=.1f027324-95d1-4539-b094-7ac04608fe59@github.com> On Thu, 5 Sep 2024 08:34:36 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . 
UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Some cleanups. No time to review now. But the title only talks about saturating vector operations. UMin/ UMax is not really a saturating operation, right? Preferably, move it to a separate PR, or at least change the title, please :) Just note on the length of this PR: people are not really excited to review 9k lines at once. 
I personally spend quite a bit of effort splitting things into smaller units, so that I get things reviewed quicker, and so that I make the life of the reviewer easier. It would be nice if you could split things into smaller units, I think in the end you would get more reviews quicker, and the result would be of higher quality. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2331828999 From epeter at openjdk.org Thu Sep 5 14:47:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:47:55 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 08:34:36 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. 
>> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Some cleanups. Just a few quick remarks. src/hotspot/share/opto/vectornode.hpp line 188: > 186: }; > 187: > 188: Spurious newline, please remove Suggestion: src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 78: > 76: * @since 24 > 77: */ > 78: public static long addSaturating(long a, long b) { Are these public methods any Java dev could use? If so: do we have tests for them? ------------- Changes requested by epeter (Reviewer).
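For readers following the thread, the difference between wrapping and saturating arithmetic discussed above can be illustrated with a minimal scalar sketch. This is not the code from the pull request: the class `SaturatingSketch` and the methods `addSaturatingUnsigned` and `umax` are hypothetical names chosen for illustration, and the bodies are only one plausible way to get the clamping behaviour the PR description calls SADD, SUADD and UMAX.

```java
// Illustrative sketch only; names and bodies are assumptions, not the PR's code.
public final class SaturatingSketch {
    private SaturatingSketch() {}

    // Signed saturating add: clamp to Long.MIN_VALUE / Long.MAX_VALUE instead of wrapping.
    public static long addSaturating(long a, long b) {
        long r = a + b;
        // Signed overflow occurred iff both operands have a sign different from the result.
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    // Unsigned saturating add: clamp to 2^64 - 1 (all bits set, i.e. -1L as a signed long).
    public static long addSaturatingUnsigned(long a, long b) {
        long r = a + b;
        // An unsigned sum that is smaller than one of its operands has wrapped around.
        return Long.compareUnsigned(r, a) < 0 ? -1L : r;
    }

    // Unsigned max: compares the operands as unsigned 64-bit values. No clamping is involved.
    public static long umax(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(Long.MAX_VALUE + 1L);                  // wraps to Long.MIN_VALUE
        System.out.println(addSaturating(Long.MAX_VALUE, 1L));    // clamps to Long.MAX_VALUE
        System.out.println(Long.toUnsignedString(addSaturatingUnsigned(-1L, 1L)));
        System.out.println(Long.toUnsignedString(umax(-1L, 1L)));
    }
}
```

The `umax` helper also shows why the reviewer questions the PR title: an unsigned max only changes how the operands are compared and never caps a result, so it is not a saturating operation in the sense used for the add and subtract variants.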
PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2283248252 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1745655940 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1745665600 From epeter at openjdk.org Thu Sep 5 14:47:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Sep 2024 14:47:56 GMT Subject: RFR: 8338021: Support saturating vector operators in VectorAPI [v5] In-Reply-To: <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> Message-ID: On Thu, 5 Sep 2024 07:42:26 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.hpp line 634: >> >>> 632: virtual int Opcode() const; >>> 633: }; >>> 634: >> >> This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? > > Not applicable now. What is not applicable? Do you actually need this node for the saturating operations? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1745661230 From fjiang at openjdk.org Thu Sep 5 14:56:02 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 5 Sep 2024 14:56:02 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e >> Do you prefer integrating it soon?
Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further. > >> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e > Do you prefer integrating it soon? > > That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation. Hi @robcasloz, here is the implementation for RISC-V: https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6 We are still testing the latest changes, results will be updated later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2331932063 From gziemski at openjdk.org Thu Sep 5 15:26:12 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 5 Sep 2024 15:26:12 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 21:17:28 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
>> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > Gerard Ziemski has updated the pull request incrementally with 308 additional commits since the last revision: > > - undo MEMFLAGS to MemType > - 8339233: Test javax/swing/JButton/SwingButtonResizeTestWithOpenGL.java#id failed: Button renderings are different after window resize > > Reviewed-by: honkar > - 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 > > Co-authored-by: Dean Long > Reviewed-by: kvn, thartmann > - 8339492: StackMapDecoder::writeFrames makes lots of allocations > > Reviewed-by: liach, redestad, jwaters, asotona > - 8332901: Select{Current,New}ItemTest.java for Choice don't open popup on macOS > > Move SelectCurrentItemTest.java to java/awt/Choice/SelectItem/. > Move SelectNewItemTest.java to java/awt/Choice/SelectItem/. > Use latches to control test flow instead of delays. > Encapsulate the common logic in SelectCurrentItemTest. > Provide overridable checkXXX() methods to modify conditions. > Provide an overridable method which defines where to click > in the choice popup to select an item. > > Reviewed-by: honkar, prr, dnguyen > - 8339148: Make os::Linux::active_processor_count() public > > Reviewed-by: dholmes, jwaters > - 8339112: Move JVM Klass flags out of AccessFlags > > Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng > - 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long > > Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja > - 8325679: Optimize ArrayList subList sort > > Reviewed-by: liach > - 8339131: Remove rarely-used accessor methods from Opcode > > Reviewed-by: asotona > - ... and 298 more: https://git.openjdk.org/jdk/compare/9665d7f7...6d6d70e9 Closing this PR and will move to a new one shortly. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2332003251 From gziemski at openjdk.org Thu Sep 5 15:26:13 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 5 Sep 2024 15:26:13 GMT Subject: Withdrawn: 8337563: NMT: rename MEMFLAGS to MemFlag In-Reply-To: References: Message-ID: <09pdopRRxilGVUe0ELoTG8gMe2rOac9igIl3k_eM0BM=.7fc0ff0c-f7f1-4278-851c-183aa4048dd6@github.com> On Wed, 7 Aug 2024 17:13:06 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. > > There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20497 From kvn at openjdk.org Thu Sep 5 15:38:54 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 15:38:54 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v6] In-Reply-To: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> References: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> Message-ID: On Thu, 5 Sep 2024 13:19:29 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix accidental typo Good. Let me test it before approval. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20832#pullrequestreview-2283441087 From rcastanedalo at openjdk.org Thu Sep 5 16:06:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Sep 2024 16:06:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v10] In-Reply-To: References: <5aSkkYnXN8xLzsZy4OSEVNrIG1rv6dOPESBb4I-nfYE=.7027cc32-4dcf-4674-a9af-c960d8a2d95e@github.com> <3ME602kl5NBEdiWT53M-B-YHtyU4yYgwc-lVFPxjiKg=.7298f764-788a-4c79-a2aa-324407c63813@github.com> Message-ID: On Wed, 4 Sep 2024 09:07:23 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz: Thanks for the updates! I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e >> Do you prefer integrating it soon? Otherwise, I could keep it separate and do more rebasing and testing while this PR evolves further. > >> I have an implementation for PPC64: https://github.com/TheRealMDoerr/jdk/commit/ed9c0232f53a15d768804348e1d8a111fed9a19e > Do you prefer integrating it soon? > > That's great, thank you for your work and your comments and suggestions, Martin! I just merged your implementation. > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later. Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332119624 From gziemski at openjdk.org Thu Sep 5 16:14:09 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 5 Sep 2024 16:14:09 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 21:17:28 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > Gerard Ziemski has updated the pull request incrementally with 308 additional commits since the last revision: > > - undo MEMFLAGS to MemType > - 8339233: Test javax/swing/JButton/SwingButtonResizeTestWithOpenGL.java#id failed: Button renderings are different after window resize > > Reviewed-by: honkar > - 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 > > Co-authored-by: Dean Long > Reviewed-by: kvn, thartmann > - 8339492: StackMapDecoder::writeFrames makes lots of allocations > > Reviewed-by: liach, redestad, jwaters, asotona > - 8332901: Select{Current,New}ItemTest.java for Choice don't open popup on macOS > > Move SelectCurrentItemTest.java to java/awt/Choice/SelectItem/. > Move SelectNewItemTest.java to java/awt/Choice/SelectItem/. > Use latches to control test flow instead of delays. > Encapsulate the common logic in SelectCurrentItemTest. > Provide overridable checkXXX() methods to modify conditions. > Provide an overridable method which defines where to click > in the choice popup to select an item. 
> > Reviewed-by: honkar, prr, dnguyen > - 8339148: Make os::Linux::active_processor_count() public > > Reviewed-by: dholmes, jwaters > - 8339112: Move JVM Klass flags out of AccessFlags > > Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng > - 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long > > Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja > - 8325679: Optimize ArrayList subList sort > > Reviewed-by: liach > - 8339131: Remove rarely-used accessor methods from Opcode > > Reviewed-by: asotona > - ... and 298 more: https://git.openjdk.org/jdk/compare/9665d7f7...6d6d70e9 The PR has been moved to https://github.com/openjdk/jdk/pull/20872 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2332134475 From jsjolen at openjdk.org Thu Sep 5 17:52:15 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 5 Sep 2024 17:52:15 GMT Subject: RFR: 8339627: Cleanup Unsafe.setMemory intrinsic code Message-ID: Hi, The code for the `Unsafe.setMemory` intrinsic has a few issues that this PR cleans up. 1. The labels are unused in the x86-64 intrinsic 2. The function stub has an incorrect function prototype: it clearly manipulates the array, so the array is not const, and we don't read the array, so it probably shouldn't be called `src`. That's probably just an issue of `UnsafeArrayCopyStub` being copied and altered insufficiently. Thanks.
------------- Commit messages: - Call dest dst since ArrayCopy above does so - Dead labels - src is not const void* and should be called dest Changes: https://git.openjdk.org/jdk/pull/20873/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20873&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339627 Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20873.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20873/head:pull/20873 PR: https://git.openjdk.org/jdk/pull/20873 From mdoerr at openjdk.org Thu Sep 5 18:18:56 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Sep 2024 18:18:56 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms.
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary g1LoadXVolatile instructions in aarch64 I've implemented the same cleanup as on aarch64: https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2 Would be nice if you could apply it. Thanks! In case you want to merge further updates from head, I have no objections.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332365001 From fparain at openjdk.org Thu Sep 5 19:06:50 2024 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 5 Sep 2024 19:06:50 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 21:05:10 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages (~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. > > The fix implemented is to bang the stack pages one by one to let the Windows page protection take over.
This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). > > I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. > > Thanks, > Patricio LGTM. ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20862#pullrequestreview-2283869209 From duke at openjdk.org Thu Sep 5 19:10:34 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 5 Sep 2024 19:10:34 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: update libm tanh reference test with code review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/4739ad45..39350a37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=01-02 Stats: 7 lines in 1 file changed: 4 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Thu Sep 5 19:10:34 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 5 Sep 2024 19:10:34 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh 
using libm [v2] In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 22:55:18 GMT, Joe Darcy wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Add stub initialization and extra tanh tests > > test/jdk/java/lang/Math/HyperbolicTests.java line 984: > >> 982: double b1 = 0.02; >> 983: double b2 = 5.1; >> 984: double b3 = 55 * Math.log(2)/2; // ~19.062 > > Probably better to use StrictMath.log here or, better yet, precompute the value as a constant and document its conceptual origin. Please see the updated code which uses the precomputed value of `b3`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1746031432 From duke at openjdk.org Thu Sep 5 19:12:55 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 5 Sep 2024 19:12:55 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 00:01:09 GMT, Joe Darcy wrote: > If the test is going to use randomness, then its jtreg tags should include > > `@key randomness` > > and it is preferable to use jdk.test.lib.RandomFactory to get a Random object since that handles printing out a key so the random sequence can be replicated if the test fails. Please see the test updated to use `@key randomness` and `jdk.test.lib.RandomFactory` to get a Random object. > The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. > For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). > So far the tests haven't failed with an error of 2.5 ulp. Would it be better to make it 5 ulp? Please let me know.
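The tolerance question above can be made concrete with a small sketch of a Math versus StrictMath comparison measured in ulps. This is not the code in `HyperbolicTests.java`: the class `UlpCheckSketch`, the helpers `ulpDistance` and `withinUlps`, and the sample inputs are hypothetical and only illustrate the reasoning. If each implementation may be off by up to 2.5 ulp from the exact result, and the two may err in opposite directions, the difference between them can approach 5 ulp.

```java
// Illustrative sketch only; helper names and sample inputs are assumptions.
public final class UlpCheckSketch {
    private UlpCheckSketch() {}

    // Distance between two doubles, expressed in ulps of the reference value.
    static double ulpDistance(double actual, double reference) {
        if (actual == reference) {
            return 0.0; // also covers equal infinities, where the subtraction would give NaN
        }
        return Math.abs(actual - reference) / Math.ulp(reference);
    }

    // True when 'actual' lies within 'maxUlps' ulps of 'reference'; NaN only matches NaN.
    static boolean withinUlps(double actual, double reference, double maxUlps) {
        if (Double.isNaN(reference)) {
            return Double.isNaN(actual);
        }
        return ulpDistance(actual, reference) <= maxUlps;
    }

    public static void main(String[] args) {
        // 2 x 2.5 ulp: each side may deviate from the exact result in a different direction.
        double tolerance = 5.0;
        double[] inputs = {0.02, 5.1, 19.062, -0.5};
        for (double x : inputs) {
            double fast = Math.tanh(x);         // may be intrinsified by the JIT
            double strict = StrictMath.tanh(x); // FDLIBM-based reference
            System.out.println(x + ": within tolerance = " + withinUlps(fast, strict, tolerance)
                    + ", distance in ulps = " + ulpDistance(fast, strict));
        }
    }
}
```

A comparison against an exactly rounded reference could keep the 2.5 ulp bound; the doubled bound applies only when the reference is itself an approximation, as it is when StrictMath is used.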
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1746034895 From jiangli at openjdk.org Thu Sep 5 19:21:52 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 19:21:52 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: <7tmo9e9RcUi06DYLjvQEaEu_XCY4bUa4OcWByw7vCdc=.11672bb7-71ca-46f4-8ed1-48512ab59e15@github.com> References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> <7tmo9e9RcUi06DYLjvQEaEu_XCY4bUa4OcWByw7vCdc=.11672bb7-71ca-46f4-8ed1-48512ab59e15@github.com> Message-ID: On Thu, 5 Sep 2024 09:50:49 GMT, Magnus Ihse Bursie wrote: > Well, but your proof-of-concept only supports clang on linux, where you have enabled symbol hiding. The hermetic-java-runtime branch doesn't have general symbol hiding enabled. That's why I'm wondering what the issues are with these libs except for `libjawt` with the current PR. (A side-note on `libjawt.a`: For static linking, we don't need `libjawt.a` and the headless or headful libs can be directly statically linked with.) > > Our conclusion in the zoom talks was that we should strive for getting a static launcher build pushed into mainline before we have full and proper support for symbol hiding on all platforms. Right, that would be a good & practical way to add the support in the mainline. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1746051842 From matsaave at openjdk.org Thu Sep 5 20:15:17 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 5 Sep 2024 20:15:17 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling Message-ID: This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. 
The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. ------------- Commit messages: - 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling Changes: https://git.openjdk.org/jdk/pull/20874/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338471 Stats: 11 lines in 3 files changed: 1 ins; 6 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20874/head:pull/20874 PR: https://git.openjdk.org/jdk/pull/20874 From dlong at openjdk.org Thu Sep 5 20:34:49 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 20:34:49 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. I recently added ciMethod::equals() in JDK-8335120, but I didn't take into account deleted methods. Could you please fix ciMethod::equals() in this PR? With your changes to get_new_method(), it looks like ciMethod::equals() will incorrectly think that two methods are the same if both are deleted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2332581305 From dlong at openjdk.org Thu Sep 5 20:39:49 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 20:39:49 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. 
The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. Also, doesn't this change mean that we can now return Unsafe.throwNoSuchMethodError() instead of the target method? This probably works fine in the interpreter, but I'm worried this could break the compilers in subtle ways. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2332588828 From jiangli at openjdk.org Thu Sep 5 20:39:51 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 20:39:51 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Thu, 5 Sep 2024 05:06:55 GMT, Julian Waters wrote: >> make/StaticLibs.gmk line 71: >> >>> 69: # libsspi_bridge has name conflicts with sunmscapi >>> 70: BROKEN_STATIC_LIBS += sspi_bridge >>> 71: # These libs define DllMain which conflict with Hotspot >> >> I'm not aware of the DllMain issue with static linking these libs. Could you please explain? The libawt.a and libdt_socket.a are statically linked with `javastatic` in https://github.com/openjdk/leyden/tree/hermetic-java-runtime/ branch. > > DllMain is a Windows specific initialization method that is called when a Windows dll (Dynamic library) is loaded, among other things. Since DllMain is extern "C", it is not mangled and hence it is likely that having multiple static libraries that each define it will cause multiple symbol definition errors during linking. It might be that the reason hermetic Java hasn't encountered this problem yet is because it mainly tests its code on Linux, while this is a Windows specific issue, since the names you mention (libawt.a and libdt_socket.a) are the names of those libraries on Linux, not Windows. However, the issue is likely deeper than that.
DllMain is completely wrong to define when inside a static library, and should not be compiled at all when making the static versions of these libraries. Simply localizing the DllMain symbol when creating a static library would be wrong. We'll have to find out how to run the initialization code for each of these currently dynamic libraries without DllMain when compiling them as static libraries @TheShermanTanker thanks for the details on DllMain issue. Right, we have only tested hermetic/static Java on Linux. I agree that hiding DllMain is not the right approach. One possible solution is to put the DllMain in separate .c/.cpp files and only link with those when building the .dll. With the current PR, we mainly focus on the Linux (or unix-like) port, e.g. `os::lookup_function()` is not supported in the Windows port yet. Any thoughts on if we only limit static linking these affected JDK libraries on Windows initially, and allow statically linking more libs on Linux? We can add those libs on Windows when we resolve the DllMain issue. For running some minimum jtreg testing initially, I think we would want to include `libnet` (and other libs) in the statically linked `java` binary.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1746129101 From jiangli at openjdk.org Thu Sep 5 20:43:52 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 20:43:52 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Thu, 5 Sep 2024 10:03:19 GMT, Magnus Ihse Bursie wrote: >> make/StaticLibs.gmk line 118: >> >>> 116: OPTIMIZATION := HIGH, \ >>> 117: STATIC_LAUNCHER := true, \ >>> 118: LDFLAGS := $(JAVASTATIC_LINK_LDFLAGS), \ >> >> I could be missing something, but I don't see where $JAVASTATIC_LINK_LDFLAGS is defined. >> >> On a related note, I think we need to include $JVM_LDFLAGS when linking the static "java". See https://bugs.openjdk.org/browse/JDK-8339522?focusedId=14702923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14702923. > > You are right, this is dead code. Thanks for spotting this. > > During my experimentation, I tried passing along LDFLAGS from the individual libraries as well, but it turned out not to be a good idea -- the way we have used them were to modify some special properties on a single dynamic library, which did not apply to the static library as a whole. > > However, there is a risk that we in the future need to add LDFLAGS to a library that also needs to be carried over to the static launcher. If this happens, I guess we need to separate between LDFLAGS_ONLY_FOR_THIS_DLL and LDFLAGS_ALSO_FOR_STATIC_LINKING. +1 on "separate between LDFLAGS_ONLY_FOR_THIS_DLL and LDFLAGS_ALSO_FOR_STATIC_LINKING" I think we need to get the linker flags sorted out correctly in this initial PR and make sure the needed flags (most importantly the ones used in $JVM_LDFLAGS) are included.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1746133505 From duke at openjdk.org Thu Sep 5 20:45:57 2024 From: duke at openjdk.org (halkosajtarevic) Date: Thu, 5 Sep 2024 20:45:57 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Remove unnecessary g1LoadXVolatile instructions in aarch64 Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2332586175 From jiangli at openjdk.org Thu Sep 5 20:47:50 2024 From: jiangli at openjdk.org (Jiangli Zhou) Date: Thu, 5 Sep 2024 20:47:50 GMT Subject: RFR: 8339480: Build static-jdk image with a statically linked launcher In-Reply-To: References: <5r5p2HyEXsEIr7wnq_5RSMfcbw-gsP4fBvTgr9P2lvY=.d3a51eae-661a-45d2-80e1-723e05e5eb32@github.com> <7MvsbWwg0NapAkQ45NF2u-KUtT7JaeyDjjPJa3bgK70=.9e181a2f-5d7d-43de-b943-cbd76de06e2f@github.com> Message-ID: On Thu, 5 Sep 2024 09:57:15 GMT, Magnus Ihse Bursie wrote: >> make/modules/java.desktop/lib/AwtLibraries.gmk line 176: >> >>> 174: >>> 175: ifneq ($(ENABLE_HEADLESS_ONLY), true) >>> 176: # We cannot link with both awt_headless and awt_xawt at the same time >> >> Just a note on that. It's doable to link with both awt_headless and awt_xawt with some work. I did some quick experiments on that during the initial investigation for hermetic/static Java. > > That would require quite some work then..! The two libraries are meant as exclusive complements to each other -- they both implement the same "entry points", but in different ways -- one with X11 support, and one without. For other reasons (outside of static launcher reasons) I'd like to see some refactoring in how this is implemented, but that is completely outside this discussion. > > For the static launcher scenario, I can't even see the point of trying to include both? What would you accomplish by that? > > The entire point of having two libraries is that you want to be able to have full workstation capabilities, but then be dependent on the X11 libraries, or have limited capabilities, but skip the X11 dependency. My initial understanding was that libawt_headless was mostly a subset of libawt_xawt, which made it possible to statically link both the headless and headful natives. Completely agree that it's outside of the current scope. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20837#discussion_r1746136942 From dholmes at openjdk.org Thu Sep 5 21:09:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 21:09:51 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: <0m0zdvRVNY3ZjLycIST_UNQjTFChOPKKS1KvV1m1stc=.f7ae20ae-4143-4d0b-ba77-dc330d859de6@github.com> On Wed, 4 Sep 2024 20:10:54 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: > > - simplify windwos realpath() implementation > - get rid of os::posix::realpath() and os::win32::realpath() Windows version looks better now, though still one issue that isn't really solvable - so I'd let it slide (especially as I think the errno settings should be removed anyway). Thanks src/hotspot/os/windows/os_windows.cpp line 5330: > 5328: if (result == nullptr) { > 5329: errno = ENAMETOOLONG; > 5330: } This is a bit of an assumption. What if the name "includes a drive letter that isn't valid or can't be found"? Unfortunately Windows doesn't specify any further details beyond returning null. 
src/hotspot/share/runtime/os.hpp line 672: > 670: > 671: // A safe implementation of realpath which will not cause a buffer overflow if the resolved path > 672: // is longer than PATH_MAX. Nit: remove leading space to align with text on previous line. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2284073591 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1746155484 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1746156066 From dholmes at openjdk.org Thu Sep 5 21:09:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 21:09:53 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:04:14 GMT, Julian Waters wrote: >> Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: >> >> - simplify windwos realpath() implementation >> - get rid of os::posix::realpath() and os::win32::realpath() > > src/hotspot/os/posix/os_posix.cpp line 1027: > >> 1025: } >> 1026: >> 1027: char* os::Posix::realpath(const char* filename, char* outbuf, size_t outbuflen) { > > I'm looking at this from the GitHub UI so I might be missing something, but why was this moved up? I'd assume because it is now os::foo rather than os::Posix::foo - though such a move is not necessary. 
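For readers following the os::realpath() thread above: the "safe implementation of realpath" comment refers to never writing more than the caller's buffer can hold. A minimal sketch of that idea on a POSIX.1-2008 system is shown here. The name `safe_realpath` is hypothetical, and this is not the code from the PR; it only assumes that `realpath()` allocates the result when its second argument is null.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>  // POSIX realpath()

// Hypothetical helper (not the HotSpot function): resolve `filename` into
// `outbuf` without ever writing more than `outbuflen` bytes.
char* safe_realpath(const char* filename, char* outbuf, std::size_t outbuflen) {
  if (filename == nullptr || outbuf == nullptr || outbuflen < 1) {
    errno = EINVAL;
    return nullptr;
  }
  // POSIX.1-2008: with a null second argument, realpath() mallocs the result,
  // so the resolved path may be longer than PATH_MAX without overflowing.
  char* resolved = ::realpath(filename, nullptr);
  if (resolved == nullptr) {
    return nullptr;  // errno was set by realpath()
  }
  // Copy only if the path plus its terminating NUL fits in the caller's buffer.
  const bool fits = std::strlen(resolved) < outbuflen;
  if (fits) {
    std::strcpy(outbuf, resolved);
  }
  std::free(resolved);
  if (!fits) {
    errno = ENAMETOOLONG;
    return nullptr;
  }
  return outbuf;
}
```

Calling `safe_realpath("/", buf, sizeof(buf))` with a reasonably sized `buf` returns `buf`; with a one-byte buffer it returns null and sets `errno` to `ENAMETOOLONG`. The Windows side of the discussion differs because, as noted above, `_fullpath()` gives no detail beyond returning null.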
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1746156926 From dholmes at openjdk.org Thu Sep 5 21:27:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 5 Sep 2024 21:27:49 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: <_FFKsynzGENvY7Qw96twFTBPhBUPggtYKRnpueZd5Bc=.40870dbb-bc10-4fb4-a107-ee17ee02dcfa@github.com> On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. Seems fine. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20874#pullrequestreview-2284103581 From dlong at openjdk.org Thu Sep 5 22:24:48 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Sep 2024 22:24:48 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. In particular, compilers could get confused because the method signature is different from what the caller expects. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2332730075 From kvn at openjdk.org Thu Sep 5 23:29:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Sep 2024 23:29:50 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v6] In-Reply-To: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> References: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> Message-ID: On Thu, 5 Sep 2024 13:19:29 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix accidental typo My tier1-4 testing passed clean. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20832#pullrequestreview-2284297306 From fyang at openjdk.org Fri Sep 6 02:14:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 6 Sep 2024 02:14:54 GMT Subject: RFR: 8339466: Enumerate shared stubs and define static fields and names via declarations [v6] In-Reply-To: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> References: <7p8bt2Mw2kEH0miBuXAblTeO2P2QviPIstyz49kN1JM=.93408338-1e0d-4fe1-88c3-492402a10967@github.com> Message-ID: On Thu, 5 Sep 2024 13:19:29 GMT, Andrew Dinn wrote: >> Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. > > Andrew Dinn has updated the pull request incrementally with one additional commit since the last revision: > > fix accidental typo Updated change LGTM. Thanks! 
BTW: I have just integrated a GHA fix for riscv which would hopefully catch build issues in time. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20832#pullrequestreview-2284437737 From dholmes at openjdk.org Fri Sep 6 04:58:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 04:58:49 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 3 Sep 2024 20:57:48 GMT, Ioi Lam wrote: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. 
See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... src/hotspot/share/cds/aotClassLinker.hpp line 101: > 99: // When CDS is enabled, is ik guatanteed to be linked at deployment time (and > 100: // cannot be replaced by JVMTI, etc)? > 101: // This is a necessary (not but sufficient) condition for keeping a direct pointer Suggestion: // This is a necessary (but not sufficient) condition for keeping a direct pointer src/hotspot/share/cds/aotClassLinker.hpp line 106: > 104: static bool is_candidate(InstanceKlass* ik); > 105: > 106: // Request that ik to be added to the candidates table. This will return succeed only if Suggestion: // Request that ik be added to the candidates table. 
This will return true only if ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746508873 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746510269 From dholmes at openjdk.org Fri Sep 6 05:06:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 05:06:50 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 3 Sep 2024 20:57:48 GMT, Ioi Lam wrote: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. 
See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 86: > 84: > 85: load_table(AOTLinkedClassTable::for_static_archive(), loader_kind, h_loader, current); > 86: assert(!current->has_pending_exception(), "VM should have exited due to ExceptionMark"); An `ExceptionMark` only triggers a VM exit on construction and destruction, if an exception is pending. 
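To illustrate the point about `ExceptionMark` above, here is a small, self-contained sketch of a scope guard that checks for a pending exception only when it is constructed and when it is destroyed. `FakeThread` and `ScopeCheck` are hypothetical stand-ins, not HotSpot classes, and a counter replaces what would be a VM exit in the real VM.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for a thread's pending-exception state.
struct FakeThread {
  bool pending = false;
  int fatal_reports = 0;  // counts what would be VM exits in a real VM
};

// Hypothetical scope guard: it checks only in its constructor and destructor,
// never in between.
class ScopeCheck {
  FakeThread* _t;
  void check(const char* where) {
    if (_t->pending) {
      ++_t->fatal_reports;
      std::fprintf(stderr, "pending exception at scope %s\n", where);
    }
  }
 public:
  explicit ScopeCheck(FakeThread* t) : _t(t) { check("entry"); }
  ~ScopeCheck() { check("exit"); }
  ScopeCheck(const ScopeCheck&) = delete;
  ScopeCheck& operator=(const ScopeCheck&) = delete;
};

// Returns true if an exception is pending in the middle of the guarded scope
// while the guard has not reported anything yet.
bool pending_seen_mid_scope(FakeThread* t) {
  ScopeCheck mark(t);
  t->pending = true;  // simulates a callee leaving an exception pending
  return t->pending && t->fatal_reports == 0;
}
```

In this sketch the function returns true, and the guard only reports once the scope ends. Under that model, code running inside the scope can still observe a pending exception, which is why the wording of the assert message quoted above was questioned.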
src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 143: > 141: InstanceKlass* ik = classes->at(i); > 142: if (log_is_enabled(Info, cds, aot, load)) { > 143: ResourceMark rm; Suggestion: ResourceMark rm(THREAD); src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 162: > 160: > 161: if (actual != ik) { > 162: ResourceMark rm; Suggestion: ResourceMark rm(THREAD); src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 196: > 194: if (ik->is_public() && !ik->is_hidden()) { > 195: if (log_is_enabled(Info, cds, aot, load)) { > 196: ResourceMark rm; Suggestion: ResourceMark rm(current); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746512974 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746513849 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746514062 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746514572 From dholmes at openjdk.org Fri Sep 6 05:12:54 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 05:12:54 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> On Tue, 3 Sep 2024 20:57:48 GMT, Ioi Lam wrote: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. 
> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. 
> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... src/hotspot/share/cds/archiveBuilder.cpp line 902: > 900: > 901: log_info(cds)("Number of classes %d", num_instance_klasses + num_obj_array_klasses + num_type_array_klasses); > 902: log_info(cds)(" instance classes " STATS_FORMAT, STATS_PARAMS(instance_klasses)); Suggestion: log_info(cds)(" instance classes " STATS_FORMAT, STATS_PARAMS(instance_klasses)); src/hotspot/share/cds/archiveBuilder.cpp line 904: > 902: log_info(cds)(" instance classes " STATS_FORMAT, STATS_PARAMS(instance_klasses)); > 903: log_info(cds)(" boot " STATS_FORMAT, STATS_PARAMS(boot_klasses)); > 904: log_info(cds)(" vm " STATS_FORMAT, STATS_PARAMS(vm_klasses)); Suggestion: log_info(cds)(" vm " STATS_FORMAT, STATS_PARAMS(vm_klasses)); src/hotspot/share/cds/archiveBuilder.cpp line 912: > 910: STATS_PARAMS(unlinked_klasses), > 911: boot_unlinked, platform_unlinked, > 912: app_unlinked, unreg_unlinked); Suggestion: log_info(cds)(" (unlinked) " STATS_FORMAT ", boot = %d, plat = %d, app = %d, unreg = %d", STATS_PARAMS(unlinked_klasses), boot_unlinked, platform_unlinked, app_unlinked, unreg_unlinked); src/hotspot/share/cds/archiveUtils.cpp line 390: > 388: return "boot"; // boot classes in java.base > 389: } else { > 390: return "boot2"; // boot classes outside of java.base Suggestion: boot -> boot-base, boot2 -> boot-nonbase ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746516082 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746516226 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746517363 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746518309 From dholmes at openjdk.org Fri Sep 6 05:21:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 05:21:49 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 3 Sep 2024 20:57:48 GMT, Ioi Lam wrote: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). 
> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... src/hotspot/share/cds/filemap.cpp line 2047: > 2045: if (!success) { > 2046: if (CDSConfig::is_using_aot_linked_classes()) { > 2047: // It's too later to recover -- we have already committed to use the archived metaspace objects, but Suggestion: // It's too late to recover -- we have already committed to use the archived metaspace objects, but src/hotspot/share/cds/lambdaFormInvokers.cpp line 104: > 102: // classes in the base archive. If we generate new versions of these classes, those CP entries > 103: // will be pointing to invalid classes. 
> 104: log_info(cds)("Base archive already have aot-linked lambda form holder classes. Cannot regenerate."); Suggestion: log_info(cds)("Base archive already has aot-linked lambda form holder classes. Cannot regenerate."); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746522647 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746523653 From dholmes at openjdk.org Fri Sep 6 05:26:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 05:26:50 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 3 Sep 2024 20:57:48 GMT, Ioi Lam wrote: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). 
> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... I've taken an initial look through but there is an awful lot to try and digest here. I've flagged numerous typos and minor nits. One general query: does this stuff work if the user defines their own initial application classloader? test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTClassLinkingVMOptions.java line 2: > 1: /* > 2: * Copyright (c) 2023, 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2284828158 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1746525404 From dholmes at openjdk.org Fri Sep 6 06:24:52 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 06:24:52 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 21:05:10 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages (~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. > > The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. 
This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). > > I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. > > Thanks, > Patricio src/hotspot/share/runtime/continuationFreezeThaw.cpp line 289: > 287: address last_touched_page = watermark - StackOverflow::stack_shadow_zone_size(); > 288: size_t pages_to_touch = align_up(watermark - new_sp, page_size) / page_size; > 289: while (pages_to_touch--) { Suggestion: while (pages_to_touch-- > 0) { src/hotspot/share/runtime/continuationFreezeThaw.cpp line 293: > 291: *last_touched_page = 0; > 292: } > 293: thread->stack_overflow_state()->set_shadow_zone_growth_watermark(new_sp); I'm not familiar with the details of this stack management code and am unclear about the role of the `shadow_zone_growth_watermark` here. The banging code in `os::map_stack_shadow_pages` doesn't access it. 
test/jdk/java/lang/Thread/virtual/BigStackChunk.java line 47: > 45: int i6 = i5 + 1; > 46: int i7 = i6 + 1; > 47: long ll = 2*(long)i1; Suggestion: long ll = 2 * (long)i1; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1746543746 PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1746568373 PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1746554210 From alanb at openjdk.org Fri Sep 6 06:25:49 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 6 Sep 2024 06:25:49 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Fri, 6 Sep 2024 05:24:18 GMT, David Holmes wrote: > One general query: does this stuff work if the user defines their own initial application classloader? Just to say that there isn't any support in the JDK for replacing any of the 3 built-in class loaders. In normal setup, the system class loader == application class loader. It is possible to run `-Djava.system.class.loader=..` to set your own system class loader, it gets created with the built-in application class loader as its parent. In that setup you start out with 4 class loaders. It's not a widely used feature and I assume goes into the "user-defined class loaders" non-goal bucket. In any case, loading from the class path or module system continues to use the built-in application class loader in that configuration so it might not be totally hostile to the AOT cache. I suspect there are tests already for this setup. 
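To make the `-Djava.system.class.loader` setup described above concrete, here is a minimal sketch of such a loader. The class name `CountingSystemLoader` and its request counter are hypothetical and exist only for illustration; the one requirement taken from the `ClassLoader.getSystemClassLoader()` documentation is a public constructor that accepts the delegation parent.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of the JDK or of the PR under review.
// For real use with -Djava.system.class.loader=CountingSystemLoader it would
// normally be declared public in its own source file on the class path.
final class CountingSystemLoader extends ClassLoader {
    private final AtomicInteger requests = new AtomicInteger();

    // The custom system class loader is created through a public constructor
    // that receives the built-in application class loader as its parent.
    public CountingSystemLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        requests.incrementAndGet();
        // Plain parent-first delegation: the built-in loaders still do the loading.
        return super.loadClass(name, resolve);
    }

    // Number of load requests that passed through this loader.
    public int requestCount() {
        return requests.get();
    }
}
```

Because the sketch only delegates, class path and module loading would still be done by the built-in application class loader, which matches the "4 class loaders" configuration described in the message.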
------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2333330520 From jbhateja at openjdk.org Fri Sep 6 06:30:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 06:30:55 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v5] In-Reply-To: References: <4kI0NYrxxgGisMvfwUz0tjHy9RoNGA99qpHgS_wtrAc=.36012d46-f899-4021-aef5-8be2322e29c9@github.com> <7huBzF7ygKcr1ADKYTizGsyEBNb6dWYaU3g9_StUGB4=.89495de4-e6b5-47d6-9756-41471d366211@github.com> Message-ID: On Thu, 5 Sep 2024 14:31:39 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 634: >> >>> 632: virtual int Opcode() const; >>> 633: }; >>> 634: >> >> This could also be a separate PR. Or are they somehow inseparable from the "saturation" changes? > > What is not applicable? Do you actually need this node for the saturating operations? It was in context of scalar IRs, as mentioned we plan to support unsigned scalar operation and its idealizations in follow up patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1746573566 From dholmes at openjdk.org Fri Sep 6 06:39:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 06:39:50 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. Sorry I have to retract my review. It is not at all obvious that all the other uses of `get_new_method()` remain correct after this change. ------------- Changes requested by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20874#pullrequestreview-2285027355 From jbhateja at openjdk.org Fri Sep 6 06:43:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 06:43:31 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/7164783e..195390fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=06-07 Stats: 24 lines in 2 files changed: 0 ins; 1 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Fri Sep 6 06:43:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 06:43:32 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 14:33:56 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Some cleanups. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 78: > >> 76: * @since 24 >> 77: */ >> 78: public static long addSaturating(long a, long b) { > > Are these public methods any Java dev could use? If so: do we have tests for them? Made them package private. These routines are exercised by newly added jtreg tests. 
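For readers unfamiliar with the term, the following sketch shows what saturating scalar arithmetic on `long` could look like. It is an illustration under stated assumptions, not the code in `VectorMathUtils`: the class name `SaturatingSketch` and the unsigned helper names are hypothetical, and only `addSaturating(long, long)` borrows its name from the review comment above.

```java
// Hypothetical illustration of saturating semantics; not the PR's implementation.
final class SaturatingSketch {
    private SaturatingSketch() {}

    // Signed saturating add: clamp to Long.MIN_VALUE / Long.MAX_VALUE instead of wrapping.
    public static long addSaturating(long a, long b) {
        long r = a + b;
        // Signed overflow happened iff both operands differ in sign from the result.
        if (((a ^ r) & (b ^ r)) < 0) {
            return a < 0 ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    // Unsigned saturating add: clamp to the all-ones bit pattern (max unsigned value).
    public static long addSaturatingUnsigned(long a, long b) {
        long r = a + b;
        // Unsigned overflow happened iff the wrapped result is unsigned-smaller than an operand.
        return Long.compareUnsigned(r, a) < 0 ? -1L : r;
    }

    // Unsigned saturating subtract: clamp to zero instead of wrapping.
    public static long subSaturatingUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) < 0 ? 0L : a - b;
    }

    // Unsigned max: compare the operands as unsigned 64-bit values.
    public static long maxUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b) >= 0 ? a : b;
    }
}
```

The point of the sketch is the contrast with ordinary `long` arithmetic: `Long.MAX_VALUE + 1` wraps to `Long.MIN_VALUE`, whereas the saturating form stays at the bound.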
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1746584232 From dholmes at openjdk.org Fri Sep 6 06:49:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 6 Sep 2024 06:49:50 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <9iFlhZLWpgbA0Ha7_PQAkCTWiGxg7RLBxVMATGXFrAc=.b16dd263-c65d-426f-9e6d-2a71731fc08e@github.com> On Fri, 6 Sep 2024 06:22:16 GMT, Alan Bateman wrote: > It is possible to run -Djava.system.class.loader=.. to set your own system class loader, it gets created with the built-in application class loader as its parent. In that setup you start out with 4 class loaders. Yes that is what I was referring to. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2333362465 From rcastanedalo at openjdk.org Fri Sep 6 08:49:35 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 6 Sep 2024 08:49:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion - Cleanup g1_ppc.ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/9821e795..22e07ef0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=14-15 Stats: 40 lines in 1 file changed: 4 ins; 30 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Fri Sep 6 08:49:35 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 6 Sep 2024 08:49:35 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 10:05:12 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove unnecessary g1LoadXVolatile instructions in aarch64

> I've implemented the same cleanup as on aarch64: [TheRealMDoerr at ad662a2](https://github.com/TheRealMDoerr/jdk/commit/ad662a256034a09156b1b43673d2640a119740b2) Would be nice if you could apply it. Thanks!

Sure, merged now (commit 22e07ef03a).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333553391

From rcastanedalo at openjdk.org Fri Sep 6 09:43:57 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 09:43:57 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To:
References:
Message-ID: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>

On Thu, 5 Sep 2024 20:36:01 GMT, halkosajtarevic wrote:

> Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?

Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example?

    (...)

    public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY};

    static class MyObject {
        Day day;
    }

    public static void storeEnum(MyObject o, Day d) {
        o.day = d;
    }

    (...)

    MyObject o = new MyObject();
    Day d = Day.TUESDAY;
    storeEnum(o, d);

    (...)

If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind?
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333674779 From duke at openjdk.org Fri Sep 6 10:14:59 2024 From: duke at openjdk.org (halkosajtarevic) Date: Fri, 6 Sep 2024 10:14:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 08:49:35 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'TheRealMDoerr/8334111_PPC64_G1_Barriers_V2' into JDK-8334060-g1-late-barrier-expansion > - Cleanup g1_ppc.ad Yes exactly, that was what I meant. I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333731725

From stuefe at openjdk.org Fri Sep 6 10:41:53 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 6 Sep 2024 10:41:53 GMT
Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Sep 2024 12:28:53 GMT, David Holmes wrote:

>> src/hotspot/os/windows/os_windows.cpp line 5327:
>>
>>> 5325:
>>> 5326: char* result = nullptr;
>>> 5327: ALLOW_C_FUNCTION(::_fullpath, result = ::_fullpath(outbuf, filename, outbuflen);)
>>
>> Would this work for non-Latin-1 utf-8?
>
> According to the docs:
>> _fullpath automatically handles multibyte-character string arguments as appropriate, recognizing multibyte-character sequences according to the multibyte code page currently in use.

Thanks, David :-)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1746914038

From amitkumar at openjdk.org Fri Sep 6 10:43:54 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 6 Sep 2024 10:43:54 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v15]
In-Reply-To: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>
References: <01AjvN-E7dfMD3IbJHbjLpHwe3VbMuPPloi3vt7Bxxk=.bc1bb5e5-6cd9-414d-8c76-a224fe856dcd@github.com>
Message-ID: <5SBKgUwrPmIXH0hA64aKRsYZiHMg0M0uh_IjFq_xdAo=.f323ec69-adf3-4722-a5cb-0c49cfb8c5b1@github.com>

On Fri, 6 Sep 2024 09:40:56 GMT, Roberto Castañeda Lozano wrote:

>> Sorry, one maybe dumb question, hopefully matching the context here:
>> Is this whole handling also required if those references point to instances of enums? Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards?
>
>> Sorry, one maybe dumb question, hopefully matching the context here: Is this whole handling also required if those references point to instances of enums?
Or could we do some further optimizations in such cases? Or is that exactly what C2 is doing afterwards? > > Hi, do you mean whether G1 requires barriers when writing enum instances into object fields, as in `storeEnum` in this example? > > > (...) > > public enum Day {MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY}; > > static class MyObject { > Day day; > } > > public static void storeEnum(MyObject o, Day d) { > o.day = d; > } > > (...) > > MyObject o = new MyObject(); > Day d = Day.TUESDAY; > storeEnum(o, d); > > (...) > > > If so, the answer is yes: C2 treats this case as any other object write and generates GC barriers accordingly. Do you have any specific optimization in mind? Hi @robcasloz, you can pick up s390x patch from here: https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034 ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333779374 From sgehwolf at openjdk.org Fri Sep 6 10:51:54 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Fri, 6 Sep 2024 10:51:54 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: On Fri, 30 Aug 2024 11:05:24 GMT, Matthias Baesken wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: >> >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Add Whitebox check for host cpu >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comments >> - 8333446: Add tests for hierarchical container support > > Looking through the coding it looks more or less okay to me; but if you really need to run it under user 'root' I think we will not have so much use for this in our test environments because we use other test users. > Not saying that this is a very bad thing, maybe it is just the way it is, that 'root' is needed ? @MBaesken Any more thoughts on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2333790472 From stuefe at openjdk.org Fri Sep 6 11:15:00 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 6 Sep 2024 11:15:00 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v5] In-Reply-To: <0VQJugX9IulwqoN4WWxCixyhPRhfGs-48Vm5DB0s-VU=.334232df-93b0-453a-aba7-0cf26cecf8d1@github.com> References: <0VQJugX9IulwqoN4WWxCixyhPRhfGs-48Vm5DB0s-VU=.334232df-93b0-453a-aba7-0cf26cecf8d1@github.com> Message-ID: On Thu, 29 Aug 2024 12:08:37 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. 
The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Add function in Metaspace to tell you if Klass pointer is in compressible space. I still worry about unforeseen implications of not every Klass having an nKlass. It will have some implications on some of my Lilliput work. I wish we could have done this earlier. But in any case, this seems to be a reasonable way forward. One remark inline, otherwise this looks mostly good to me. src/hotspot/share/memory/metaspace.hpp line 165: > 163: return using_class_space() && (is_in_class_space(k) || is_in_shared_metaspace(k)); > 164: } > 165: I propose to drop this, and instead add a utility function to `CompressedKlassPointers` like this: // Returns true if p falls into the narrow Klass encoding range inline bool CompressedKlassPointers::is_in_encoding_range(const void* p) { return _base != nullptr && p >= _base && p < (_base + _range); } (Probably the `_base != nullptr` could even be left out, since `_range==0` and `_base==nullptr` for -UseCompressedClassPointers) And then use that function in `jfrTraceIdKlass.cpp`. That file needs to use `CompressedKlassPointers` anyway because it needs to encode the Klass*. This avoids having to rely on the exact composition of the memory regions inside the encoding range. What counts is whether the Klass pointer points into the narrow Klass encoding range. 
Essentially, with CDS, memory looks like this:

    encoding                                                         encoding
    base                                                             end
    |-----------------------------------------------------------------------|
    |----CDS---| |--------------------class space---------------------------|

-------------

PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2285987065
PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1746943501

From mbaesken at openjdk.org Fri Sep 6 11:31:52 2024
From: mbaesken at openjdk.org (Matthias Baesken)
Date: Fri, 6 Sep 2024 11:31:52 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v9]
In-Reply-To:
References:
Message-ID:

On Wed, 4 Sep 2024 17:46:00 GMT, Severin Gehwolf wrote:

>> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges.
>>
>> I'm adding those tests in order to not regress another time.
>>
>> Testing:
>> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1.
>> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420)
>> - [x] GHA
>
> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 17 additional commits since the last revision: > > - Adapt JDK-8339148 > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. > - Add root check for SystemdMemoryAwarenessTest.java > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - ... and 7 more: https://git.openjdk.org/jdk/compare/b773bfc6...30f32d22 Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19530#pullrequestreview-2286048546 From stuefe at openjdk.org Fri Sep 6 11:59:06 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 6 Sep 2024 11:59:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag [v2] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 21:17:28 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. 
>> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > Gerard Ziemski has updated the pull request incrementally with 308 additional commits since the last revision: > > - undo MEMFLAGS to MemType > - 8339233: Test javax/swing/JButton/SwingButtonResizeTestWithOpenGL.java#id failed: Button renderings are different after window resize > > Reviewed-by: honkar > - 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 > > Co-authored-by: Dean Long > Reviewed-by: kvn, thartmann > - 8339492: StackMapDecoder::writeFrames makes lots of allocations > > Reviewed-by: liach, redestad, jwaters, asotona > - 8332901: Select{Current,New}ItemTest.java for Choice don't open popup on macOS > > Move SelectCurrentItemTest.java to java/awt/Choice/SelectItem/. > Move SelectNewItemTest.java to java/awt/Choice/SelectItem/. > Use latches to control test flow instead of delays. > Encapsulate the common logic in SelectCurrentItemTest. > Provide overridable checkXXX() methods to modify conditions. > Provide an overridable method which defines where to click > in the choice popup to select an item. > > Reviewed-by: honkar, prr, dnguyen > - 8339148: Make os::Linux::active_processor_count() public > > Reviewed-by: dholmes, jwaters > - 8339112: Move JVM Klass flags out of AccessFlags > > Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng > - 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long > > Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja > - 8325679: Optimize ArrayList subList sort > > Reviewed-by: liach > - 8339131: Remove rarely-used accessor methods from Opcode > > Reviewed-by: asotona > - ... and 298 more: https://git.openjdk.org/jdk/compare/9665d7f7...6d6d70e9 I am not excited about the final result, and I pity the poor backport maintainers, but it is also not a hill I'd die on. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2333894316 From rcastanedalo at openjdk.org Fri Sep 6 12:07:57 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 6 Sep 2024 12:07:57 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 10:12:19 GMT, halkosajtarevic wrote: > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2333907222 From coleenp at openjdk.org Fri Sep 6 12:55:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 6 Sep 2024 12:55:53 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. We do the same thing with illegal_access_error() where the arguments may not match and there's a special case for this and no_such_method_error() in dependencies. Are the compilers confused by this too? 
    if (target == nullptr || !target->is_public() || target->is_abstract() || target->is_overpass()) {
      assert(target == nullptr || !target->is_overpass() || target->is_public(), "Non-public overpass method!");
      // Entry does not resolve. Leave it empty for AbstractMethodError or other error.
      if (!(target == nullptr) && !target->is_public()) {
        // Stuff an IllegalAccessError throwing method in there instead.
        itableOffsetEntry::method_entry(_klass, method_table_offset)[m->itable_index()].
            initialize(_klass, Universe::throw_illegal_access_error());
      }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2333984362

From coleenp at openjdk.org Fri Sep 6 12:55:55 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 6 Sep 2024 12:55:55 GMT
Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling
In-Reply-To: References: Message-ID:

On Thu, 5 Sep 2024 20:37:43 GMT, Dean Long wrote:

> Also, doesn't this change mean that we can now return Unsafe.throwNoSuchMethodError() instead of the target method?
> This probably works fine in the interpreter, but I'm worried this could break the compilers in subtle ways.

So we should return NoSuchMethodError because there is no longer a target method.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2333986803

From sgehwolf at openjdk.org Fri Sep 6 12:56:53 2024
From: sgehwolf at openjdk.org (Severin Gehwolf)
Date: Fri, 6 Sep 2024 12:56:53 GMT
Subject: RFR: 8333446: Add tests for hierarchical container support [v6]
In-Reply-To:
References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com>
Message-ID:

On Fri, 30 Aug 2024 11:05:24 GMT, Matthias Baesken wrote:

>> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision: >> >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Add Whitebox check for host cpu >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comments >> - 8333446: Add tests for hierarchical container support > > Looking through the coding it looks more or less okay to me; but if you really need to run it under user 'root' I think we will not have so much use for this in our test environments because we use other test users. > Not saying that this is a very bad thing, maybe it is just the way it is, that 'root' is needed ? Thank you for the review, @MBaesken! @zzambers OK for you as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2333989946 From coleenp at openjdk.org Fri Sep 6 13:14:49 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 6 Sep 2024 13:14:49 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: <6KiIhfuMzw4--X2kuJqjQs6s8OhA-dGtMIvDDULrOkw=.71492623-4bd2-44b2-8c4e-21b35980ef81@github.com> On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. What should this return for a deleted method? // ------------------------------------------------------------------ // ciMethod::equals // // Returns true if the methods are the same, taking redefined methods // into account. 
    bool ciMethod::equals(const ciMethod* m) const {
      if (this == m) return true;
      VM_ENTRY_MARK;
      Method* m1 = this->get_Method();
      Method* m2 = m->get_Method();
      if (m1->is_old()) m1 = m1->get_new_method();
      if (m2->is_old()) m2 = m2->get_new_method();
      return m1 != Universe::no_such_method_error() && m1 == m2; // ???
    }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2334024871

From ysuenaga at openjdk.org Fri Sep 6 13:40:12 2024
From: ysuenaga at openjdk.org (Yasumasa Suenaga)
Date: Fri, 6 Sep 2024 13:40:12 GMT
Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame
Message-ID:

This PR is the successor of #20789. I received some comments there and needed to make fixes in many places. I also had to change the branch name so that GHA starts automatically (the branch for #20789 is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue.

This PR has been updated on the following topics since #20789:

* Use `JavaFrameAnchor` instead of the raw frame pointer to unwind the frame of `UpcallStub`.
* The change applies to x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only; s390 is out of scope because SA does not have an s390 implementation.
* Only AMD64 and aarch64 have been tested on GHA.
* Refactor the test case so that it reliably meets the expected condition.
------------- Commit messages: - 8339307: jhsdb jstack could not trace FFM upcall frame Changes: https://git.openjdk.org/jdk/pull/20885/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20885&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339307 Stats: 449 lines in 12 files changed: 439 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20885/head:pull/20885 PR: https://git.openjdk.org/jdk/pull/20885 From ysuenaga at openjdk.org Fri Sep 6 13:41:58 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 6 Sep 2024 13:41:58 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v4] In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 10:29:40 GMT, Yasumasa Suenaga wrote: >> I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. 
>> >> >> Error occurred during stack walking: >> java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) >> at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) >> at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) >> at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) >> at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) >> Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 
(nearest symbol is _ZTV10Upcall... > > Yasumasa Suenaga has updated the pull request incrementally with three additional commits since the last revision: > > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov > - Update test/hotspot/jtreg/serviceability/sa/LingeredAppWithFFMUpcall.java > > Co-authored-by: Andrey Turbanov I pushed new commit and sent new PR as #20885 because this branch cannot start GHA automatically. @JornVernee @plummercj I fixed your comment in new PR. Can you continue to review in #20885 ? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20789#issuecomment-2334081875 From ysuenaga at openjdk.org Fri Sep 6 13:41:58 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Fri, 6 Sep 2024 13:41:58 GMT Subject: Withdrawn: 8339307: jhsdb jstack could not trace FFM upcall frame In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 09:14:11 GMT, Yasumasa Suenaga wrote: > I attempted to check stack trace in the core generated by [SEGV example in upcall](https://github.com/YaSuenag/garakuta/blob/841452d9176dab1ddbb552009c180530eb81190b/NativeSEGV/ffm/upcall/src/main/java/com/yasuenag/garakuta/nativesegv/upcall/Main.java) with `jhsdb jstack`, however it failed with following exception. 
> > > Error occurred during stack walking: > java.lang.RuntimeException: Couldn't deduce type of CodeBlob @0x00007fa04c265990 for PC=0x00007fa04c265aa6 > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlobUnsafe(CodeCache.java:124) > at jdk.hotspot.agent/sun.jvm.hotspot.code.CodeCache.findBlob(CodeCache.java:83) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.cb(Frame.java:119) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.adjustUnextendedSP(X86Frame.java:334) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.initFrame(X86Frame.java:137) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.(X86Frame.java:163) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.senderForInterpreterFrame(X86Frame.java:361) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.x86.X86Frame.sender(X86Frame.java:281) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.sender(Frame.java:207) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.Frame.realSender(Frame.java:212) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.sender(VFrame.java:120) > at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VFrame.javaSender(VFrame.java:144) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:81) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.run(JStack.java:67) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:278) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.start(Tool.java:241) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.Tool.execute(Tool.java:134) > at jdk.hotspot.agent/sun.jvm.hotspot.tools.JStack.runWithArgs(JStack.java:90) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.runJSTACK(SALauncher.java:302) > at jdk.hotspot.agent/sun.jvm.hotspot.SALauncher.main(SALauncher.java:500) > Caused by: sun.jvm.hotspot.types.WrongTypeException: No suitable match for type of address 0x00007fa04c265990 (nearest symbol is 
_ZTV10UpcallStub) > at jdk.hotspot.agent/sun.jvm.hotspot.run... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20789 From adinn at openjdk.org Fri Sep 6 14:10:50 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 6 Sep 2024 14:10:50 GMT Subject: Integrated: 8339466: Enumerate shared stubs and define static fields and names via declarations In-Reply-To: References: Message-ID: On Tue, 3 Sep 2024 09:43:26 GMT, Andrew Dinn wrote: > Systematize handling of SharedRuntime stubs. Generate enum ids, static fields and names from declarations using template macros. Systematically reference stubs and stub names using ids. This pull request has now been integrated. Changeset: 0df10bbd Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/0df10bbd96df46f23a7f57e5b9455fea41b2b15b Stats: 357 lines in 11 files changed: 218 ins; 28 del; 111 mod 8339466: Enumerate shared stubs and define static fields and names via declarations Reviewed-by: kvn, fyang ------------- PR: https://git.openjdk.org/jdk/pull/20832 From rcastanedalo at openjdk.org Fri Sep 6 14:15:41 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 6 Sep 2024 14:15:41 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:

  s390 port : late barrier expansion

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/19746/files
  - new: https://git.openjdk.org/jdk/pull/19746/files/22e07ef0..6663433c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=15-16

  Stats: 896 lines in 8 files changed: 837 ins; 32 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/19746.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746

PR: https://git.openjdk.org/jdk/pull/19746

From rcastanedalo at openjdk.org Fri Sep 6 14:15:42 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Sep 2024 14:15:42 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16]
In-Reply-To: References: Message-ID:

On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano wrote:

>> Yes exactly, that was what I meant.
>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>
>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected.
>
> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong.

> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034)

Done, thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334130205 From coleenp at openjdk.org Fri Sep 6 14:48:14 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 6 Sep 2024 14:48:14 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v5] In-Reply-To: References: <0VQJugX9IulwqoN4WWxCixyhPRhfGs-48Vm5DB0s-VU=.334232df-93b0-453a-aba7-0cf26cecf8d1@github.com> Message-ID: On Fri, 6 Sep 2024 11:08:02 GMT, Thomas Stuefe wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Add function in Metaspace to tell you if Klass pointer is in compressible space. > > src/hotspot/share/memory/metaspace.hpp line 165: > >> 163: return using_class_space() && (is_in_class_space(k) || is_in_shared_metaspace(k)); >> 164: } >> 165: > > I propose to drop this, and instead add a utility function to `CompressedKlassPointers` like this: > > > // Returns true if p falls into the narrow Klass encoding range > inline bool CompressedKlassPointers::is_in_encoding_range(const void* p) { > return _base != nullptr && p >= _base && p < (_base + _range); > } > > (Probably the `_base != nullptr` could even be left out, since `_range==0` and `_base==nullptr` for -UseCompressedClassPointers) > > And then use that function in `jfrTraceIdKlass.cpp`. That file needs to use `CompressedKlassPointers` anyway because it needs to encode the Klass*. > > This avoids having to rely on the exact composition of the memory regions inside the encoding range. What counts is whether the Klass pointer points into the narrow Klass encoding range. > > Essentially, with CDS, memory looks like this: > > > encoding encoding > base end > |-----------------------------------------------------------------------| > |----CDS---| |--------------------class space---------------------------| oh yes that's much better. 
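The range check proposed in the review comment above can be illustrated with a small standalone sketch. `EncodingRange`, `make_range`, and `contains` are hypothetical names invented for this note and are not HotSpot code; the sketch only assumes, as the comment does, that the narrow Klass encoding range is one contiguous interval [base, base + range) and that an unset base means compressed class pointers are off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the narrow Klass encoding range.
// Not HotSpot's CompressedKlassPointers; names are illustrative only.
struct EncodingRange {
  std::uintptr_t base;   // start address of the range, 0 if unused
  std::size_t    range;  // length of the range in bytes

  // True if p lies in [base, base + range).
  bool contains(const void* p) const {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    // "addr - base < range" cannot overflow once addr >= base holds.
    return base != 0 && addr >= base && addr - base < range;
  }
};

// Builds a range from a start pointer and a byte count.
inline EncodingRange make_range(const void* start, std::size_t bytes) {
  assert(start != nullptr || bytes == 0);  // an empty range may have no base
  return EncodingRange{reinterpret_cast<std::uintptr_t>(start), bytes};
}
```

The point of the design, as the comment puts it, is that the caller asks only whether a pointer falls inside the encoding range and does not need to know how CDS and class space are laid out inside it.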
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19157#discussion_r1747241623

From pchilanomate at openjdk.org Fri Sep 6 15:23:40 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Fri, 6 Sep 2024 15:23:40 GMT
Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v2]
In-Reply-To: References: Message-ID:

> Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided into 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages (~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack consists of reserved pages, and if we try to access them directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler, and the program terminates abruptly with exit code 0xc0000005.
>
> The fix implemented is to bang the stack pages one by one to let the Windows page protection take over.
This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). > > I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: David's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20862/files - new: https://git.openjdk.org/jdk/pull/20862/files/1eda89a6..a5326f1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=00-01 Stats: 5 lines in 3 files changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20862/head:pull/20862 PR: https://git.openjdk.org/jdk/pull/20862 From pchilanomate at openjdk.org Fri Sep 6 15:27:41 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 6 Sep 2024 15:27:41 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v3] In-Reply-To: References: Message-ID: > Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. 
The next pages (~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack consists of reserved pages, and if we try to access them directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler, and the program terminates abruptly with exit code 0xc0000005.
>
> The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code).
>
> I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7.
>
> Thanks,
> Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

  fix update in map_stack_shadow_pages

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20862/files
  - new: https://git.openjdk.org/jdk/pull/20862/files/a5326f1f..00d5e9c5

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=01-02

  Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/20862.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20862/head:pull/20862

PR: https://git.openjdk.org/jdk/pull/20862

From jvernee at openjdk.org Fri Sep 6 15:32:05 2024
From: jvernee at openjdk.org (Jorn Vernee)
Date: Fri, 6 Sep 2024 15:32:05 GMT
Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame
In-Reply-To: References: Message-ID:

On Fri, 6 Sep 2024 09:31:45 GMT, Yasumasa Suenaga wrote:

> This PR is the successor of #20789. I received some comments there and needed to make fixes in many places. I also had to change the branch name so that GHA starts automatically (the branch for #20789 is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue.
>
> This PR has been updated on the following topics since #20789:
> * Use `JavaFrameAnchor` instead of the raw frame pointer to unwind the frame of `UpcallStub`.
> * The change applies to x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only; s390 is out of scope because SA does not have an s390 implementation.
> * Only AMD64 and aarch64 have been tested on GHA.
> * Refactor the test case so that it reliably meets the expected condition.

Looks great! Thanks for implementing the JFA-based stack walking.

Please wait for another review from someone more familiar with SA as well, before integrating.
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ppc64/PPC64Frame.java line 326: > 324: > 325: var lastJavaFP = stub.getLastJavaFP(this); // This will be null > 326: var lastJavaSP = stub.getLastJavaSP(this); var lastJavaPC = stub.getLastJavaPC(this); Suggestion: var lastJavaSP = stub.getLastJavaSP(this); var lastJavaPC = stub.getLastJavaPC(this); test/hotspot/jtreg/serviceability/sa/libupcall.c line 2: > 1: /* > 2: * Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. ------------- Marked as reviewed by jvernee (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20885#pullrequestreview-2286531642 PR Review Comment: https://git.openjdk.org/jdk/pull/20885#discussion_r1747290780 PR Review Comment: https://git.openjdk.org/jdk/pull/20885#discussion_r1747295233 From liach at openjdk.org Fri Sep 6 16:11:15 2024 From: liach at openjdk.org (Chen Liang) Date: Fri, 6 Sep 2024 16:11:15 GMT Subject: RFR: 8336275: Move common Method and Constructor fields to Executable [v3] In-Reply-To: References: Message-ID: <9MFjLN9RDtYLkL4F-2JLeFKPGwJsjk0LFNYk_c9HMk4=.ec284df0-706e-4f8e-9432-b640cf9f771a@github.com> On Wed, 21 Aug 2024 15:42:18 GMT, Chen Liang wrote: >> Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. >> >> Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: > > - Fix after merge > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline > - Merge branch 'master' of https://github.com/openjdk/jdk into feature/executable-inline > - Redundant transient; Update the comments to be more accurate > - Inline some common ctor + method fields to executable The new model here may no longer be valid now that deconstructors and patterns are being added. May revisit only if we are sure of the new model of deconstructors. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20188#issuecomment-2334385336 From liach at openjdk.org Fri Sep 6 16:11:16 2024 From: liach at openjdk.org (Chen Liang) Date: Fri, 6 Sep 2024 16:11:16 GMT Subject: Withdrawn: 8336275: Move common Method and Constructor fields to Executable In-Reply-To: References: Message-ID: On Tue, 16 Jul 2024 03:45:36 GMT, Chen Liang wrote: > Move fields common to Method and Field to executable, which simplifies implementation. Removed useless transient modifiers as Method and Field were never serializable. > > Note to core-libs reviewers: Please review the associated CSR on trivial removal of `abstract` modifier as well. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20188 From coleenp at openjdk.org Fri Sep 6 16:20:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 6 Sep 2024 16:20:52 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v6] In-Reply-To: References: Message-ID: <6SbHbHK4n6vHaDLeC-X1oFBcoGE1osgeSXV7gq36xP8=.6f7e9fc4-ff7d-412f-9e14-5650dfa6f5d9@github.com> > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. 
The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Replace Metaspace::is_compressed_klass_ptr with CompressedKlassPointers::is_in_encoding_range. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19157/files - new: https://git.openjdk.org/jdk/pull/19157/files/ce96165e..efa14e0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19157&range=04-05 Stats: 19 lines in 3 files changed: 12 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/19157.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19157/head:pull/19157 PR: https://git.openjdk.org/jdk/pull/19157 From pchilanomate at openjdk.org Fri Sep 6 16:29:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 6 Sep 2024 16:29:06 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v3] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 05:51:03 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> fix update in map_stack_shadow_pages > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 289: > >> 287: address last_touched_page = watermark - StackOverflow::stack_shadow_zone_size(); >> 288: size_t pages_to_touch = align_up(watermark - new_sp, page_size) / page_size; >> 289: while (pages_to_touch--) { > > Suggestion: > > while (pages_to_touch-- > 0) { Fixed. 
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 293: > >> 291: *last_touched_page = 0; >> 292: } >> 293: thread->stack_overflow_state()->set_shadow_zone_growth_watermark(new_sp); > > I'm not familiar with the details of this stack management code and am unclear about the role of the `shadow_zone_growth_watermark` here. The banging code in `os::map_stack_shadow_pages` doesn't access it. The shadow zone growth watermark is just an optimization to avoid banging pages that were already touched. It is set to the highest sp (stack growing up) where we banged already (there is a diagram and more explanations in stackOverflow.hpp). So we don't strictly need it, but we would incur unnecessary overhead without it when the frames frozen in the top stackChunk are a couple of pages in size. By checking this reference first we guarantee that almost all the time we won't have to do anything. I added the update of the watermark in os::map_stack_shadow_pages(). > test/jdk/java/lang/Thread/virtual/BigStackChunk.java line 47: > >> 45: int i6 = i5 + 1; >> 46: int i7 = i6 + 1; >> 47: long ll = 2*(long)i1; > > Suggestion: > > long ll = 2 * (long)i1; Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1747401722 PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1747401934 PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1747401862 From psandoz at openjdk.org Fri Sep 6 18:02:12 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 6 Sep 2024 18:02:12 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v7] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 06:40:18 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 78: >> >>> 76: * @since 24 >>> 77: */ >>> 78: public static long addSaturating(long a, long b) { >> >> Are these public methods any Java dev could use?
If so: do we have tests for them? > > Made them package private. These routines are exercised by newly added jtreg tests. These methods need to be public, as they need to be used in any tail computation. Recommend naming as `VectorMath` aligning with the naming of `Math` and `StrictMath`. * The class {@code VectorMath} contains methods for performing * scalar numeric operations in support of vector numeric operations. For each method we can reference the associated vector operator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1747520954 From vlivanov at openjdk.org Fri Sep 6 18:10:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 6 Sep 2024 18:10:18 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v22] In-Reply-To: References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: On Fri, 30 Aug 2024 16:37:05 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad.
>> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Fix s390 Looks good. Testing results (hs-tier1 - hs-tier6) are clean. ------------- Marked as reviewed by vlivanov (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/19989#pullrequestreview-2286894886 From gziemski at openjdk.org Fri Sep 6 18:12:47 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 6 Sep 2024 18:12:47 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemFlag [v2] In-Reply-To: References: Message-ID: <4ANlXGlTJl_PFzySR2lf0kQdvoawEpFJEEp7T5B5ZY8=.6b69de16-6403-474b-9376-1bd4a8a50d14@github.com> On Wed, 4 Sep 2024 21:17:28 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemType`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemType` is much more suitable name. >> >> There is a bunch of other related cleanup that we can do, but I will leave for follow up issues such as [NMT: rename NMTUtil::flag to NMTUtil::type](https://bugs.openjdk.org/browse/JDK-8337836) > > Gerard Ziemski has updated the pull request incrementally with 308 additional commits since the last revision: > > - undo MEMFLAGS to MemType > - 8339233: Test javax/swing/JButton/SwingButtonResizeTestWithOpenGL.java#id failed: Button renderings are different after window resize > > Reviewed-by: honkar > - 8338924: C1: assert(0 <= i && i < _len) failed: illegal index 5 for length 5 > > Co-authored-by: Dean Long > Reviewed-by: kvn, thartmann > - 8339492: StackMapDecoder::writeFrames makes lots of allocations > > Reviewed-by: liach, redestad, jwaters, asotona > - 8332901: Select{Current,New}ItemTest.java for Choice don't open popup on macOS > > Move SelectCurrentItemTest.java to java/awt/Choice/SelectItem/. > Move SelectNewItemTest.java to java/awt/Choice/SelectItem/. > Use latches to control test flow instead of delays. > Encapsulate the common logic in SelectCurrentItemTest. > Provide overridable checkXXX() methods to modify conditions. > Provide an overridable method which defines where to click > in the choice popup to select an item. 
> > Reviewed-by: honkar, prr, dnguyen > - 8339148: Make os::Linux::active_processor_count() public > > Reviewed-by: dholmes, jwaters > - 8339112: Move JVM Klass flags out of AccessFlags > > Reviewed-by: matsaave, cjplummer, dlong, thartmann, yzheng > - 8336860: x86: Change integer src operand for CMoveL of 0 and 1 to long > > Reviewed-by: epeter, chagedorn, shade, qamai, jbhateja > - 8325679: Optimize ArrayList subList sort > > Reviewed-by: liach > - 8339131: Remove rarely-used accessor methods from Opcode > > Reviewed-by: asotona > - ... and 298 more: https://git.openjdk.org/jdk/compare/9665d7f7...6d6d70e9 I'm not as excited about it as I was when started working on it. Will get https://github.com/openjdk/jdk/pull/20872 ready, then will give it final thought. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20497#issuecomment-2334578610 From jbhateja at openjdk.org Fri Sep 6 18:13:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:34 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Functional tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
> Benchmark                                      (size)   Mode  Cnt      Score  Error   Units
> SelectFromBenchmark.rearrangeFromByteVector      1024  thrpt    2   2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector      2048  thrpt    2   1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector       1024  thrpt    2    962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector       2048  thrpt    2    479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector      1024  thrpt    2    359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector      2048  thrpt    2    178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector     1024  thrpt    2   1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector     2048  thrpt    2    727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector         1024  thrpt    2  33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector         2048  thrpt    2  17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector          1024  thrpt    2  10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector          2048  thrpt    2   5398.2...
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions.
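For readers who want the quoted selectFrom semantics in concrete form, here is a minimal scalar sketch in plain Java. It is not the Vector API implementation and does not use jdk.incubator.vector; the class name `SelectFromModel` and its method are hypothetical and exist only for illustration. It assumes the behaviour stated in the quoted description: an index in [0, VLEN) picks from the first table, an index in [VLEN, 2*VLEN) picks from the second. Out-of-range indices are rejected with an exception, as in that description; the wrapping behaviour discussed elsewhere in the review is not modelled.

```java
import java.util.Arrays;

/** Scalar reference model of a two-table selectFrom. All names are hypothetical. */
public class SelectFromModel {

    /**
     * For each lane, use idx[lane] to pick an element from the
     * concatenation of v1 and v2 (v1 first, then v2).
     */
    static int[] selectFrom(int[] idx, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || idx.length != vlen) {
            throw new IllegalArgumentException("all three arrays must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int i = idx[lane];
            // Assumption: reject indices outside [0, 2*VLEN) instead of wrapping them.
            if (i < 0 || i >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + i + " outside [0, " + (2 * vlen) + ")");
            }
            // Indices below VLEN read the first table, the rest read the second.
            result[lane] = (i < vlen) ? v1[i] : v2[i - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {7, 0, 4, 3};
        // Lane by lane: 7 -> v2[3], 0 -> v1[0], 4 -> v2[0], 3 -> v1[3]
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2)));
    }
}
```

With VLEN = 4, the index vector {7, 0, 4, 3} reads v2[3], v1[0], v2[0] and v1[3] in that order, which is the table-lookup behaviour the description attributes to the new API.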
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/8d71f175..d3ee3104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=06-07 Stats: 115 lines in 18 files changed: 12 ins; 15 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Fri Sep 6 18:13:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:35 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:40:35 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/share/opto/vectornode.cpp line 2159: > >> 2157: >> 2158: vmask_type = TypeVect::makemask(elem_bt, num_elem); >> 2159: mask = phase->transform(new VectorMaskCastNode(mask, vmask_type)); > > I would just have two variables, and not overwrite it: `integral_vmask_type` and `vmask_type`. Maybe also `mask` could be split into two variables? I think the variable names are appropriate and in accordance with convention. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: > >> 2768: >> 2769: /** >> 2770: * Rearranges the lane elements of two vectors, selecting lanes > > I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"? 
We already have another flavor of [selectFrom](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#selectFrom(jdk.incubator.vector.Vector)) which permutes a single vector; the new API extends its semantics to two vector selection, so we kept the nomenclature consistent. > test/jdk/jdk/incubator/vector/Byte128VectorTests.java line 324: > >> 322: boolean is_exceptional_idx = (int)order[idx] >= vector_len; >> 323: int oidx = is_exceptional_idx ? ((int)order[idx] - vector_len) : (int)order[idx]; >> 324: Assert.assertEquals(r[idx], (is_exceptional_idx ? b[i + oidx] : a[i + oidx])); > > I thought general Java style is camelCase? Is that not followed in the VectorAPI code? I agree, but somehow we are using non camelCase conventions in this file, look for uses of 'vector_len'. Just preserving the file-level convention. > test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: > >> 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). >> 1047: toArray(Object[][]::new); >> 1048: } > > Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics. Please find details at following comment https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 > test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: > >> 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); >> 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); >> 5812: idxv.selectFrom(av, bv).intoArray(r, i); > > Would this test catch a bug where the backend would generate vectors that are too long or too short? Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments; these are used to create concrete ideal vector types.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532692 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532456 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532419 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532340 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532307 From jbhateja at openjdk.org Fri Sep 6 18:13:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 6 Sep 2024 18:13:35 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:57:31 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2183: >> >>> 2181: }; >>> 2182: // Targets emulating unsupported permutation for certain vector types >>> 2183: // may need to message the indexes to match the users intent. >> >> Suggestion: >> >> // may need to massage the indexes to match the users intent. > > This optimization for now seems quite specific to your `SelectFromTwoVectorNode::Ideal` lowering code. Can this conversion not be done there already? > > What is the semantics of `VectorRearrangeNode`? Should its shuffle vector always be bytes, and we now violated that "for a quick second"? Or is it going to be generally the idea to create all sorts of shuffle types and then fix that up? But then why do we need the `vector_indexes_needs_massaging`? > > Can you help me understand the concept/strategy behind this? Ok, IIRC the variable index permutation instruction on every target expects shape conformance between the data vector and the permute index vector. Rearrange expects indices to be passed through a shuffle; idealization routines automatically inject a VectorLoadShuffle after loading indexes held in the shuffle's backing storage, i.e. a byte array.
In all the cases apart from byte vector permute, VectorLoadShuffle expands the index byte lanes to match the data vector lane. So we always end up emitting a lane expansion instruction before the permute instruction (scenario 1). Apart from the usual expansions, VectorLoadShuffle may also do additional magic for some targets where it may need to prune / massage the index vector if the target does not support the destination vector type (scenario 2). For our case, the new selectFrom accepts the indices through vectors, which saves redundant expansions, but to leverage existing backend support for scenario 2 we do target specific pruning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1747532612 From sviswanathan at openjdk.org Fri Sep 6 18:43:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 18:43:09 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: References: Message-ID: <_DbK4ZSVvMwabc8jXhGrqJD-ox6o9Bvo9or64AKUQ4E=.8bd87542-9eed-456d-8d87-a065da637918@github.com> On Fri, 6 Sep 2024 06:43:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e.
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions I have only one comment, rest of the changes look good to me. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 33: > 31: * > 32: */ > 33: public class VectorMathUtils { Could the class also be not public as it has only package private methods now? 
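As an aside for readers of the archive, the saturating and unsigned operators listed in the quoted description can be illustrated with scalar Java. This is a sketch under stated assumptions, not the code from the PR: the class name `SaturatingSketch` and the method bodies are my own, and only the `addSaturating(long, long)` signature appears in the quoted review. The idea is that a result which would overflow is clamped to the nearest bound of the type rather than wrapping around.

```java
/** Scalar sketches of saturating and unsigned operations. Hypothetical names, not the JDK code. */
public class SaturatingSketch {

    /** Signed saturating add (SADD): clamp to Long.MIN_VALUE / Long.MAX_VALUE on overflow. */
    static long addSaturating(long a, long b) {
        long r = a + b;
        // Overflow happened iff both operands have a sign different from the result's sign.
        if (((a ^ r) & (b ^ r)) < 0) {
            return (a < 0) ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    /** Signed saturating subtract (SSUB). */
    static long subSaturating(long a, long b) {
        long r = a - b;
        // Overflow happened iff the operands differ in sign and the result's sign differs from a's.
        if (((a ^ b) & (a ^ r)) < 0) {
            return (a < 0) ? Long.MIN_VALUE : Long.MAX_VALUE;
        }
        return r;
    }

    /** Unsigned saturating add (SUADD): clamp to 0xFFFFFFFF when the sum wraps. */
    static int addSaturatingUnsigned(int a, int b) {
        int r = a + b;
        // An unsigned sum that wrapped is smaller than either operand.
        return (Integer.compareUnsigned(r, a) < 0) ? -1 : r;
    }

    /** Unsigned saturating subtract (SUSUB): clamp to 0 when b exceeds a. */
    static int subSaturatingUnsigned(int a, int b) {
        return (Integer.compareUnsigned(a, b) < 0) ? 0 : a - b;
    }

    /** Unsigned max (UMAX). */
    static int maxUnsigned(int a, int b) {
        return (Integer.compareUnsigned(a, b) >= 0) ? a : b;
    }

    /** Unsigned min (UMIN). */
    static int minUnsigned(int a, int b) {
        return (Integer.compareUnsigned(a, b) <= 0) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(addSaturating(Long.MAX_VALUE, 1));
        System.out.println(subSaturating(Long.MIN_VALUE, 1));
        System.out.println(Integer.toUnsignedString(addSaturatingUnsigned(-1, 1)));
        System.out.println(Integer.toUnsignedString(subSaturatingUnsigned(1, 2)));
        System.out.println(Integer.toUnsignedString(maxUnsigned(-1, 1)));
    }
}
```

Scalar helpers of this kind are what a loop tail would call for the elements left over after the vectorized portion, which matches the reason given in the thread for keeping the methods public.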
------------- PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2286943136 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1747565468 From sviswanathan at openjdk.org Fri Sep 6 18:47:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 18:47:12 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: <_DbK4ZSVvMwabc8jXhGrqJD-ox6o9Bvo9or64AKUQ4E=.8bd87542-9eed-456d-8d87-a065da637918@github.com> References: <_DbK4ZSVvMwabc8jXhGrqJD-ox6o9Bvo9or64AKUQ4E=.8bd87542-9eed-456d-8d87-a065da637918@github.com> Message-ID: On Fri, 6 Sep 2024 18:39:08 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review suggestions > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMathUtils.java line 33: > >> 31: * >> 32: */ >> 33: public class VectorMathUtils { > > Could the class also be not public as it has only package private methods now? Please ignore this comment as Paul suggests that the methods in this file should be public. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1747571110 From matsaave at openjdk.org Fri Sep 6 19:51:18 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 6 Sep 2024 19:51:18 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: Message-ID: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests.
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Fixed conditional to check for is_deleted() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20874/files - new: https://git.openjdk.org/jdk/pull/20874/files/f3d4f23a..ba1cb1b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20874/head:pull/20874 PR: https://git.openjdk.org/jdk/pull/20874 From kbarrett at openjdk.org Fri Sep 6 20:26:09 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 6 Sep 2024 20:26:09 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 12:04:52 GMT, Roberto Castañeda Lozano wrote: > > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. > > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. @robcasloz is correct, the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334754544 From duke at openjdk.org Fri Sep 6 20:26:10 2024 From: duke at openjdk.org (halkosajtarevic) Date: Fri, 6 Sep 2024 20:26:10 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 20:21:11 GMT, Kim Barrett wrote: > > > I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. > > > > > > As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. > > @robcasloz is correct, the GCs don't have any special knowledge about enum instances. They are ordinary objects, though probably long-lived so will eventually migrate to the old generation. Trying to do anything special with them seems very unlikely to provide a benefit worth the costs involved. Thank you very much for the insights! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2334756865 From gziemski at openjdk.org Fri Sep 6 20:35:29 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 6 Sep 2024 20:35:29 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag Message-ID: Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. This fix also includes a cleanup of all the related parameter names and local variable names. Testing is pending... 
Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) ------------- Commit messages: - missed renames on Windows - fix test failure - fix incorrect rename - mssing renames - missed _mem_type --> _mem_tags - missed MemFlagBitmap --> MemTagBitmap - missed tag --> mem_tag - missed type --> tag - missed type --> tag - missed flag_is_valid --> tag_is_valid - ... and 1 more: https://git.openjdk.org/jdk/compare/4ffcf894...983bb6e2 Changes: https://git.openjdk.org/jdk/pull/20872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337563 Stats: 1259 lines in 108 files changed: 142 ins; 138 del; 979 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Fri Sep 6 21:01:04 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Fri, 6 Sep 2024 21:01:04 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 16:10:05 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) @dholmes-ora @tstuefe @coleenp @stefank @kimbarrett @jdksjolen @afshin-zafari MEMFLAGS rename task has been moved here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2334800974 From coleenp at openjdk.org Fri Sep 6 21:04:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 6 Sep 2024 21:04:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 19:51:18 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed conditional to check for is_deleted() This looks good. It's good to isolate the deleted method code. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20874#pullrequestreview-2287153463 From sviswanathan at openjdk.org Fri Sep 6 21:45:15 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 21:45:15 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> Message-ID: On Wed, 4 Sep 2024 01:57:42 GMT, Jatin Bhateja wrote: >> @theRealAph, this implementation is based on Intel libm math library and meets the accuracy requirements. The algorithm is provided in the comments. > > @vamsi-parasa don't hesitate in adding as much and explicit information about the original source from where the algorithm has been picked up, even though the PR explicitly mentions libm. 
Adding the link to source references is a good practice. @jatin-bhateja This is based on Intel internal LIBM sources and so there is no public link available. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1747726562 From sviswanathan at openjdk.org Fri Sep 6 21:45:16 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 21:45:16 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: <8CAXws7Rp6HKERu5hSTOrXi8GRFRdV4I670Nf8NSZlI=.ba6acccb-77e5-46a6-bec2-e0ea97dfe85d@github.com> <2nRoXBr_v8DjlG4wJlWF9OhYMmgpTUDX6VAQnvO3DCY=.596e5e39-c5ba-4d20-b5e0-aa301f7c9d76@github.com> Message-ID: On Fri, 6 Sep 2024 21:15:07 GMT, Sandhya Viswanathan wrote: >> @vamsi-parasa don't hesitate in adding as much and explicit information about the original source from where the algorithm has been picked up, even though the PR explicitly mentions libm. Adding the link to source references is a good practice. > > @jatin-bhateja This is based on Intel internal LIBM sources and so there is no public link available. > Do you have a copy of this information? Should it be in the commit? @theRealAph The accuracy of standard (non fast mode) LIBM functions ensures errors of < 1 ulp. LIBM is part of Intel C++ compiler. The documentation can be found here: https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/programming-tradeoffs-floating-point-applications.html. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1747742612 From sviswanathan at openjdk.org Fri Sep 6 22:08:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 6 Sep 2024 22:08:14 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v8] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 06:43:31 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max. >> . UMIN : Unsigned min. >> >> >> The new vector operators are applicable only to integral types, since their values wrap around in overflow/underflow scenarios after setting appropriate status flags. For floating-point types, as per the IEEE 754 spec there are multiple schemes to handle underflow; one of them is gradual underflow, which transitions the value to the subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by the lower and upper bounds of the result type and is not wrapped around in underflow or overflow scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback.
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions Thank you for taking care of all my comments. /Reviewers 2 ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2287215687 PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2334870617 From iklam at openjdk.org Fri Sep 6 23:57:04 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 6 Sep 2024 23:57:04 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> Message-ID: On Fri, 6 Sep 2024 05:07:09 GMT, David Holmes wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. 
>> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa...
> > src/hotspot/share/cds/archiveBuilder.cpp line 904: > >> 902: log_info(cds)(" instance classes " STATS_FORMAT, STATS_PARAMS(instance_klasses)); >> 903: log_info(cds)(" boot " STATS_FORMAT, STATS_PARAMS(boot_klasses)); >> 904: log_info(cds)(" vm " STATS_FORMAT, STATS_PARAMS(vm_klasses)); > > Suggestion: > > log_info(cds)(" vm " STATS_FORMAT, STATS_PARAMS(vm_klasses)); The indentation is intentional: vm is a subset of boot classes, which is a subset of instance classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1747823070 From iklam at openjdk.org Sat Sep 7 00:01:05 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 7 Sep 2024 00:01:05 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking In-Reply-To: <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> Message-ID: On Fri, 6 Sep 2024 05:10:42 GMT, David Holmes wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. 
>> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa...
> > src/hotspot/share/cds/archiveUtils.cpp line 390: > >> 388: return "boot"; // boot classes in java.base >> 389: } else { >> 390: return "boot2"; // boot classes outside of java.base > > Suggestion: boot -> boot-base, boot2 -> boot-nonbase ? I prefer boot/boot2 to make the output easier to read. Anyone debugging this output will need to read the code to understand what "boot2" or "boot-nonbase" is. A few extra characters here will not help. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1747824950 From ysuenaga at openjdk.org Sat Sep 7 00:07:27 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 7 Sep 2024 00:07:27 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2] In-Reply-To: References: Message-ID: <8R0jfoVw4GGw1LLahZWR_VCKAnxkmvQBvcQ9jjg_6t4=.4adfb16d-3958-4043-845e-7ca467d980c5@github.com> > This PR is the successor of #20789. I got some comments there and needed to fix many points. I also had to fix the branch name to trigger GHA automatically (the #20789 branch is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue. > > This PR has the following updates since #20789: > * Use `JavaFrameAnchor` instead of a raw frame pointer to unwind the frame of `UpcallStub`. > * The change affects x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only - s390 is out of scope because SA does not have an s390 implementation. > * Only AMD64 and aarch64 have been tested on GHA. > * Refactor testcase to meet expected condition certainly.
Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/serviceability/sa/libupcall.c Co-authored-by: Jorn Vernee - Update src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ppc64/PPC64Frame.java Co-authored-by: Jorn Vernee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20885/files - new: https://git.openjdk.org/jdk/pull/20885/files/f22c6236..6cd05f8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20885&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20885&range=00-01 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20885/head:pull/20885 PR: https://git.openjdk.org/jdk/pull/20885 From iklam at openjdk.org Sat Sep 7 00:30:24 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 7 Sep 2024 00:30:24 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. 
> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @dholmes-ora comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/a5e7eb51..ac1ed798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=00-01 Stats: 61 lines in 7 files changed: 29 ins; 15 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Sat Sep 7 00:34:05 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 7 Sep 2024 00:34:05 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <6rIWXK2IjOxEPUvdbFchC8d191QHOC2RhrRdl3K7wxo=.8a5f1a8b-0fb6-4d3a-8a5e-c224f17408fc@github.com> On Fri, 6 Sep 2024 05:24:18 GMT, David Holmes wrote: > I've taken an initial look through but there is an awful lot to try and digest here. I've flagged numerous typos and minor nits. > > One general query: does this stuff work if the user defines their own initial application classloader? Hi David thanks for the review. I've pushed a new version that has most of your suggestions. I also added code to avoid loading the CDS archive if it has aot-linked classes, and the user has specified `-Djava.system.class.loader` > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 86: > >> 84: >> 85: load_table(AOTLinkedClassTable::for_static_archive(), loader_kind, h_loader, current); >> 86: assert(!current->has_pending_exception(), "VM should have exited due to ExceptionMark"); > > An `ExceptionMark` only triggers a VM exit on construction and destruction, if an exception is pending. 
I refactored the code a bit to handle any exception that occurs during bulk loading, and print out more useful error messages. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2334966941 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1747837592 From cjplummer at openjdk.org Sat Sep 7 01:13:13 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 7 Sep 2024 01:13:13 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2] In-Reply-To: <8R0jfoVw4GGw1LLahZWR_VCKAnxkmvQBvcQ9jjg_6t4=.4adfb16d-3958-4043-845e-7ca467d980c5@github.com> References: <8R0jfoVw4GGw1LLahZWR_VCKAnxkmvQBvcQ9jjg_6t4=.4adfb16d-3958-4043-845e-7ca467d980c5@github.com> Message-ID: <-BKb29h-VOB1Wb3knXT9adPqwYLaVW6zcUGTK7hdMvo=.3885eaaf-11d9-4f77-89c2-3289d114bac8@github.com> On Sat, 7 Sep 2024 00:07:27 GMT, Yasumasa Suenaga wrote: >> This PR is the successor of #20789. I got some comments there and needed to fix many points. I also had to fix the branch name to trigger GHA automatically (the #20789 branch is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue. >> >> This PR has the following updates since #20789: >> * Use `JavaFrameAnchor` instead of a raw frame pointer to unwind the frame of `UpcallStub`. >> * The change affects x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only - s390 is out of scope because SA does not have an s390 implementation. >> * Only AMD64 and aarch64 have been tested on GHA. >> * Refactor testcase to meet expected condition certainly.
> > Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/serviceability/sa/libupcall.c > > Co-authored-by: Jorn Vernee > - Update src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ppc64/PPC64Frame.java > > Co-authored-by: Jorn Vernee Overall the SA code looks good, but since I don't understand how UpcallStub linkage works, it's hard for me to say for sure if it is correct for what you are trying to accomplish. However, the test coverage seems to sufficiently ensure correctness of the implementation. Thanks for adding the code to have the LingeredApp readiness wait until the upcall thread is ready. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/UpcallStub.java line 41: > 39: > 40: private static AddressField lastJavaSPField; > 41: I think it's better without the blank lines. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20885#pullrequestreview-2287301997 PR Review Comment: https://git.openjdk.org/jdk/pull/20885#discussion_r1747848359 From ysuenaga at openjdk.org Sat Sep 7 01:48:39 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 7 Sep 2024 01:48:39 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v3] In-Reply-To: References: Message-ID: > This PR is the successor of #20789. I got some comments there and needed to fix many points. I also had to fix the branch name to trigger GHA automatically (the #20789 branch is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue. > > This PR has the following updates since #20789: > * Use `JavaFrameAnchor` instead of a raw frame pointer to unwind the frame of `UpcallStub`. > * The change affects x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only - s390 is out of scope because SA does not have an s390 implementation. > * Only AMD64 and aarch64 have been tested on GHA.
> * Refactor testcase to meet expected condition certainly. Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: Remove blank lines ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20885/files - new: https://git.openjdk.org/jdk/pull/20885/files/6cd05f8f..5ff32a44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20885&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20885&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20885.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20885/head:pull/20885 PR: https://git.openjdk.org/jdk/pull/20885 From ysuenaga at openjdk.org Sat Sep 7 01:48:39 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 7 Sep 2024 01:48:39 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v3] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 15:29:18 GMT, Jorn Vernee wrote: >> Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove blank lines > > Looks great! Thanks for implementing the JFA-based stack walking. > > Please wait for another review from someone more familiar with SA as well, before integrating. @JornVernee @plummercj Thanks a lot for your review! I think the PR is ready to merge now. However `ready` label has been removed due to new commit (removing blank lines commented by Chris). So can you approve this change again? Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20885#issuecomment-2334993740 From ysuenaga at openjdk.org Sat Sep 7 01:48:39 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 7 Sep 2024 01:48:39 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v2] In-Reply-To: <-BKb29h-VOB1Wb3knXT9adPqwYLaVW6zcUGTK7hdMvo=.3885eaaf-11d9-4f77-89c2-3289d114bac8@github.com> References: <8R0jfoVw4GGw1LLahZWR_VCKAnxkmvQBvcQ9jjg_6t4=.4adfb16d-3958-4043-845e-7ca467d980c5@github.com> <-BKb29h-VOB1Wb3knXT9adPqwYLaVW6zcUGTK7hdMvo=.3885eaaf-11d9-4f77-89c2-3289d114bac8@github.com> Message-ID: On Sat, 7 Sep 2024 01:04:14 GMT, Chris Plummer wrote: >> Yasumasa Suenaga has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update test/hotspot/jtreg/serviceability/sa/libupcall.c >> >> Co-authored-by: Jorn Vernee >> - Update src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/ppc64/PPC64Frame.java >> >> Co-authored-by: Jorn Vernee > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/UpcallStub.java line 41: > >> 39: >> 40: private static AddressField lastJavaSPField; >> 41: > > I think it's better without the blank lines. Removed them in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20885#discussion_r1747859951 From cjplummer at openjdk.org Sat Sep 7 03:03:06 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Sat, 7 Sep 2024 03:03:06 GMT Subject: RFR: 8339307: jhsdb jstack could not trace FFM upcall frame [v3] In-Reply-To: References: Message-ID: On Sat, 7 Sep 2024 01:48:39 GMT, Yasumasa Suenaga wrote: >> This PR is the successor of #20789. I got some comments there and needed to fix many points. I also had to fix the branch name to trigger GHA automatically (the #20789 branch is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue.
>> >> This PR has the following updates since #20789: >> * Use `JavaFrameAnchor` instead of a raw frame pointer to unwind the frame of `UpcallStub`. >> * The change affects x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only - s390 is out of scope because SA does not have an s390 implementation. >> * Only AMD64 and aarch64 have been tested on GHA. >> * Refactor testcase to meet expected condition certainly. > > Yasumasa Suenaga has updated the pull request incrementally with one additional commit since the last revision: > > Remove blank lines Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20885#pullrequestreview-2287341952 From kbarrett at openjdk.org Sat Sep 7 04:15:14 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Sep 2024 04:15:14 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > s390 port : late barrier expansion I've reviewed the non-compiler GC changes. I've looked over the compiler changes, but can't claim to have reviewed them. I've also reviewed the x64 changes, and looked over the aarch64 changes. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176: > 174: __ jcc(Assembler::zero, runtime); // jump to runtime if index == 0 (full buffer) > 175: // The buffer is not full, store value into it. > 176: __ subptr(temp, wordSize); // temp := next index Instead of __ testptr(temp, temp); __ jcc(Assembler::zero, runtime); __ subptr(temp, wordSize); it seems like this might be better __ subptr(temp, wordSize); __ jcc(Assembler::below, runtime); I think the code in the PR matches what the early expansion generates, so I think a change here can be deferred to a followup. src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354: > 352: __ bind(runtime); > 353: // save the live input values > 354: RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread)); I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here. Also not sure why we're saving `thread` here for 32bit platforms. Something to think about for the future. Though maybe the 32bit case will be gone by then :) src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: > 110: // The answer is that stores of different sizes can co-exist > 111: // in the same sequence of RawMem effects. We sometimes initialize > 112: // a whole 'tile' of array elements with a single jint or jlong.) I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two 32bit oops/narrowOops? But that doesn't have anything to do with jints. ------------- Marked as reviewed by kbarrett (Reviewer).
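As an aside on the `subptr`/`jcc(Assembler::below, ...)` suggestion in the review above, the following is a small model of the two control-flow shapes in plain Java. It is only a sketch, not HotSpot code: the class, the method names, the 8-byte `WORD_SIZE`, and the `RUNTIME_PATH` sentinel are all assumptions made for illustration. It relies on the index always being a multiple of the word size, in which case "index is zero" and "subtracting the word size borrows" describe the same condition.

```java
public final class IndexDecrementSketch {

    static final long WORD_SIZE = 8;     // assumption: 64-bit platform
    static final long RUNTIME_PATH = -1; // sentinel: buffer full, take the slow path

    /** Models: testptr(temp, temp); jcc(zero, runtime); subptr(temp, wordSize). */
    static long testThenSubtract(long index) {
        if (index == 0) {
            return RUNTIME_PATH;
        }
        return index - WORD_SIZE;
    }

    /** Models: subptr(temp, wordSize); jcc(below, runtime). */
    static long subtractThenCheckBorrow(long index) {
        long next = index - WORD_SIZE;
        // An unsigned borrow occurred iff index < WORD_SIZE (unsigned compare),
        // which is what the x86 "below" condition tests after a subtraction.
        if (Long.compareUnsigned(index, WORD_SIZE) < 0) {
            return RUNTIME_PATH;
        }
        return next;
    }

    /** True if both shapes agree for every word-aligned index in [0, limit]. */
    static boolean agreeOnAlignedIndices(long limit) {
        for (long index = 0; index <= limit; index += WORD_SIZE) {
            if (testThenSubtract(index) != subtractThenCheckBorrow(index)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("shapes agree: " + agreeOnAlignedIndices(4096));
    }
}
```

One difference the model hides: in the subtract-first shape the register has already been decremented when the slow path is taken, so that shape is only a drop-in replacement if the slow path does not need the original index value.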
PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2287188386 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747741376 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747824868 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1747898995 From kbarrett at openjdk.org Sat Sep 7 05:45:07 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 7 Sep 2024 05:45:07 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> On Thu, 5 Sep 2024 16:10:05 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) I started commenting on `MEMFLAGS F` => `MemType F` template parameters that needed the parameter name updated, but stopped after a while. There's a bunch more, that should be easy enough to find. I found this very hard to review, esp. after spotting some outright mistakes that forced me to look much more carefully at all the changes. I'd have really preferred seeing this broken up into smaller chunks that weren't quite so soporific. src/hotspot/share/gc/shared/taskqueue.hpp line 119: > 117: // TaskQueueSuper collects functionality common to all GenericTaskQueue instances. > 118: > 119: template MemTag parameter name should probably be changed here and elsewhere in taskqueue code. Suggest `mem_tag`. 
src/hotspot/share/runtime/lightweightSynchronizer.cpp line 63: > 61: static void* allocate_node(void* context, size_t size, Value const& value) { > 62: ObjectMonitorTable::inc_items_count(); > 63: return AllocateHeap(size, MemTag::mtObjectMonitor); pre-existing: Why the scope here and below? src/hotspot/share/runtime/os.hpp line 918: > 916: static ssize_t recv(int fd, char* buf, size_t nBytes, uint type); > 917: static ssize_t send(int fd, char* buf, size_t nBytes, uint type); > 918: static ssize_t raw_send(int fd, char* buf, size_t nBytes, uint type); This set of changes is wrong. These aren't MEMFLAGS flags. (I hope there aren't any more like this. This sort of thing would be easy to miss in a change this large. If I were making this change I'd have broken it up into several smaller pieces.) src/hotspot/share/utilities/chunkedList.hpp line 31: > 29: #include "utilities/debug.hpp" > 30: > 31: template class ChunkedList : public CHeapObj { Parameter name should be updated. Suggest `mem_tag`. src/hotspot/share/utilities/concurrentHashTable.hpp line 43: > 41: class Mutex; > 42: > 43: template Parameter name should be updated throughout ConcurrentHashTable. Suggest mem_tag. src/hotspot/share/utilities/growableArray.hpp line 803: > 801: > 802: // Leaner GrowableArray for CHeap backed data arrays, with compile-time decided MemTag. > 803: template Another parameter needing update, but shouldn't (can't?) be called `mem_tag` because of the function parameter name for allocate(). src/hotspot/share/utilities/linkedlist.hpp line 368: > 366: template 367: AnyObj::allocation_type T = AnyObj::C_HEAP, > 368: MemTag F = mtNMT, AllocFailType alloc_failmode = AllocFailStrategy::RETURN_NULL> Another parameter name needing update. src/hotspot/share/utilities/objectBitSet.hpp line 42: > 40: * during the lifetime of the ObjectBitSet. The underlying memory is allocated from C-Heap. > 41: */ > 42: template More parameter names needing update. 
src/hotspot/share/utilities/resizeableResourceHash.hpp line 33: > 31: typename K, typename V, > 32: AnyObj::allocation_type ALLOC_TYPE, > 33: MemTag MEM_TYPE> I think s/MEM_TYPE/mem_type/, but other non-type template parameters here are also all-uppercase, so I guess better to leave it for now and look at it later. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2287424615 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747939080 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747945078 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747946449 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747947967 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747948639 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747950081 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747950278 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747950612 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1747951327 From ysuenaga at openjdk.org Sat Sep 7 05:49:11 2024 From: ysuenaga at openjdk.org (Yasumasa Suenaga) Date: Sat, 7 Sep 2024 05:49:11 GMT Subject: Integrated: 8339307: jhsdb jstack could not trace FFM upcall frame In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 09:31:45 GMT, Yasumasa Suenaga wrote: > This PR is the successor of #20789. I got some comments there and needed to make fixes in many places. I also had to fix the branch name to kick GHA automatically (the branch for #20789 is named with `pr/`, which meets the condition to skip GHA). Hence I've opened another PR for this JBS issue. > > This PR has been updated with the following topics since #20789: > * Use `JavaFrameAnchor` instead of raw frame pointer to unwind frame of `UpcallStub`.
> * The change applies to x86 (including AMD64), aarch64, PPC64, and RISC-V 64 only; s390 is out of scope because SA does not have an s390 implementation. > * Only AMD64 and aarch64 have been tested on GHA. > * Refactor the test case so that it reliably meets the expected condition. This pull request has now been integrated. Changeset: deeb09a6 Author: Yasumasa Suenaga URL: https://git.openjdk.org/jdk/commit/deeb09a640bf693ea130d1283fc010c22f0cf9db Stats: 447 lines in 12 files changed: 437 ins; 0 del; 10 mod 8339307: jhsdb jstack could not trace FFM upcall frame Reviewed-by: cjplummer, jvernee ------------- PR: https://git.openjdk.org/jdk/pull/20885 From mdoerr at openjdk.org Sat Sep 7 12:40:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 7 Sep 2024 12:40:10 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 14:15:41 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > s390 port : late barrier expansion I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2335174688 From mdoerr at openjdk.org Sat Sep 7 14:16:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 7 Sep 2024 14:16:06 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> References: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> Message-ID: On Wed, 4 Sep 2024 09:06:19 GMT, Axel Boldt-Christmas wrote: >> I couldn't find an answer for that. Maybe @xmas92 can tell us about that. > > There are 8 cache entries, and a null sentinel at the end. All entries can be null. > > So the answer is no, we cannot be sure about that, as one, two or three of the first three entries may be null. But I am not sure what the reason is for this question. > > The non-empty/non-null entries always come first, followed by the null entries if any, followed by a null sentinel. The unrolled entries do not check for null, only for a match. The loop will check the rest of the entries and go to a slow path when a null entry (or the null sentinel) is encountered. @xmas92: Can the 1st entry be 0 and the 2nd one contain garbage which matches the object by chance?
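For readers following the cache discussion, here is a minimal Java model of the lookup scheme as described in the quoted text: 8 entries plus a null sentinel, non-null entries packed at the front, a few unrolled match-only checks, then a loop that stops at the first null. This is only a sketch under stated assumptions. The class name `MonitorCacheSketch`, the `add` helper, and the unrolled count of 2 are hypothetical, and the real code is generated assembly in the platform `MacroAssembler`, not Java.

```java
import java.util.Objects;

// Hypothetical model of the per-thread object-to-monitor cache discussed above.
final class MonitorCacheSketch {
    static final int CAPACITY = 8;   // from the thread: "8 cache entries"
    static final int UNROLLED = 2;   // assumption: number of unrolled checks

    // Slot CAPACITY is the sentinel and always stays null.
    private final Object[] objects = new Object[CAPACITY + 1];
    private final Object[] monitors = new Object[CAPACITY + 1];
    private int size;

    // Appends an entry so that non-null entries always come first.
    boolean add(Object obj, Object monitor) {
        Objects.requireNonNull(obj);
        if (size == CAPACITY) {
            return false;            // cache full; sentinel slot is never used
        }
        objects[size] = obj;
        monitors[size] = monitor;
        size++;
        return true;
    }

    // Returns the cached monitor, or null to signal "take the slow path".
    Object lookup(Object obj) {
        Objects.requireNonNull(obj);
        // Unrolled part: checks only for a match, never for null.
        // A null entry cannot match because obj is non-null.
        for (int i = 0; i < UNROLLED; i++) {
            if (objects[i] == obj) {
                return monitors[i];
            }
        }
        // Loop part: stops at the first null entry or at the sentinel.
        for (int i = UNROLLED; ; i++) {
            if (objects[i] == obj) {
                return monitors[i];
            }
            if (objects[i] == null) {
                return null;
            }
        }
    }
}
```

In this model, entries after the first null are never populated, so the loop cannot run past the sentinel. Whether the real cache can hold stale data behind a null entry is exactly the open question above and is not answered by this sketch.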
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1748100949 From mdoerr at openjdk.org Sun Sep 8 11:15:04 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Sun, 8 Sep 2024 11:15:04 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> Message-ID: On Wed, 4 Sep 2024 07:20:35 GMT, Amit Kumar wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >> Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > review comments src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6023: > 6021: > 6022: if (UseObjectMonitorTable) { > 6023: // Clear cache in case fast locking succeeds. @xmas92: This comment sounds like it should only be done if fast locking succeeds. Why are we doing it regardless of that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1749190601 From duke at openjdk.org Sun Sep 8 13:24:49 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 8 Sep 2024 13:24:49 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v7] In-Reply-To: References: Message-ID: <9rglid_tIn1JA4zqOFygiz1hWYZGOPa8Ci1AI1qRHDA=.c70dd0ec-46b4-43c5-841e-7d91edf65eb4@github.com> > Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. 
I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. ArsenyBochkarev has updated the pull request incrementally with one additional commit since the last revision: Multiversion decrypt intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19960/files - new: https://git.openjdk.org/jdk/pull/19960/files/407b9af0..f7bef0e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19960&range=05-06 Stats: 140 lines in 1 file changed: 17 ins; 65 del; 58 mod Patch: https://git.openjdk.org/jdk/pull/19960.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19960/head:pull/19960 PR: https://git.openjdk.org/jdk/pull/19960 From duke at openjdk.org Sun Sep 8 13:24:49 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Sun, 8 Sep 2024 13:24:49 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3] In-Reply-To: References: Message-ID: On Thu, 18 Jul 2024 08:26:02 GMT, Fei Yang wrote: >> Changes requested by fyang (Reviewer). > >> As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! The main difference that I see is that they introduced three different different versions of encryption depending on the key sizes, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it'll be more efficient to replace the current version by something openssl-lookalike. The only problem I see is increasing code size a bit. Please let me know if we are not interested in this change for some reason > > Does `vaesz_vs` help in anyway? And what about the `generate_aescrypt_decryptBlock`? 
[1] > > [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkned.pl#L451 Hello @RealFYang! Sorry for such a late reply. > Does `vaesz_vs` help in anyway? As far as I know, the `vaesz_vs` instruction is just an alias for `vxor`, so it was already utilized in this patch. > `generate_aescrypt_decryptBlock` I missed this case in the initial multiversioning commit, so I multiversioned the decrypt intrinsic also, thanks for pointing it out! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2336684717 From dholmes at openjdk.org Sun Sep 8 23:49:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Sun, 8 Sep 2024 23:49:10 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> Message-ID: On Fri, 6 Sep 2024 23:53:58 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/archiveBuilder.cpp line 904: >> >>> 902: log_info(cds)(" instance classes " STATS_FORMAT, STATS_PARAMS(instance_klasses)); >>> 903: log_info(cds)(" boot " STATS_FORMAT, STATS_PARAMS(boot_klasses)); >>> 904: log_info(cds)(" vm " STATS_FORMAT, STATS_PARAMS(vm_klasses)); >> >> Suggestion: >> >> log_info(cds)(" vm " STATS_FORMAT, STATS_PARAMS(vm_klasses)); > > The indentation is intentional: vm is a subset of boot classes, which is a subset of instance classes. Okay but I presume the indent level should be the same: boot was indented by 2, then vm only by 1. >> src/hotspot/share/cds/archiveUtils.cpp line 390: >> >>> 388: return "boot"; // boot classes in java.base >>> 389: } else { >>> 390: return "boot2"; // boot classes outside of java.base >> >> Suggestion: boot -> boot-base, boot2 -> boot-nonbase ? > > I prefer boot/boot2 to make the output easier to read.
Anyone debugging this output will need to read the code to understand what "boot2" or "boot-nonbase" is. A few extra characters here will not help. Sorry but '2' conveys zero information whereas 'nonbase' tells you they are not in the base module. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1749399548 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1749399859 From dholmes at openjdk.org Sun Sep 8 23:52:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Sun, 8 Sep 2024 23:52:09 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Sat, 7 Sep 2024 00:30:24 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). 
>> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments src/hotspot/share/cds/archiveBuilder.cpp line 912: > 910: STATS_PARAMS(unlinked_klasses), > 911: boot_unlinked, platform_unlinked, > 912: app_unlinked, unreg_unlinked); This indentation still looks weird. Arguments on newlines should be aligned.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1749401000 From dholmes at openjdk.org Mon Sep 9 00:00:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 00:00:06 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v3] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 15:27:41 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages(~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. >> >> The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. 
This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). >> >> I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > fix update in map_stack_shadow_pages Thanks for the update. The role of `_shadow_zone_growth_watermark` is clearer now. src/hotspot/os/windows/os_windows.inline.hpp line 60: > 58: } > 59: StackOverflow* state = JavaThread::current()->stack_overflow_state(); > 60: assert(original_sp > state->shadow_zone_safe_limit(), ""); Can you print the values if the assert fails please. ------------- PR Review: https://git.openjdk.org/jdk/pull/20862#pullrequestreview-2288632318 PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1749403210 From dholmes at openjdk.org Mon Sep 9 00:08:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 00:08:05 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> References: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> Message-ID: On Sat, 7 Sep 2024 05:11:25 GMT, Kim Barrett wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. 
>> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > src/hotspot/share/gc/shared/taskqueue.hpp line 119: > >> 117: // TaskQueueSuper collects functionality common to all GenericTaskQueue instances. >> 118: >> 119: template > > MemTag parameter name should probably be changed here and elsewhere in taskqueue code. > Suggest `mem_tag`. I was going to suggest just MT which is more in keeping with the short/terse names given to type parameters. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749405959 From iklam at openjdk.org Mon Sep 9 01:04:17 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 9 Sep 2024 01:04:17 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Sun, 8 Sep 2024 23:49:45 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @dholmes-ora comments > > src/hotspot/share/cds/archiveBuilder.cpp line 912: > >> 910: STATS_PARAMS(unlinked_klasses), >> 911: boot_unlinked, platform_unlinked, >> 912: app_unlinked, unreg_unlinked); > > This indentation still looks weird. Arguments on newlines should be aligned. I tried to separate the "types" from the "values". I think this makes it easy to see how many types there are. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1749426533 From iklam at openjdk.org Mon Sep 9 01:04:17 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 9 Sep 2024 01:04:17 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <4-FpKqNuti9sYRjbrdRV17Ao2f21Cn8EhjF3f8npt0M=.dde056e6-8e6f-4a02-8204-e6f6cedae337@github.com> Message-ID: On Sun, 8 Sep 2024 23:46:03 GMT, David Holmes wrote: >> I prefer boot/boot2 to make the output easier to read. Anyone debugging this output will need to read the code to understand what "boot2" or "boot-nonbase" is. A few extra characters here will not help. > > Sorry but '2' conveys zero information whereas 'nonbase' tells you they are not in the base module. It's the second group of boot loaders classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1749425914 From dholmes at openjdk.org Mon Sep 9 03:14:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 03:14:07 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 16:10:05 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) The bulk of this seems fine but as Kim points out the `F` template parameter really needs changing too. 
There are also some other questionable changes that I've flagged below. Thanks src/hotspot/share/nmt/memMapPrinter.cpp line 70: > 68: //end > 69: > 70: static const char* get_shortname_for_nmt_flag(MemTag mem_tag) { Shouldn't this be renamed to `get_shortname_for_nmt_tag`? src/hotspot/share/nmt/memReporter.cpp line 852: > 850: } else if (early_site->mem_tag() != current_site->mem_tag()) { > 851: // This site was originally allocated with one type, then released, > 852: // then re-allocated at the same site (as far as we can tell) with a different type. s/type/tag/ src/hotspot/share/nmt/memTracker.hpp line 83: > 81: if (enabled()) { > 82: return MallocTracker::record_malloc(mem_base, size, mem_tag, stack); > 83: return MallocTracker::record_malloc(mem_base, size, mem_tag, stack); Did this even compile? ! Suggestion: return MallocTracker::record_malloc(mem_base, size, mem_tag, stack); src/hotspot/share/nmt/memoryFileTracker.cpp line 51: > 49: for (int i = 0; i < mt_number_of_tags; i++) { > 50: VirtualMemory* summary = file->_summary.by_type(NMTUtil::index_to_tag(i)); > 51: summary->reserve_memory(diff.type[i].commit); Why is this `type` not `tag`? src/hotspot/share/nmt/memoryFileTracker.cpp line 109: > 107: tty->print_cr("Expected start out to have same type as end in, but was: %s, %s", > 108: VMATree::statetype_to_string(broken_start->val().out.state()), > 109: VMATree::statetype_to_string(broken_end->val().in.state())); Not seeing what this rename has to do with current changes. ??? src/hotspot/share/nmt/virtualMemoryTracker.cpp line 400: > 398: > 399: // Print some more details. Don't use UL here to avoid circularities. > 400: tty->print_cr("Error: existing region: [" INTPTR_FORMAT "-" INTPTR_FORMAT "), type %u.\n" Again why `type` instead of `tag`? src/hotspot/share/nmt/virtualMemoryTracker.cpp line 560: > 558: // Given an existing memory mapping registered with NMT, split the mapping in > 559: // two. 
The newly created two mappings will be registered under the call > 560: // stack and the memory types of the original section. types -> tags src/hotspot/share/nmt/vmatree.cpp line 86: > 84: // If the state is not matching then we have different operations, such as: > 85: // reserve [x1, A); ... commit [A, x2); or > 86: // reserve [x1, A), type1; ... reserve [A, x2), type2; or Why type not tag? src/hotspot/share/nmt/vmatree.hpp line 91: > 89: private: > 90: // Store the state and mem_tag as two bytes > 91: uint8_t info[2]; I'm unclear about terminology here: type -> state ? ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2288637165 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749408498 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749409032 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749409609 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749410180 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749410493 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749411164 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749411446 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749411720 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1749412298 From dholmes at openjdk.org Mon Sep 9 03:27:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 03:27:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 19:51:18 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. 
Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Fixed conditional to check for is_deleted() src/hotspot/share/oops/method.hpp line 853: > 851: Method* new_method = method_holder()->method_with_idnum(orig_method_idnum()); > 852: assert(this != new_method, "sanity check"); > 853: return (new_method == nullptr || is_deleted()) ? Universe::throw_no_such_method_error() : new_method; I am still confused by the different possibilities here. Under what conditions will we get nullptr? Is it the case that `get_new_method` should only be called when `is_old()` is true? Can `is_old` and `is_deleted` be true at the same time? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1749507207 From dholmes at openjdk.org Mon Sep 9 03:32:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 03:32:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 03:24:05 GMT, David Holmes wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed conditional to check for is_deleted() > > src/hotspot/share/oops/method.hpp line 853: > >> 851: Method* new_method = method_holder()->method_with_idnum(orig_method_idnum()); >> 852: assert(this != new_method, "sanity check"); >> 853: return (new_method == nullptr || is_deleted()) ? Universe::throw_no_such_method_error() : new_method; > > I am still confused by the different possibilities here. Under what conditions will we get nullptr? Is it the case that `get_new_method` should only be called when `is_old()` is true? Can `is_old` and `is_deleted` be true at the same time? To answer some of my own questions: - yes `get_new_method` should only be called if `is_old` is true. (Should we assert that?) 
- yes a method can be old and deleted at the same time. I remain unclear how nullptr can appear here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1749510581 From galder at openjdk.org Mon Sep 9 05:10:07 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 9 Sep 2024 05:10:07 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Tue, 3 Sep 2024 07:37:33 GMT, Francesco Nigro wrote: >> Working on it > > @galderz in the benchmark did you collect the mispredicts/branches? @franz1981 No, I hadn't done so until now, but I will be tracking those more closely. Context: I have been running some reduction JMH benchmarks and I could see a big drop in non-AVX-512 performance compared to the unpatched code. E.g. @Benchmark public long reductionSingleLongMax() { long result = 0; for (int i = 0; i < size; i++) { final long v = 11 * aLong[i]; result = Math.max(result, v); } return result; } This is caused by keeping the Max/Min nodes in the IR, which get translated into `cmpq+cmovlq` instructions (via the macro expansion). The code gets unrolled, but there is a dependency chain on the current max value. In the unpatched code the intrinsic does not kick in and the code uses a standard ternary operation, which gets translated into normal control flow. The system is able to handle this better due to branch prediction. @franz1981's comment is precisely about this. I need to enhance the benchmark to control the branchiness of the test (e.g. how often it goes one side or the other of a max/min call) and measure the mispredictions, branches, etc. FYI: A similar situation can be replicated with reduction benchmarks that use max/min integer, but for the code to fall back to `cmov`, both AVX and SSE have to be turned off.
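As a rough illustration of the two loop shapes being compared, and of one way to control "branchiness", here is a minimal plain-Java sketch. It is not the benchmark from the PR: the class name `MaxReductionSketch` and the helper `inputWithBranchProbability` are hypothetical, no JMH harness is included, and which machine code C2 emits for either method depends on the JDK build and CPU, so nothing here asserts `cmov` or branch generation.

```java
import java.util.Random;

// Hypothetical helper class; names are illustrative only.
final class MaxReductionSketch {

    // Same loop body as the quoted benchmark, taking the array as a parameter.
    static long withMathMax(long[] values) {
        long result = 0;
        for (int i = 0; i < values.length; i++) {
            final long v = 11 * values[i];
            result = Math.max(result, v);
        }
        return result;
    }

    // Same reduction written with an explicit ternary, the shape that
    // Math.max(long, long) has in plain Java source.
    static long withTernary(long[] values) {
        long result = 0;
        for (int i = 0; i < values.length; i++) {
            final long v = 11 * values[i];
            result = (result >= v) ? result : v;
        }
        return result;
    }

    // Builds input where roughly percentNewMax percent of the elements raise
    // the running maximum and the rest (zeros) leave it unchanged.
    static long[] inputWithBranchProbability(int size, int percentNewMax, long seed) {
        Random random = new Random(seed);
        long[] values = new long[size];
        long currentMax = 0;
        for (int i = 0; i < size; i++) {
            if (random.nextInt(100) < percentNewMax) {
                values[i] = ++currentMax;  // strictly larger than all earlier values
            } else {
                values[i] = 0;             // never exceeds the running max
            }
        }
        return values;
    }
}
```

Sweeping `percentNewMax` from 0 to 100 would give inputs ranging from "max never changes" to "max changes every iteration"; a value near 50 with a random pattern is presumably the hardest case for a branch predictor, which is an assumption to verify with hardware counters rather than a measured result.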
I also need to see what the performance looks like on a system with AVX-512, and also look at how non-reduction JMH benchmarks behave on systems with/without AVX-512. Finally, I'm also looking at an experiment to see what would happen if cmovl were implemented with branch+mov instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2337131179 From dholmes at openjdk.org Mon Sep 9 05:18:23 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 05:18:23 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Mon, 9 Sep 2024 01:01:41 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/archiveBuilder.cpp line 912: >> >>> 910: STATS_PARAMS(unlinked_klasses), >>> 911: boot_unlinked, platform_unlinked, >>> 912: app_unlinked, unreg_unlinked); >> >> This indentation still looks weird. Arguments on newlines should be aligned. > > I tried to separate the "types" from the "values". I think this makes it easy to see how many types there are. Sorry I don't follow. This is just like a printf call ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1749573172 From lmao at openjdk.org Mon Sep 9 05:40:18 2024 From: lmao at openjdk.org (Liang Mao) Date: Mon, 9 Sep 2024 05:40:18 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass Message-ID: Hi, It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. test/hotspot/jtreg/runtime and gc are clean.
Thanks, Liang ------------- Commit messages: - 8339725: Concurrent GC crashed due to GetMethodDeclaringClass Changes: https://git.openjdk.org/jdk/pull/20907/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339725 Stats: 5 lines in 3 files changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From lmao at openjdk.org Mon Sep 9 06:04:36 2024 From: lmao at openjdk.org (Liang Mao) Date: Mon, 9 Sep 2024 06:04:36 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: Fix build error in windows/mac ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/276bec66..da942579 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From fjiang at openjdk.org Mon Sep 9 06:09:12 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 9 Sep 2024 06:09:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v16] In-Reply-To: References: Message-ID: <2Iqb8t5nI61Zq22PafvY9QUUw_9OZ7oHygSdOY6QCX8=.f1338ef5-d646-45aa-bcb6-54f0dd13bc87@github.com> On Fri, 6 Sep 2024 14:02:58 GMT, Roberto Casta?eda Lozano wrote: >>> I was just thinking whether it is necessary to do these barriers for enums at all. They will most likely always reside in a different region/generation anyways, and will (but that may be wrong) never be collected. >> >> As far as I understand, from a GC perspective enum instances are handled just like any other class instance, so I do not think there is any special GC barrier optimization opportunity for them. Maybe some GC expert can comment further on this or correct me if I am wrong. > >> Hi @robcasloz, you can pick up s390x patch from here: [offamitkumar at 6663433](https://github.com/offamitkumar/jdk/commit/6663433c4aa17925f699eaa8995cdc0cd78c0034) > > Done, thanks! > > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later. 
> > Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset. Tier1-3 & hotspot:tier4 test result is clean on linux-riscv64 platform. No regression observed for performance. (Applied on JDK head). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337203016 From aboldtch at openjdk.org Mon Sep 9 07:08:05 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 9 Sep 2024 07:08:05 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> Message-ID: On Sun, 8 Sep 2024 11:12:24 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6023: > >> 6021: >> 6022: if (UseObjectMonitorTable) { >> 6023: // Clear cache in case fast locking succeeds. > > @xmas92: This comment sounds like it should only be done if fast locking succeeds. Why are we doing it regardless of that? The invariants surrounding the cache went back and forth a bit. The important part is that the cache slot in the `BasicLock` is valid after the enter is complete. There is now a RAII object in the runtime code which makes sure this is the case for all paths. So it should be the case that the C2 code only needs to handle the case where it is successful. (Clearing when fast lock succeeds and storing the monitor when inflated locking succeeds.) I think it went in in this state as a combination of it being something that once was required (because the runtime was less precise with how it handles the `BasicLock` cache) and that I wanted to take the C2 changes in as they were because most of the performance testing had been performed in that state. 
Maybe there was some technical issue somewhere with regard to register availability; I cannot recall. Regardless, if it can be done only in the non-slow-path case and that is more performant, it should be fine to do so. I do not think I saw a measurable difference on x86_64 from having the two stores in the successful inflated case. And in all other cases the only time you can elide the store is when the slow path is taken, so it probably does not save much if anything. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1749678191 From aboldtch at openjdk.org Mon Sep 9 07:08:06 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 9 Sep 2024 07:08:06 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> Message-ID: On Sat, 7 Sep 2024 14:13:00 GMT, Martin Doerr wrote: >> There are 8 cache entries, and a null sentinel at the end. All entries can be null. >> >> So the answer is no we can not be sure about that as one, two or three of the first three entries may be null. But I am not sure what the reason is for this question. >> >> The non-empty/non-null entries always come first, followed by the null entries if any, followed by a null sentinel. The unrolled entries do not check for null, only for a match. The loop will check the rest of the entries and go to a slow path when a null entry (or the null sentinel) is encountered. > > @xmas92: Can the 1st entry be 0 and the 2nd one contain garbage which matches the object by chance? No, the oops in the cache are either valid entries or null. They may be stale, as in the monitor has been deflated, but that will cause the slow path to be taken and the cache entry is replaced inside the runtime call.
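The lookup order described in the quoted text can be modelled roughly as follows. This is only an illustrative sketch in Java, not the HotSpot code: the class name, the `UNROLLED` count of 2 and the use of `-1` for "take the slow path" are assumptions made for the example, and the object being looked up is assumed to be non-null.

```java
// Hypothetical model of the per-thread monitor cache lookup; names and the
// unroll count are assumptions, only the layout follows the description:
// non-null entries first, then null entries, then a null sentinel.
public final class MonitorCacheSketch {
    static final int CAPACITY = 8;   // cache entries, as stated in the thread
    static final int UNROLLED = 2;   // assumed number of unrolled compares

    // slots must have CAPACITY + 1 elements; the last one is always null.
    // Returns the matching index, or -1 to signal "go to the slow path".
    static int lookup(Object[] slots, Object obj) {
        // Unrolled part: compare only, no null check. A null slot simply
        // fails to match a non-null object.
        for (int i = 0; i < UNROLLED; i++) {
            if (slots[i] == obj) {
                return i;
            }
        }
        // Loop part: stops at the first null entry. The sentinel guarantees
        // termination even when all CAPACITY entries are in use.
        for (int i = UNROLLED; ; i++) {
            Object entry = slots[i];
            if (entry == obj) {
                return i;
            }
            if (entry == null) {
                return -1;
            }
        }
    }
}
```

Because every slot is either a valid reference or null, a slot can never hold garbage that matches by chance, which is the point of the answer above.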
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1749678277 From rcastanedalo at openjdk.org Mon Sep 9 07:44:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 07:44:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Sat, 7 Sep 2024 12:37:54 GMT, Martin Doerr wrote: > I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. Great, thanks for testing Martin! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337362381 From jbhateja at openjdk.org Mon Sep 9 08:18:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Sep 2024 08:18:54 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v9] In-Reply-To: References: Message-ID: <9NBW5WfztEJCcHs68o3b8O1IhgmdS3LX7UQmZXxbZ8M=.0271ce97-7dbe-4f84-965d-d511b0392c5b@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. 
the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Fix jtreg regression. - Addressing Paul's comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/195390fe..4a93042b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=07-08 Stats: 215 lines in 39 files changed: 0 ins; 1 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From fbredberg at openjdk.org Mon Sep 9 08:44:03 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Mon, 9 Sep 2024 08:44:03 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread Message-ID: Removed the concept of an ObjectMonitor Responsible thread. The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. 
This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. Passes tier1-tier7 on supported platforms. x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. Arm32 and Zero doesn't need any changes as far as I can tell. ------------- Commit messages: - Small fixes before the review - Merge branch 'master' into 8320318_objectmon_responsible_thread - Merge branch 'master' into 8320318_objectmon_responsible_thread - Removed _Responsible - Fixed s390 - Fixed legacy locking - Merge branch 'master' into 8320318_objectmon_responsible_thread - Moved complexity from assembler to c++ - Small fixes for x86 and PowerPC. 
- 8320318: ObjectMonitor Responsible thread Changes: https://git.openjdk.org/jdk/pull/19454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320318 Stats: 708 lines in 15 files changed: 294 ins; 284 del; 130 mod Patch: https://git.openjdk.org/jdk/pull/19454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19454/head:pull/19454 PR: https://git.openjdk.org/jdk/pull/19454 From mdoerr at openjdk.org Mon Sep 9 09:42:04 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 9 Sep 2024 09:42:04 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> <2yeb4_jKO9U7D1zHyLgi0GTKUym2iesw8lSMSl9tvIo=.c59ab670-31d2-48f3-aa2c-032ca2890c66@github.com> Message-ID: On Thu, 5 Sep 2024 07:07:55 GMT, Amit Kumar wrote: >> I am often undecided myself. The code as it is now is correct for all displacements. If you omit the else part, you introduce a hidden dependency on the layout of BasicObjectLock and BasicLock. On the other hand, how likely is it that anybody will fundamentally change the layout and thus break the disp12 requirement? Compact or generally valid? **Your choice.** > > I guess let's keep it. I mean even if there is need to change the layout, then we have to remove `z_mvghi` and switch back to this implementation again. So maybe better we keep it here and hope for the best ? `BasicObjectLock` is a very small data structure. 12 bit offsets are more than enough. 
The actual offsets are 0 :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1749933492 From mdoerr at openjdk.org Mon Sep 9 09:42:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 9 Sep 2024 09:42:06 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: Message-ID: <3sDJd7qbND_8TDkMZNrnIUNUDTU-dDFsxnWCHMY7WyA=.ea6f3736-565c-4e44-b718-e660328fae5f@github.com> On Mon, 2 Sep 2024 18:15:19 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6165: > >> 6163: z_stg(tmp1, Address(box, BasicLock::object_monitor_cache_offset_in_bytes())); >> 6164: } >> 6165: > > Why not use mvghi here to directly write zero to memory? Prerequisites: displacement must be uimm12. +1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1749935121 From aph at openjdk.org Mon Sep 9 10:11:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 9 Sep 2024 10:11:07 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v22] In-Reply-To: References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: On Fri, 6 Sep 2024 18:07:04 GMT, Vladimir Ivanov wrote: > Looks good. > > Testing results (hs-tier1 - hs-tier6) are clean. Great! Is there anyone you'd suggest for additional review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2337693887 From rkennke at openjdk.org Mon Sep 9 10:29:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 10:29:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Fix compiler/c2/irTests/TestPadding.java for +COH - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes - Nit in header_size - GC code tweaks - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java - Fix jdk/tools/jlink/plugins/CDSPluginTest.java - Cleanup markWord bits and comments - x86_64: Fix loadNKlassCompactHeaders - aarch64: Fix loadNKlassCompactHeaders - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06 Stats: 4465 lines in 189 files changed: 3175 ins; 678 del; 612 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From mli at openjdk.org Mon Sep 9 10:33:15 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 9 Sep 2024 10:33:15 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic Message-ID: Hi, Can you help to review this patch? Thanks. This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
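For reference, the result that any CRC32 intrinsic has to reproduce can be written as a plain byte-at-a-time table lookup. The sketch below is not the vectorized algorithm from this patch and is not derived from zcrc32.c; it is only a minimal scalar reference, with the hypothetical class name `Crc32Reference`, that can be compared against `java.util.zip.CRC32`.

```java
import java.util.zip.CRC32;

// Hypothetical scalar reference, not the code from the patch: the classic
// one-byte-per-step table-driven CRC-32 (reflected polynomial 0xEDB88320).
public final class Crc32Reference {
    private static final int[] TABLE = new int[256];

    static {
        // Build the 256-entry lookup table, one entry per possible byte value.
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Computes the CRC-32 of the whole array, returned as an unsigned 32-bit value.
    static long crc32(byte[] data) {
        int crc = 0xFFFFFFFF;                       // initial value
        for (byte b : data) {
            crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
        }
        return (~crc) & 0xFFFFFFFFL;                // final inversion
    }

    // The JDK's own result, which goes through the intrinsic when one is available.
    static long jdkCrc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }
}
```

Comparing `crc32` and `jdkCrc32` over buffers of varying length and alignment is one simple way to cross-check a new intrinsic, in addition to the jtreg tests listed below.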
## Test test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, test/jdk/java/util/zip/TestCRC32.java ## Performance tested on bananapi ### with patch data Benchmark -with patch | (count) | Mode | Cnt | Score | Error | Units -- | -- | -- | -- | -- | -- | -- TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.884 | 0.03 | ns/op TestCRC32.testCRC32Update | 128 | avgt | 10 | 401.122 | 0.309 | ns/op TestCRC32.testCRC32Update | 256 | avgt | 10 | 680.168 | 0.032 | ns/op TestCRC32.testCRC32Update | 512 | avgt | 10 | 1062.426 | 0.401 | ns/op TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3308.361 | 0.176 | ns/op TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24403.231 | 20.248 | ns/op TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103463.735 | 4.245 | ns/op ### without patch data Benchmark -without patch | (count) | Mode | Cnt | Score | Error | Units -- | -- | -- | -- | -- | -- | -- TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.942 | 0.224 | ns/op TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.159 | 0.019 | ns/op TestCRC32.testCRC32Update | 256 | avgt | 10 | 686.106 | 0.1 | ns/op TestCRC32.testCRC32Update | 512 | avgt | 10 | 1328.962 | 0.073 | ns/op TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5191.116 | 0.189 | ns/op TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41286.858 | 4.53 | ns/op TestCRC32.testCRC32Update | 65536 | avgt | 10 | 172340.099 | 11.004 | ns/op ------------- Commit messages: - fix space - Initial commit Changes: https://git.openjdk.org/jdk/pull/20910/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20910&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339738 Stats: 348 lines in 3 files changed: 332 ins; 10 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20910.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20910/head:pull/20910 PR: https://git.openjdk.org/jdk/pull/20910 From mdoerr at openjdk.org Mon Sep 9 10:45:03 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 9 Sep 2024 10:45:03 GMT Subject: RFR: 8338658: 
New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: <7ODOU2xJpTiLcvTCwz113KzHAPbLUiIaRoDf1TC_zhU=.b64ff099-8682-4b08-bd62-563917837f89@github.com> <_DK9kodmp_ATB5lajLKG2sbkTelH1vAQ9d26WYJES_g=.f892b962-abbb-41d2-8156-3cad77a59c21@github.com> Message-ID: On Mon, 9 Sep 2024 07:05:31 GMT, Axel Boldt-Christmas wrote: >> @xmas92: Can the 1st entry be 0 and the 2nd one contain garbage which matches the object by chance? > > No, the oops in the cache are either valid entries or null. They may be stale, as in the monitor has been deflated, but that will cause the slow path to be taken and the cache entry is replaced inside the runtime call. Ok, the owner won't be 0 in this case, the cmpxchg attempt will fail and the slow path will handle it correctly. Thanks for the explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1750022201 From mli at openjdk.org Mon Sep 9 11:13:47 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 9 Sep 2024 11:13:47 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v2] In-Reply-To: References: Message-ID: <361CNPQYcSo_A4BmHd1RrYEVheSFNfc-WB3wXEEPUL4=.9fc78191-1052-4b26-852a-147e7f1dddd4@github.com> > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > tested on bananapi > > ### with patch > data > > Benchmark -with patch | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.884 | 0.03 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 401.122 | 0.309 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 680.168 | 0.032 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1062.426 | 0.401 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3308.361 | 0.176 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24403.231 | 20.248 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103463.735 | 4.245 | ns/op > > > > ### without patch > data > > Benchmark -without patch | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.942 | 0.224 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.159 | 0.019 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 686.106 | 0.1 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1328.962 | 0.073 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5191.116 | 0.189 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41286.858 | 4.53 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 172340.099 | 11.004 | ns/op > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: zext_w ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20910/files - new: https://git.openjdk.org/jdk/pull/20910/files/a0408543..2eb264bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20910&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20910&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20910.diff Fetch: git fetch 
https://git.openjdk.org/jdk.git pull/20910/head:pull/20910 PR: https://git.openjdk.org/jdk/pull/20910 From rcastanedalo at openjdk.org Mon Sep 9 11:15:47 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 9 Sep 2024 11:15:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion - riscv port for JEP 475 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/6663433c..94145917 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=16-17 Stats: 860 lines in 4 files changed: 771 ins; 49 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Mon Sep 9 11:15:47 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:15:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 07:41:06 GMT, Roberto Castañeda Lozano wrote: >> I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. > >> I've run the current version through our nightly tests and there were no issues. (Applied on JDK head and tested on x86_64, aarch64 and PPC64 with several OSes.) Performance also looks good on PPC64. No regression observed. > > Great, thanks for testing Martin! > > > Hi @robcasloz, here is the implementation for RISC-V: [feilongjiang at 1c012cf](https://github.com/feilongjiang/jdk/commit/1c012cfd4a1f4d6e39d6f4798281423f884e97a6) We are still testing the latest changes, results will be updated later. > > > > > > Great, thanks @feilongjiang! Just let me know when you are done with testing and I will merge it into this changeset. 
> > Tier1-3 & hotspot:tier4 test result is clean on linux-riscv64 platform. No regression observed for performance. (Applied on JDK head). Thanks @feilongjiang, merged now (commit 94145917). ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2337824882 From eosterlund at openjdk.org Mon Sep 9 11:17:12 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Sep 2024 11:17:12 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac Reading a strong reference such as OopHandle with ON_PHANTOM_OOP_REF is dangerous. The implication of using ON_PHANTOM_OOP_REF is that there are certain interactions with a reference processor. But it will not process OopHandle because they are strong references. So I really don't think we should be doing this. I think a better solution is to do what existing code does today, which is to read klass->klass_holder() (which keeps the class alive) in the GetMethodDeclaringClass function. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2337829404 From rehn at openjdk.org Mon Sep 9 11:23:27 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 9 Sep 2024 11:23:27 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack Message-ID: Hi please review, When calling a native function using integers smaller than 64 bits, they must be loaded from a Java stack slot and widened to 64 bits, sign-extended. In the interpreter case we only store 32 bits, which means the top 32 bits are 'random'. In the compiler case we do an ld and grab random top 32 bits. These should be loaded with a lw from the Java stack, thus properly sign-extended, and then stored with sd into the native stack. I found the interpreter bug first, wrote a test case for it, which found the compiler bug. Here you can see the difference, both are legal to do from a compiler: https://godbolt.org/z/85aMhja5f Relevant specs: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits. I checked floats also, they seem fine, but please go ahead and do a check regarding floats. Passes ./test/hotspot/jtreg/compiler/calls/, running t1. 
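The difference between the two loads can be illustrated with a small model. This is only a sketch in Java, not the generated RISC-V code: the class `SignExtendSketch` and its methods are hypothetical, and a `long` is used to stand in for a 64-bit stack slot whose upper half was never written.

```java
// Hypothetical model of a 64-bit stack slot holding a 32-bit Java int in its
// low half; the upper half is whatever happened to be there before.
public final class SignExtendSketch {

    // Models `ld`: all 64 bits of the slot are taken as they are,
    // including the stale upper 32 bits.
    static long loadDoubleword(long slot) {
        return slot;
    }

    // Models `lw`: only the low 32 bits are read and then sign-extended
    // to 64 bits, which is what the C ABI expects for an int argument.
    static long loadWordSignExtended(long slot) {
        return (long) (int) slot;
    }
}
```

For example, a slot containing `0xDEADBEEFFFFFFFFEL` holds the int value -2 in its low half; the `lw` model yields -2 as a 64-bit value, while the `ld` model passes the stale upper bits on to the callee.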
------------- Commit messages: - Fixed Changes: https://git.openjdk.org/jdk/pull/20912/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20912&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339741 Stats: 136 lines in 4 files changed: 134 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20912/head:pull/20912 PR: https://git.openjdk.org/jdk/pull/20912 From rcastanedalo at openjdk.org Mon Sep 9 11:35:13 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:35:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 21:33:42 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 176: > >> 174: __ jcc(Assembler::zero, runtime); // jump to runtime if index == 0 (full buffer) >> 175: // The buffer is not full, store value into it. >> 176: __ subptr(temp, wordSize); // temp := next index > > Instead of > > __ testptr(temp, temp); > __ jcc(Assembler::zero, runtime); > __ subptr(temp, wordSize); > > it seems like this might be better > > __ subptr(temp, wordSize); > __ jcc(Assembler::below, runtime); > > I think the code in the PR matches what the early expansion generates, so I think a change here > can be deferred to a followup. Good point, thanks! I made a note for follow-up work.
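The two instruction sequences discussed above can be modelled in plain C++ roughly as follows. This is only a sketch: kWordSize stands in for HotSpot's wordSize, the function names are made up for illustration, and the index is assumed to always be a multiple of the word size.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kWordSize = sizeof(void*);  // stand-in for wordSize

// Shape of the code in the PR: test for zero first, then subtract.
// Returns true if the runtime (slow) path must be taken because the buffer is full.
inline bool next_index_test_first(std::size_t& index) {
    if (index == 0) return true;   // testptr + jcc(Assembler::zero)
    index -= kWordSize;            // subptr
    return false;
}

// Shape of the suggested code: subtract first, then branch on the borrow.
inline bool next_index_subtract_first(std::size_t& index) {
    std::size_t old_index = index;
    index -= kWordSize;            // subptr; wraps around when old_index < kWordSize
    return old_index < kWordSize;  // jcc(Assembler::below): carry set by the subtraction
}

// Both variants take the same branch for any word-aligned index.
inline bool same_decision(std::size_t index) {
    std::size_t a = index;
    std::size_t b = index;
    return next_index_test_first(a) == next_index_subtract_first(b);
}
```

One difference worth keeping in mind: in the subtract-first form the index register has already been modified when the runtime path is taken, so the slow path must not rely on its old value.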
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750088920 From rcastanedalo at openjdk.org Mon Sep 9 11:48:11 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 11:48:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 23:57:59 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/cpu/x86/gc/g1/g1BarrierSetAssembler_x86.cpp line 354: > >> 352: __ bind(runtime); >> 353: // save the live input values >> 354: RegSet saved = RegSet::of(store_addr NOT_LP64(COMMA thread)); > > I was looking at this a while ago, and haven't figured out why we're saving `store_addr` here. > Also not sure why we're saving `thread` here for 32bit platforms. > Something to think about for the future. Though maybe the 32bit case will be gone by then :) I'm not sure either, this is in any case pre-existing interpreter code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750105760 From rkennke at openjdk.org Mon Sep 9 11:55:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 11:55:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. 
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Try to avoid lea in loadNklass (aarch64) - Fix release build error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/49126383..70f492d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=06-07 Stats: 24 lines in 5 files changed: 12 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From tschatzl at openjdk.org Mon Sep 9 12:40:13 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 10:29:55 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 26 commits: > > - Fix compiler/c2/irTests/TestPadding.java for +COH > - Simplify arrayOopDesc::length_offset_in_bytes and oopDesc::base_offset_in_bytes > - Nit in header_size > - GC code tweaks > - Fix runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest.java > - Fix jdk/tools/jlink/plugins/CDSPluginTest.java > - Cleanup markWord bits and comments > - x86_64: Fix loadNKlassCompactHeaders > - aarch64: Fix loadNKlassCompactHeaders > - Use FLAG_SET_ERGO when turning off UseCompactObjectHeaders > - ... and 16 more: https://git.openjdk.org/jdk/compare/b45fe174...49126383 src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 481: > 479: Klass* klass = UseCompactObjectHeaders > 480: ? old_mark.klass() > 481: : old->klass(); To be exact "promotion" only refers to copying to an older generation, so this comment does not cover objects copied within the generation. Suggestion: // NOTE: With compact headers, it is not safe to load the Klass* from old, because // that would access the mark-word, that might change at any time by concurrent // workers. // This mark word would refer to a forwardee, which may not yet have completed // copying. Therefore we must load the Klass* from the mark-word that we already // loaded. This is safe, because we only enter here if not yet forwarded. src/hotspot/share/gc/parallel/mutableSpace.cpp line 225: > 223: // header-based forwarding during promotion. Full GC doesn't > 224: // use the object header for forwarding at all. > 225: p += obj->forwardee()->size(); Better use `!obj->is_self_forwarded()` here. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 174: > 172: // may not yet have completed copying. Therefore we must load the Klass* from > 173: // the mark-word that we have already loaded. This is safe, because we have checked > 174: // that this is not yet forwarded in the caller.) Same adjustment needed as for G1. 
src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 711: > 709: // 8 - 32-bit VM > 710: // 12 - 64-bit VM, compressed klass > 711: // 16 - 64-bit VM, normal klass The comment needs to be adapted to include the case for compact object headers. src/hotspot/share/oops/arrayOop.hpp line 83: > 81: // The _length field is not declared in C++. It is allocated after the > 82: // declared nonstatic fields in arrayOopDesc if not compressed, otherwise > 83: // it occupies the second half of the _klass field in oopDesc. Needs update. src/hotspot/share/oops/instanceOop.hpp line 36: > 34: class instanceOopDesc : public oopDesc { > 35: public: > 36: // If compressed, the offset of the fields of the instance may not be aligned. Needs fixing (or removal) wrt to compact object headers, or move to the particular case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750046114 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750056160 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750074607 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750080552 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750027009 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750116336 From tschatzl at openjdk.org Mon Sep 9 12:40:14 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > 230: } > 231: > 232: // With compact headers, we can't safely access the class, due Suggestion: // With compact headers, we can't safely access the klass, due This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? Given this is used for verification only afaik, we should make an effort to provide that check. src/hotspot/share/gc/shared/gcForwarding.hpp line 34: > 32: > 33: /* > 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in Suggestion: * Implements forwarding for the Full GCs of Serial, Parallel, G1 and Shenandoah in src/hotspot/share/gc/shared/gcForwarding.hpp line 41: > 39: * bits (to indicate 'forwarded' state as usual). > 40: */ > 41: class GCForwarding : public AllStatic { Since this class is only used for Full GCs, it may be useful to include that information, i.e. something like `FullGCForwarding` to avoid confusion why it is not used for other GCs too. (Unless this has been discussed and even rejected by me before). src/hotspot/share/oops/compressedKlass.hpp line 43: > 41: > 42: // Tiny-class-pointer mode > 43: static int _tiny_cp; // -1, 0=true, 1=false Suggestion: static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. 
Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749995275 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749980748 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749987945 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1749969456 From tschatzl at openjdk.org Mon Sep 9 12:40:18 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:40:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error src/hotspot/share/oops/klass.hpp line 169: > 167: // contention that may happen when a nearby object is modified. > 168: AccessFlags _access_flags; // Access flags. The class/interface distinction is stored here. 
> 169: // Some flags created by the JVM, not in the class file itself, Suggestion: markWord _prototype_header; // Used to initialize objects' header with compact headers. Maybe some comment why this is an instance member. src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: > 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { > 73: // In this assert, we cannot safely access the Klass* with compact headers. > 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? src/hotspot/share/oops/oop.cpp line 157: > 155: bool oopDesc::has_klass_gap() { > 156: // Only has a klass gap when compressed class pointers are used. > 157: // Except when using compact headers. Suggestion: // Only has a klass gap when compressed class pointers are used and not // using compact headers. (Not sure if repeating the fairly simple disjunction below makes sense, but there has been a comment before too) src/hotspot/share/oops/oop.cpp line 230: > 228: // disjunct below to fail if the two comparands are computed across such > 229: // a concurrent change. > 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. 
src/hotspot/share/oops/oop.hpp line 103: > 101: static inline void set_klass_gap(HeapWord* mem, int z); > 102: > 103: // size of object header, aligned to platform wordSize Suggestion: // Size of object header, aligned to platform wordSize Pre-existing src/hotspot/share/oops/oop.hpp line 108: > 106: return sizeof(markWord) / HeapWordSize; > 107: } else { > 108: return sizeof(oopDesc) / HeapWordSize; Suggestion: return sizeof(oopDesc) / HeapWordSize; src/hotspot/share/oops/oop.hpp line 134: > 132: inline Klass* forward_safe_klass(markWord m) const; > 133: inline size_t forward_safe_size(); > 134: inline void forward_safe_init_mark(); Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". src/hotspot/share/oops/oop.hpp line 295: > 293: // this call returns null for that thread; any other thread has the > 294: // value of the forwarding pointer returned and does not modify "this". > 295: inline oop forward_to_atomic(oop p, markWord compare, atomic_memory_order order = memory_order_conservative); Maybe add an assert in the implementation so that it is not used for self-forwarding. Same for `forward_to`. src/hotspot/share/oops/oop.hpp line 356: > 354: return mark_offset_in_bytes() + sizeof(markWord) / 2; > 355: } else > 356: #endif Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly? I am fine with the existing code, but first stating directly that "any value" works here, this additional code seems to confuse the message. (Fwiw, the method is also used during Universe initialization). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750118470 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750143956 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750145460 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750150640 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750154114 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750153663 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750157781 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750159516 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750163768 From rehn at openjdk.org Mon Sep 9 12:43:18 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 9 Sep 2024 12:43:18 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes Message-ID: Hey, please consider, All code that is offline (behind a barrier) does not need global icache flushes, as we can instead emit fence.i locally (per thread and hart) in the slow path. But if we were to be context-switched to a hart that has not had fence.i emitted, we could still fetch stale instructions. To handle this case we now have kernel support: https://docs.kernel.org/arch/riscv/cmodx.html It's not perfect, as with this patch fence.i will be emitted on any context switch for any thread, even if that thread does not execute in the code heap (a non-attached native thread), and even if there were no changes to the code heap. But this is in many cases much faster, as the global icache flush IPI is very intrusive. One particular case is running a concurrent GC with little headroom. In such a scenario I measured 15% increased throughput on VF2. A larger CPU or less headroom (faster GC cycles) will yield an even bigger performance boost. Note that this requires a 6.10 kernel. I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3.
(With setting on) Later we probably want this default on, but as it's hard to test I'll leave default off. ------------- Commit messages: - Fixed ws - Draft Changes: https://git.openjdk.org/jdk/pull/20913/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339771 Stats: 98 lines in 10 files changed: 94 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20913/head:pull/20913 PR: https://git.openjdk.org/jdk/pull/20913 From tschatzl at openjdk.org Mon Sep 9 12:45:07 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 12:45:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error Only looked at GC and runtime changes, only very briefly at compiler stuff. Only looked at GC and runtime changes, only very briefly at compiler stuff. ------------- Changes requested by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289786482 PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2289800458 From rkennke at openjdk.org Mon Sep 9 12:52:07 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 12:52:07 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 18:10:44 GMT, Albert Mingkun Yang wrote: >> FWIW, the ParallelGC does something very similar to what you propose, except that it walks bitmaps instead of parsing the space to find the self-forwarded objects. It then has a check inside object_iterate to make sure that it doesn't expose the dead objects (in eden and the from space) to heap dumpers and histogram printers. >> >> Because of the code above, the SerialGC clears away the information about what objects are dead in eden and the from space, so heap dumpers and histogram printers will include these dead objects. We might want to fix that as a future RFE. > >> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. > > True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that there are no forwarded objs in eden/from. ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750199051 From rkennke at openjdk.org Mon Sep 9 13:02:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 13:02:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> Message-ID: On Fri, 30 Aug 2024 07:42:39 GMT, Thomas Stuefe wrote: >> Yes. This silent setting of UseCompactObjectHeaders ended up hiding why we got CDS failures. I would also suggest that we change this to FLAG_SET_ERGO. > > Seems we run all into the same thoughts :) > > I added > > Suggestion: > > FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > warning("Compact object headers require a java heap size smaller than %zu (given: %zu). " > "Disabling compact object headers.", max_narrow_heap_size * HeapWordSize, max_heap_size); That %zu is SIZE_FORMAT, right? This should probably use proper_unit_for_byte_size()/byte_size_in_proper_unit(). 
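As a rough illustration of what printing sizes "in proper units" means for the warning above, here is a standalone sketch. The helpers unit_for, size_in_unit and format_bytes are hypothetical stand-ins written for this example; they are not HotSpot's proper_unit_for_byte_size()/byte_size_in_proper_unit(), and the thresholds used here are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

constexpr std::size_t K = 1024;
constexpr std::size_t M = K * K;
constexpr std::size_t G = M * K;

// Picks the largest unit that still gives a value of at least 1 (assumed policy).
inline const char* unit_for(std::size_t bytes) {
    if (bytes >= G) return "G";
    if (bytes >= M) return "M";
    if (bytes >= K) return "K";
    return "B";
}

// Scales the byte count to the unit chosen by unit_for(), rounding down.
inline std::size_t size_in_unit(std::size_t bytes) {
    if (bytes >= G) return bytes / G;
    if (bytes >= M) return bytes / M;
    if (bytes >= K) return bytes / K;
    return bytes;
}

// Formats a byte count as "<value><unit>", e.g. for use in a warning message.
inline std::string format_bytes(std::size_t bytes) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%zu%s", size_in_unit(bytes), unit_for(bytes));
    return std::string(buf);
}
```

A warning built this way would report a heap limit as a short, scaled value instead of a raw byte count, which is easier to read for limits in the terabyte range.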
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750215510 From rkennke at openjdk.org Mon Sep 9 13:31:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 13:31:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> Message-ID: On Thu, 22 Aug 2024 19:50:21 GMT, Albert Mingkun Yang wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix hash shift for 32 bit builds > > src/hotspot/share/gc/shared/gcForwarding.hpp line 36: > >> 34: * Implements forwarding for the full-GCs of Serial, Parallel, G1 and Shenandoah in >> 35: * a way that preserves upper N bits of object mark-words, which contain crucial >> 36: * Klass* information when running with compact headers. The encoding is similar to > > This doc suggests this forwarding is only for compact-header so I wonder if we can check `UseCompactObjectHeaders` directly instead of heap-size in `GCForwarding::initialize`. Right. The original implementation was more complex and then the consensus was to not sprinkle UseCompactHeaders all over the place, but with that new/simpler implementation it makes sense to simply check the UCOH flag. > src/hotspot/share/gc/shared/gcForwarding.hpp line 40: > >> 38: * heap-base, shifts that difference into the right place, and sets the lowest two >> 39: * bits (to indicate 'forwarded' state as usual). >> 40: */ > >> "can use 40 bits for forwardee encoding. That's enough for 8TB of heap." > > I feel this 8T-constraint is significant and should be in the doc. Done. 
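To make the 8TB figure concrete, here is a rough sketch of the encoding described in the quoted comment: take the difference between forwardee and heap base, shift it into place, and set the lowest two bits. The exact bit positions, the constant names, and the choice to store the difference in 8-byte heap words are assumptions made for this illustration (the word-based offset is what makes 40 bits cover 8TB); this is not the code from gcForwarding.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, for illustration only:
// [ preserved upper bits | 40-bit forwardee word offset | 2 low state bits ]
constexpr int      kShift        = 2;
constexpr int      kOffsetBits   = 40;
constexpr uint64_t kLowMask      = 0x3;   // lowest two bits
constexpr uint64_t kForwarded    = 0x3;   // both set: 'forwarded'
constexpr uint64_t kOffsetMask   = ((uint64_t{1} << kOffsetBits) - 1) << kShift;
constexpr uint64_t kHeapWordSize = 8;     // bytes per heap word (64-bit)

// 2^40 words * 8 bytes per word = 2^43 bytes = 8 TB of addressable heap.
constexpr uint64_t kMaxHeapBytes = (uint64_t{1} << kOffsetBits) * kHeapWordSize;

// Encode the forwardee into the mark while keeping the upper bits intact.
uint64_t encode_forwarding(uint64_t mark, uint64_t heap_base, uint64_t forwardee) {
  uint64_t word_offset = (forwardee - heap_base) / kHeapWordSize;
  uint64_t preserved   = mark & ~(kOffsetMask | kLowMask);
  return preserved | (word_offset << kShift) | kForwarded;
}

// Recover the forwardee address from an encoded mark.
uint64_t decode_forwardee(uint64_t mark, uint64_t heap_base) {
  uint64_t word_offset = (mark & kOffsetMask) >> kShift;
  return heap_base + word_offset * kHeapWordSize;
}

bool is_forwarded(uint64_t mark) {
  return (mark & kLowMask) == kForwarded;
}
```

Under these assumptions the bits above the offset field survive forwarding unchanged, which is the property the doc comment needs for the Klass* information.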
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750264571 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750265026 From coleenp at openjdk.org Mon Sep 9 13:32:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 13:32:13 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: Message-ID: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> On Mon, 9 Sep 2024 03:29:57 GMT, David Holmes wrote: >> src/hotspot/share/oops/method.hpp line 853: >> >>> 851: Method* new_method = method_holder()->method_with_idnum(orig_method_idnum()); >>> 852: assert(this != new_method, "sanity check"); >>> 853: return (new_method == nullptr || is_deleted()) ? Universe::throw_no_such_method_error() : new_method; >> >> I am still confused by the different possibilities here. Under what conditions will we get nullptr? Is it the case that `get_new_method` should only be called when `is_old()` is true? Can `is_old` and `is_deleted` be true at the same time? > > To answer some of my own questions: > - yes `get_new_method` should only be called if `is_old` is true. (Should we assert that?) > - yes a method can be old and deleted at the same time. > > I remain unclear how nullptr can appear here. This redefine code is really complicated. Obsolete methods, which include deleted methods, get an incremented idnum in check_methods_and_mark_as_obsolete.
// obsolete methods need a unique idnum so they become new entries in // the jmethodID cache in InstanceKlass assert(old_method->method_idnum() == new_method->method_idnum(), "must match"); u2 num = InstanceKlass::cast(_the_class)->next_method_idnum(); if (num != ConstMethod::UNSET_IDNUM) { old_method->set_method_idnum(num); } When get_new_method() is called it compares InstanceKlass::_methods[idnum], then compares the Method at that index to the idnum that the method has stored. For deleted methods, this will return nullptr because the idnum comparison in method_with_idnum will not match, or idnum for the deleted method is greater than InstanceKlass::_methods.length(). I think it is sufficient to check for nullptr in get_new_method() for deleted methods but also the explicit is_deleted() comparison is much easier to understand that it's the right answer. Yes, there could be an assert in get_new_method(), the method passed in is_old(). All redefined methods are marked as is_old(). I believe all callers test this before making this call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1750263229 From aph at openjdk.org Mon Sep 9 13:32:24 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 9 Sep 2024 13:32:24 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v23] In-Reply-To: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared runtime > ---------------------- > > Building hashed secondary tables is now unconditional. 
It takes very > little time, and now that the shared runtime always has the tables, it > might as well take advantage of them. The shared code is easier to > follow now, I think. > > There might be a performance issue with x86-64 in that we build > HotSpot for a default x86-64 target that does not support popcount. > This means that HotSpot C++ runtime on x86 always uses a software > emulation for popcount, even though the vast majority of machines made > for the past 20 years can do popcount in a single instruction. It > wouldn't be terribly hard to do something about that. > > Having said that, the software popcount is really not bad. > > x86 > --- > > x86 is rather tricky, because we still support > `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as > well as 32- and 64-bit ports. There's some further complication in > that only `RCX` can be used as a shift count, so there's some register > shuffling to do. All of this makes the logic in macroAssembler_x86.cpp > rather gnarly, with multiple levels of conditionals at compile time > and runtime. > > AArch64 > ------- > > AArch64 is considerably more straightforward. We always have a > popcount instruction and (thankfully) no 32-bit code to worry about. > > Generally > --------- > > I would dearly love simply to rip out the "old" secondary supers cache > support, but I've left it in just in case someone has a performance > regression. > > The versions of `MacroAssembler::lookup_secondary_supers_table` that > work with variable superclasses don't take a fixed set of temp > registers, and neither do they call out to a slow path subroutine. > Instead, the slow path is expanded inline. > > I don't think this is necessarily bad. Apart from the very rare cases > where C2 can't determine the superclass to search for at compile time, > this code is only used for generating stubs, and it seemed to me > ridiculous to have stubs calling other stubs.
> > I've followed the guidance from @iwanowww not to obsess too much about > the performance of C1-compiled secondary supers lookups, and to prefer > simplicity over absolute performance. Nonetheless, this is a > complicated patch that touches many areas. Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: - Merge from 4ff72dc57e65e99b129f0ba28196994edf402018 - Fix s390 - Use post-incrememnt RegSet operator. - Merge branch 'clean' into JDK-8331658-work - Fix merge - Merge branch 'clean' into JDK-8331658-work - Merge from JDK head. - Cleanup - Fix shared code - Fix shared code - ... and 51 more: https://git.openjdk.org/jdk/compare/4ff72dc5...a7612674 ------------- Changes: https://git.openjdk.org/jdk/pull/19989/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19989&range=22 Stats: 1052 lines in 22 files changed: 778 ins; 140 del; 134 mod Patch: https://git.openjdk.org/jdk/pull/19989.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19989/head:pull/19989 PR: https://git.openjdk.org/jdk/pull/19989 From aph at openjdk.org Mon Sep 9 13:36:10 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 9 Sep 2024 13:36:10 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v23] In-Reply-To: References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: On Mon, 9 Sep 2024 13:32:24 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. 
The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. >> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs.
>> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: > > - Merge from 4ff72dc57e65e99b129f0ba28196994edf402018 > - Fix s390 > - Use post-incrememnt RegSet operator. > - Merge branch 'clean' into JDK-8331658-work > - Fix merge > - Merge branch 'clean' into JDK-8331658-work > - Merge from JDK head. > - Cleanup > - Fix shared code > - Fix shared code > - ... and 51 more: https://git.openjdk.org/jdk/compare/4ff72dc5...a7612674 I had to merge from JDK head to fix a conflict. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19989#issuecomment-2338141849 From coleenp at openjdk.org Mon Sep 9 13:39:07 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 13:39:07 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 16:10:05 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Some people wanted chunks and some people wanted it all at once. After fixing these instances that you've pointed out, Gerard can have another pass with things that might not be noticed on this pass. 
The MemTag F pattern could be changed to MemTag MT (or left for a further review, which is my preference). The capital letter T seems like a bad choice for this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2338144129 From coleenp at openjdk.org Mon Sep 9 13:39:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 13:39:09 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> References: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> Message-ID: On Sat, 7 Sep 2024 05:27:12 GMT, Kim Barrett wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > src/hotspot/share/utilities/chunkedList.hpp line 31: > >> 29: #include "utilities/debug.hpp" >> 30: >> 31: template class ChunkedList : public CHeapObj { > > Parameter name should be updated. Suggest `mem_tag`. How about MT here or just M? I would make this a further change though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750281203 From rkennke at openjdk.org Mon Sep 9 14:11:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:11:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> On Tue, 27 Aug 2024 07:43:07 GMT, Hamlin Li wrote: >> @Hamlin-Li : AFAIK, porting to linux-riscv platform has NOT been started yet. To avoid duplicate work, please let me know if anyone is interested or has been working on it :-) > > Yes, I'm interested in it. Thanks for raising the discussion. :) If anybody is doing it, please send me a patch, or we can do it as a follow-up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750345203 From rkennke at openjdk.org Mon Sep 9 14:11:10 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:11:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 23 Aug 2024 11:38:39 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/oops/oop.inline.hpp line 94: > >> 92: >> 93: void oopDesc::init_mark() { >> 94: if (UseCompactObjectHeaders) { > > Seems only `set_mark(prototype_mark());` is fine for both cases? Right. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750342555 From rkennke at openjdk.org Mon Sep 9 14:35:08 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 14:35:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 21:52:58 GMT, Chris Plummer wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: > >> 167: } else { >> 168: visitor.doMetadata(klass, true); >> 169: } > > Why is there no `visitor.doMetadata()` call for the compact object header case? There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750386024 From rcastanedalo at openjdk.org Mon Sep 9 14:44:17 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Sep 2024 14:44:17 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: References: Message-ID: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> On Sat, 7 Sep 2024 03:57:43 GMT, Kim Barrett wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> s390 port : late barrier expansion > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: > >> 110: // The answer is that stores of different sizes can co-exist >> 111: // in the same sequence of RawMem effects.
We sometimes initialize >> 112: // a whole 'tile' of array elements with a single jint or jlong.) > > I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two > 32bit oops/narrowOops? But that doesn't have anything to do with jints. I am not sure the complex overlap test is necessary here, this code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest leaving it as-is and possibly investigating how to simplify it as a follow-up task. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1750400106 From coleenp at openjdk.org Mon Sep 9 14:47:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 14:47:10 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725.
I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac I think this is wrong. We want 'resolve' to read the oop and tell the GC that this oop should be alive. If GetMethodDeclaringClass is crashing, make sure the holder is alive before the iteration could safepoint. This code does keep it in a Handle, so that should be sufficient. jclass JvmtiEnvBase::get_jni_class_non_null(Klass* k) { assert(k != nullptr, "k != null"); Thread *thread = Thread::current(); return (jclass)jni_reference(Handle(thread, k->java_mirror())); } ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2290148202 From stefank at openjdk.org Mon Sep 9 14:50:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 14:50:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> On Fri, 30 Aug 2024 08:06:31 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/filemap.cpp line 2507: > >> 2505: } >> 2506: >> 2507: if (compact_headers() != UseCompactObjectHeaders) { > > (Commenting here, but the comment applies to code a bit above) While debugging CDS, it would have been useful to print the value of UseCompactObjectHeaders. 
> > Could we change the code to be: > > log_info(cds)("Archive was created with UseCompressedOops = %d, UseCompressedClassPointers = %d, UseCompactObjectHeaders = %d", > compressed_oops(), compressed_class_pointers(), compact_headers()); Resolved. > src/hotspot/share/cds/filemap.cpp line 2508: > >> 2506: >> 2507: if (compact_headers() != UseCompactObjectHeaders) { >> 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" > > Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. @iklam informed me that some of the info levels (including this line) should be converted to warning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750408043 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750410679 From fyang at openjdk.org Mon Sep 9 14:57:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 9 Sep 2024 14:57:05 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:17:54 GMT, Robbin Ehn wrote: > Hi please review, > > When calling a native function using integers smaller than 64 bits, > they must be loaded from a Java stack slot and widened to 64-bit, sign-extended. > In the interpreter case we only store 32-bit, which means the top 32-bit are 'random'. > In the compiler case we do an ld and grab random top 32-bit. > These should be loaded with a lw from Java stack, thus properly sign-extended, and then stored with sd into the native stack. > > I found the interpreter bug first, wrote a test case for it, which found the compiler bug.
> > Here you can see the difference, both are legal to do from a compiler: > https://godbolt.org/z/85aMhja5f > Relevant specs: > https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc >> integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits. > > I checked floats also, they seem fine, but please go ahead and do a check regarding floats. > > Passes ./test/hotspot/jtreg/compiler/calls/, running t1. Nice catch! Thanks Robbin. I am trying it on my machines. BTW: Seems this fix deserves a small code comment. ------------- PR Review: https://git.openjdk.org/jdk/pull/20912#pullrequestreview-2290174737 From rkennke at openjdk.org Mon Sep 9 15:04:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 15:04:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> References: <4yTWbD93OXGwYYxEQo56smKa5kl_WiPPcMsXSs0eUoQ=.893f54c8-ed2b-4a7f-bf0a-36553a951f47@github.com> Message-ID: <-2JWx3F8EdyQ0Uf-mI62ImLXgjgIy9PEydjtKHhx12Q=.4d944301-6f1c-4270-953c-ec6c86df946a@github.com> On Mon, 9 Sep 2024 14:47:28 GMT, Stefan Karlsson wrote: >> src/hotspot/share/cds/filemap.cpp line 2508: >> >>> 2506: >>> 2507: if (compact_headers() != UseCompactObjectHeaders) { >>> 2508: log_info(cds)("The shared archive file's UseCompactObjectHeaders setting (%s)" >> >> Printing on the `info` level mimics what we do when there's a mismatch for compressed classes (and oops), but I wonder if that one is intentional or if it is accidentally printing to 'info' instead of 'warning'. > > @iklam informed me that some of the info levels (including this line) should be converted to warning. Yeah that looks inconsistent with other places where we print a warning instead. I'll change it to warning for the UCOH check.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750430001 From stefank at openjdk.org Mon Sep 9 15:04:12 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:04:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:21:19 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/oop.hpp line 134: > >> 132: inline Klass* forward_safe_klass(markWord m) const; >> 133: inline size_t forward_safe_size(); >> 134: inline void forward_safe_init_mark(); > > Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. > > Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values. > src/hotspot/share/oops/oop.hpp line 356: > >> 354: return mark_offset_in_bytes() + sizeof(markWord) / 2; >> 355: } else >> 356: #endif > > Maybe instead of trying to calculate some random, meaningless value just use some "random" value directly? > I am fine with the existing code, but first stating directly that "any value" works here, this additional code seems to confuse the message. 
(Fwiw, the method is also used during Universe initialization). Just to be clear, the second part of the quoted sentence is important: > could be any value *that is not a valid field offset* ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750428581 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750432186 From tschatzl at openjdk.org Mon Sep 9 15:04:12 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 9 Sep 2024 15:04:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 15:00:09 GMT, Stefan Karlsson wrote: > could be any value that is not a valid field offset I understand that that "random value" needs to satisfy this condition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750433800 From eosterlund at openjdk.org Mon Sep 9 15:31:07 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Sep 2024 15:31:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 14:44:38 GMT, Coleen Phillimore wrote: > I think this is wrong. We want 'resolve' to read the oop and tell the GC that this oop should be alive. If GetMethodDeclaringClass is crashing, make sure the holder is alive before the iteration could safepoint. > > > > This code does keep it in a Handle, so that should be sufficient. > > > > jclass > > JvmtiEnvBase::get_jni_class_non_null(Klass* k) { > > assert(k != nullptr, "k != null"); > > Thread *thread = Thread::current(); > > return (jclass)jni_reference(Handle(thread, k->java_mirror())); > > } > > > > The trouble is that java_mirror does not keep the holder alive. It's an OopHandle to an oop in the CLD. 
Using this OopHandle at all is only allowed if the holder is in some other way guaranteed to be held strongly reachable. The naming of the java_mirror* functions is a bit unfortunate. It's tempting to believe java_mirror will keep the mirror alive when there is another java_mirror_no_keepalive next to it. But it does not, and that is intentional. A better name would perhaps be java_mirror_already_kept_alive or something like that. Neither one of them keeps the class alive. This can be made prettier once non-generational ZGC is removed. But until then I'm hoping we can do what the other code is doing which is to call klass_holder() to keep the class alive, and then fetch the mirror separately knowing it is safe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338431601 From stefank at openjdk.org Mon Sep 9 15:34:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:34:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 12:49:05 GMT, Roman Kennke wrote: >>> If we get a promotion failure in the young gen, we are leaving the dead objects marked as forwarded. >> >> True; need to do sth like `obj->init_mark();` for the non-self-forwarded case. The postcondition is that no forwarded objs in eden/from. > > ParallelGC actually doesn't use bitmaps, it pushes all forwarded objs to preserved-marks-table, and uses that to find forwarded objects, which is why we can't remove the preserved-marks table in ParallelGC (IOW, after this patch, the preserved-marks-stuff in Parallel scavenger is *only* used to find forwarded objects. We might want to think about more efficient solutions for this).
(Just to clarify if others are reading this) Right, what I referred to above was how we found the object to forward, which is done via the bitmaps: while (cur_addr < region_end) { cur_addr = mark_bitmap()->find_obj_beg(cur_addr, region_end); If the Parallel Old collector didn't do that, but instead parsed the heap like Serial does, then the Parallel Young collector would also have to fix the from space copies of moved objects when it hits a promotion failure, just like Serial does. This was just meant to point out the differences between the two collectors and why the young GC code is different. I realize that in earlier comments I called the from-space copy of the objects "dead objects", but they are not dead; they are just the stale objects that are discoverable because of promotion failure keeping the eden and from spaces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750480983 From stefank at openjdk.org Mon Sep 9 15:34:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 15:34:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: <-G_gdaZBT2xhZFsdyEwIqiOHpbLpiL79N6NDsW8X2BY=.bc52bd8a-21c5-40e7-a921-a5f68675200f@github.com> <3QGPH52NyrDPne5EgoGx2sx9OeGRu9K72onNNwzMr2M=.8a390b3d-2e8a-470e-8bb7-1ba975070c53@github.com> Message-ID: On Mon, 9 Sep 2024 12:59:36 GMT, Roman Kennke wrote: > That %zu is SIZE_FORMAT, right? Yes. Reviewers have lately encouraged people to use %zu instead of SIZE_FORMAT. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750482486 From coleenp at openjdk.org Mon Sep 9 15:49:03 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 15:49:03 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725.
I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac Holding a mirror in a Handle must keep the CLD alive. If it has an oop class loader, it will keep a reference to that class loader alive through the vectors field. If it is a mirror holder CLD, the holder is the mirror itself. You can only unload the CLD if the CLD::holder is unreachable. This Handle makes the CLD holder reachable. Not having this property breaks everything. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338470827 From pchilanomate at openjdk.org Mon Sep 9 15:53:48 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 9 Sep 2024 15:53:48 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v3] In-Reply-To: References: Message-ID: On Sun, 8 Sep 2024 23:56:25 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> fix update in map_stack_shadow_pages > > src/hotspot/os/windows/os_windows.inline.hpp line 60: > >> 58: } >> 59: StackOverflow* state = JavaThread::current()->stack_overflow_state(); >> 60: assert(original_sp > state->shadow_zone_safe_limit(), ""); > > Can you print the values if the assert fails please. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20862#discussion_r1750513308 From pchilanomate at openjdk.org Mon Sep 9 15:53:47 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 9 Sep 2024 15:53:47 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v4] In-Reply-To: References: Message-ID: > Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages (~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. > > The fix implemented is to bang the stack pages one by one to let the Windows page protection take over.
This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). > > I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: print values in assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20862/files - new: https://git.openjdk.org/jdk/pull/20862/files/00d5e9c5..13a4aa9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20862&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20862.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20862/head:pull/20862 PR: https://git.openjdk.org/jdk/pull/20862 From matsaave at openjdk.org Mon Sep 9 16:06:06 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 9 Sep 2024 16:06:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> References: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> Message-ID: <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> On Mon, 9 Sep 2024 13:27:24 GMT, Coleen Phillimore wrote: >> To answer some of my own questions: >> - yes `get_new_method` should only be called if `is_old` is true. (Should we assert that?) >> - yes a method can be old and deleted at the same time. 
>> >> I remain unclear how nullptr can appear here. > > This redefine code is really complicated. Obsolete methods, which includes deleted methods get an incremented idnum in check_methods_and_mark_as_obsolete. > > // obsolete methods need a unique idnum so they become new entries in > // the jmethodID cache in InstanceKlass > assert(old_method->method_idnum() == new_method->method_idnum(), "must match"); > u2 num = InstanceKlass::cast(_the_class)->next_method_idnum(); > if (num != ConstMethod::UNSET_IDNUM) { > old_method->set_method_idnum(num); > } > > When get_new_method() is called it compares InstanceKlass::_methods[idnum], then compares the Method at that index to the idnum that the method has stored. For deleted methods, this will return nullptr because the idnum comparison in method_with_idnum will not match, or idnum for the deleted method is greater than InstanceKlass::_methods.length(). I think it is sufficient to check for nullptr in get_new_method() for deleted methods but also the explicit is_deleted() comparison is much easier to understand that it's the right answer. > > Yes, there could be an assert in get_new_method(), the method passed in is_old(). All redefined methods are marked as is_old(). I believe all callers test this before making this call. Coleen is correct, all the callers do indeed check that the method `is_old()` but I think it's fine if we assert that in `get_new_method()` to make it clear that it is a requirement. The method `method_with_idnum()` can return nullptr here: if (m == nullptr || m->method_idnum() != idnum) { for (int index = 0; index < methods()->length(); ++index) { m = methods()->at(index); if (m->method_idnum() == idnum) { return m; } } // None found, return null for the caller to handle. return nullptr; As the comment suggests, the caller should handle nullptr, which in this case is `get_new_method()`. The callers of get_new_method() try to handle this but I think it's cleaner to check inside the method. 
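For readers following along, here is a minimal sketch of the lookup discussed above, written with toy stand-ins rather than the HotSpot classes. `ToyMethod`, `ToyHolder` and the `fallback` parameter are assumptions made for illustration only; the thread does not say what the real `get_new_method()` substitutes when no replacement exists. Only the shape of the `method_with_idnum()` scan follows the snippet quoted in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a method; not the HotSpot Method class.
struct ToyMethod {
  uint16_t idnum;    // plays the role of method_idnum()
  bool is_old;       // set on every redefined method
  bool is_deleted;   // set when the redefinition removed the method
};

// Toy stand-in for the class that owns the current method array.
struct ToyHolder {
  std::vector<ToyMethod*> methods;

  // Try the slot at index idnum first, then fall back to a linear scan.
  // Returns nullptr when no method carries this idnum.
  ToyMethod* method_with_idnum(uint16_t idnum) const {
    ToyMethod* m = idnum < methods.size() ? methods[idnum] : nullptr;
    if (m == nullptr || m->idnum != idnum) {
      for (ToyMethod* candidate : methods) {
        if (candidate->idnum == idnum) {
          return candidate;
        }
      }
      return nullptr;  // none found, the caller has to handle this
    }
    return m;
  }
};

// Hypothetical wrapper that handles the nullptr case in one place, so that
// callers never see it. `fallback` is an assumed placeholder.
ToyMethod* get_new_method(const ToyHolder& holder,
                          const ToyMethod* old_method,
                          ToyMethod* fallback) {
  assert(old_method->is_old && "only redefined methods have a replacement");
  if (old_method->is_deleted) {
    return fallback;  // explicit check, easier to read than relying on nullptr
  }
  ToyMethod* m = holder.method_with_idnum(old_method->idnum);
  return m != nullptr ? m : fallback;
}
```

In this model a deleted or obsolete method carries an idnum that no current method has (for example one past the end of the array), so the scan finds nothing and the wrapper returns the placeholder instead of leaking nullptr to its callers.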
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1750534303 From eosterlund at openjdk.org Mon Sep 9 16:16:07 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Sep 2024 16:16:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: <72HOB2zopDx2N-J1eQpSqh_kaBhqVle_AbwngrCmQLw=.e9fd5b9c-03b1-45f8-ac44-c0c5684fbf02@github.com> On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac > Holding a mirror in a Handle must keep the CLD alive. If it has an oop class loader, it will keep a reference to that class loader alive through the vectors field. If it is a mirror holder CLD, the holder is the mirror itself. You can only unload the CLD if the CLD::holder is unreachable. This Handle makes the CLD holder reachable. > > > > Not having this property breaks everything. A handle ensures that an oop that was strongly reachable at the point when the handle was created, keeps the oop strongly reachable across the lifetime of the handle. But it cannot guarantee that an oop that was not strongly reachable at the point of creating the handle, will be kept alive by the handle. That's an illegal use of the handle. What is happening here is that we load one of the weird CLD oops that we are not allowed to load unless the holder is already kept strongly reachable. 
That means that we are loading an oop that is not strongly reachable at the point of loading it, and illegally putting it in a handle, which on its own would crash the JVM later if the GC tried to actually trace through the handle. What we are doing conceptually is the equivalent of peeking past a non-strong reference to an object and then loading one of its strong references and exposing it in the object graph through a handle, which is illegal. Just because we read from a strong edge, that doesn't mean the oop is strongly kept alive. Because we only got hold of the strong reference by peeking through a non-strong reference. In this case the non-strong reference is the klass_holder that we skipped loading, and the strong reference of the holder is the klass_mirror weird CLD oop from the OopHandle that logically belongs to the holder, which we didn't load in an appropriate manner. You gotta love the weird CLD oops. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338532271 From coleenp at openjdk.org Mon Sep 9 16:32:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 16:32:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac Okay I agree that you can't use a Handle to reference this mirror if it's not already referenced by other code (already alive). 
Fetching out of the CLD::_handles doesn't keep it alive. // method - pre-checked for validity, but may be null meaning obsolete method // declaring_class_ptr - pre-checked for null jvmtiError JvmtiEnv::GetMethodDeclaringClass(Method* method, jclass* declaring_class_ptr) { NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); (*declaring_class_ptr) = get_jni_class_non_null(method->method_holder()); return JVMTI_ERROR_NONE; } /* end GetMethodDeclaringClass */ So here, I don't see anything holding the method_holder() mirror through the Method, unless it's in the caller (a global jobject or something). Same with the GetFieldDeclaringClass function. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338564976 From bulasevich at openjdk.org Mon Sep 9 16:47:37 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 9 Sep 2024 16:47:37 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM Message-ID: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications. Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is on average half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance.
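The rounding argument above can be illustrated with a small calculation. This is a simplified model, not CodeHeap code: the helper names `reserved_bytes` and `wasted_bytes` are made up for this sketch, and the model ignores any per-block header and the `_segmap` overhead mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Bytes reserved for a block of `size` bytes when space is handed out
// in whole segments of `segment_size` bytes (simplified model).
constexpr std::size_t reserved_bytes(std::size_t size, std::size_t segment_size) {
  return (size + segment_size - 1) / segment_size * segment_size;
}

// Bytes left unused in the last segment of that block.
constexpr std::size_t wasted_bytes(std::size_t size, std::size_t segment_size) {
  return reserved_bytes(size, segment_size) - size;
}

// A 1050-byte block needs 9 segments of 128 bytes (1152 reserved, 102 wasted)
// but 17 segments of 64 bytes (1088 reserved, 38 wasted).
static_assert(wasted_bytes(1050, 128) == 102, "128-byte segments");
static_assert(wasted_bytes(1050, 64) == 38, "64-byte segments");
```

If block sizes are spread evenly with respect to the segment size, the expected waste per block in this model is about half a segment, so halving the segment size from 128 to 64 bytes roughly halves the average padding.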
The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows: - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). I believe it is time to remove the comment and update the default value. I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: - No performance impact on Hotspot microbenchmarks and other microbenchmarks. 
- On the Renaissance Dotty benchmark: - 0.2-0.4% performance improvement on Neoverse N1/V1/V2 architectures - 0.7% performance improvement on Raspberry Pi Model 3 (ARM32, ARM1176JZF-S) - slight performance degradation on Cortex-A72, reproducible only with the CodeEntryAlignment update. I suggest changing the CodeCacheSegmentSize for AARCH64 and ARM32 and updating the CodeEntryAlignment for AARCH64 Neoverse platforms. ------------- Commit messages: - 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM Changes: https://git.openjdk.org/jdk/pull/20864/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20864&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339573 Stats: 6 lines in 3 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20864/head:pull/20864 PR: https://git.openjdk.org/jdk/pull/20864 From stefank at openjdk.org Mon Sep 9 16:55:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 16:55:04 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac GetFieldDeclaringClass holds the sub-class: https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#GetFieldDeclaringClass so, it should be keeping the super-class that is the owner of the field alive. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338609353 From cjplummer at openjdk.org Mon Sep 9 16:56:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 16:56:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: <1VACYSoQRtP9m4BJkCVrdFxueC75Kg4Kp3wjGsAA2Dw=.53563f62-70cf-4d93-8d99-69b737812ba6@github.com> On Mon, 26 Aug 2024 21:30:51 GMT, Chris Plummer wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 85: > >> 83: >> 84: private static Klass getKlass(Mark mark) { >> 85: assert(VM.getVM().isCompactObjectHeadersEnabled()); > > `mark.getKlass()` already does this assert. I don't see any value in this `getKlass()` method. The caller should just call `getMark().getKlass()` rather than `getKlass(getMark())`. I'm not sure why this got marked as resolved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750600652 From cjplummer at openjdk.org Mon Sep 9 16:56:08 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 16:56:08 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 14:32:49 GMT, Roman Kennke wrote: >> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 169: >> >>> 167: } else { >>> 168: visitor.doMetadata(klass, true); >>> 169: } >> >> Why is there no `visitor.doMetadata()` call for the compact object header case? > > There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). 
I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. In specific I need to know if the log includes the following: hsdb> + inspect 0x00000007cff154b8 instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) _mark: 1 _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750598648 From gziemski at openjdk.org Mon Sep 9 17:00:10 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:00:10 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 00:13:25 GMT, David Holmes wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > src/hotspot/share/nmt/memMapPrinter.cpp line 70: > >> 68: //end >> 69: >> 70: static const char* get_shortname_for_nmt_flag(MemTag mem_tag) { > > Shouldn't this be renamed to `get_shortname_for_nmt_tag`? 
I went with `get_shortname_for_mem_tag()` I think that is more consistent? Is that OK? > src/hotspot/share/nmt/memReporter.cpp line 852: > >> 850: } else if (early_site->mem_tag() != current_site->mem_tag()) { >> 851: // This site was originally allocated with one type, then released, >> 852: // then re-allocated at the same site (as far as we can tell) with a different type. > > s/type/tag/ Fixed. > src/hotspot/share/nmt/memTracker.hpp line 83: > >> 81: if (enabled()) { >> 82: return MallocTracker::record_malloc(mem_base, size, mem_tag, stack); >> 83: return MallocTracker::record_malloc(mem_base, size, mem_tag, stack); > > Did this even compile? ! > Suggestion: > > return MallocTracker::record_malloc(mem_base, size, mem_tag, stack); Fixed. None of the compilers had a problem with this weirdly enough. > src/hotspot/share/nmt/memoryFileTracker.cpp line 51: > >> 49: for (int i = 0; i < mt_number_of_tags; i++) { >> 50: VirtualMemory* summary = file->_summary.by_type(NMTUtil::index_to_tag(i)); >> 51: summary->reserve_memory(diff.type[i].commit); > > Why is this `type` not `tag`? Fixed. > src/hotspot/share/nmt/memoryFileTracker.cpp line 109: > >> 107: tty->print_cr("Expected start out to have same type as end in, but was: %s, %s", >> 108: VMATree::statetype_to_string(broken_start->val().out.state()), >> 109: VMATree::statetype_to_string(broken_end->val().in.state())); > > Not seeing what this rename has to do with current changes. ??? Fixed. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 400: > >> 398: >> 399: // Print some more details. Don't use UL here to avoid circularities. >> 400: tty->print_cr("Error: existing region: [" INTPTR_FORMAT "-" INTPTR_FORMAT "), type %u.\n" > > Again why `type` instead of `tag`? Fixed. > src/hotspot/share/nmt/virtualMemoryTracker.cpp line 560: > >> 558: // Given an existing memory mapping registered with NMT, split the mapping in >> 559: // two. 
The newly created two mappings will be registered under the call >> 560: // stack and the memory types of the original section. > > types -> tags Fixed. > src/hotspot/share/nmt/vmatree.cpp line 86: > >> 84: // If the state is not matching then we have different operations, such as: >> 85: // reserve [x1, A); ... commit [A, x2); or >> 86: // reserve [x1, A), type1; ... reserve [A, x2), type2; or > > Why type not tag? I will fix it. > src/hotspot/share/nmt/vmatree.hpp line 91: > >> 89: private: >> 90: // Store the state and mem_tag as two bytes >> 91: uint8_t info[2]; > > I'm unclear about terminology here: type -> state ? Those come from my initial fix, when I renamed `MEMFLAGS` -> `MemType`. In that work the type_flag was clashing with mem_type parameter, so I renamed it from `type` to `state` I tried to re-use that original patch, but as you just found, some of those changes do not apply. I will undo this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750598453 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750596573 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750595789 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750592818 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750589625 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750588916 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750587807 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750586650 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750583730 From gziemski at openjdk.org Mon Sep 9 17:00:14 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:00:14 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> References: 
<1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> Message-ID: On Sat, 7 Sep 2024 05:24:29 GMT, Kim Barrett wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > src/hotspot/share/runtime/os.hpp line 918: > >> 916: static ssize_t recv(int fd, char* buf, size_t nBytes, uint type); >> 917: static ssize_t send(int fd, char* buf, size_t nBytes, uint type); >> 918: static ssize_t raw_send(int fd, char* buf, size_t nBytes, uint type); > > This set of changes is wrong. These aren't MEMFLAGS flags. > (I hope there aren't any more like this. This sort of thing would be easy to miss in a change this > large. If I were making this change I'd have broken it up into several smaller pieces.) Looks like a find/replace issue. Reverted the change. > src/hotspot/share/utilities/concurrentHashTable.hpp line 43: > >> 41: class Mutex; >> 42: >> 43: template > > Parameter name should be updated throughout ConcurrentHashTable. Suggest mem_tag. We can change this later in a followup. > src/hotspot/share/utilities/growableArray.hpp line 803: > >> 801: >> 802: // Leaner GrowableArray for CHeap backed data arrays, with compile-time decided MemTag. >> 803: template > > Another parameter needing update, but shouldn't (can't?) be called `mem_tag` because of the > function parameter name for allocate(). We can change this later in a followup. 
> src/hotspot/share/utilities/linkedlist.hpp line 368: > >> 366: template > 367: AnyObj::allocation_type T = AnyObj::C_HEAP, >> 368: MemTag F = mtNMT, AllocFailType alloc_failmode = AllocFailStrategy::RETURN_NULL> > > Another parameter name needing update. We can change this later in a followup. > src/hotspot/share/utilities/objectBitSet.hpp line 42: > >> 40: * during the lifetime of the ObjectBitSet. The underlying memory is allocated from C-Heap. >> 41: */ >> 42: template > > More parameter names needing update. I followed the local pattern, but we can change this later in a followup. > src/hotspot/share/utilities/resizeableResourceHash.hpp line 33: > >> 31: typename K, typename V, >> 32: AnyObj::allocation_type ALLOC_TYPE, >> 33: MemTag MEM_TYPE> > > I think s/MEM_TYPE/mem_type/, but other non-type template parameters here are also > all-uppercase, so I guess better to leave it for now and look at it later. Yes, I followed the local pattern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750606609 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750603957 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750603545 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750603358 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750602937 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750601971 From gziemski at openjdk.org Mon Sep 9 17:00:17 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:00:17 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> Message-ID: On Mon, 9 Sep 2024 13:36:26 GMT, Coleen Phillimore wrote: >> src/hotspot/share/utilities/chunkedList.hpp line 31: >> >>> 29: #include "utilities/debug.hpp" >>> 30: >>> 31: template class ChunkedList : public CHeapObj { 
>> >> Parameter name should be updated. Suggest `mem_tag`. > > How about MT here or just M? I would make this a further change though. We can change this later in a followup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750604170 From eosterlund at openjdk.org Mon Sep 9 17:02:03 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 9 Sep 2024 17:02:03 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 16:29:44 GMT, Coleen Phillimore wrote: > Okay I agree that you can't use a Handle to reference this mirror if it's not already referenced by other code (already alive). Fetching out of the CLD::_handles doesn't keep it alive. > > > > // method - pre-checked for validity, but may be null meaning obsolete method > > // declaring_class_ptr - pre-checked for null > > jvmtiError > > JvmtiEnv::GetMethodDeclaringClass(Method* method, jclass* declaring_class_ptr) { > > NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); > > (*declaring_class_ptr) = get_jni_class_non_null(method->method_holder()); > > return JVMTI_ERROR_NONE; > > } /* end GetMethodDeclaringClass */ > > > > So here, I don't see anything holding the method_holder() mirror through the Method, unless it's in the caller (a global jobject or something). Same with the GetFieldDeclaringClass function. Exactly. As for the GetFieldDeclaringClass method, the XSL generated C++ code that calls it has a jclass of the relevant class or a subclass of it, which is fine in terms of ensuring the holder is kept alive. So it's really GetMethodDeclaringClass that is missing something. Its caller (also XSL generated C++ code) checks that the Method* has not been cleared in the jmethodID handle, and bails if the CLD is not alive. But nowhere do we call klass_holder() which is what safely reads the holder and ensures it is made strongly reachable. 
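The rule described above (make the holder strongly reachable first, then read through it) can be pictured with an analogy in standard C++. This is not HotSpot code and does not model GC barriers: `ToyKlass`, `ToyHolder`, `ToyMirror` and `keep_holder_alive()` are hypothetical names, with `std::weak_ptr` standing in for the non-strong edge and `lock()` playing a role loosely similar to what `klass_holder()` is described as doing here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-in for the mirror object handed back to the caller.
struct ToyMirror {
  std::string name;
};

// Toy stand-in for the holder; it owns the strong edge to the mirror.
struct ToyHolder {
  std::shared_ptr<ToyMirror> mirror;
};

struct ToyKlass {
  std::weak_ptr<ToyHolder> holder;  // non-strong edge: may already be gone

  // Yields a strong reference to the holder, or null if it was already
  // released. Reading holder state is only safe through this result.
  std::shared_ptr<ToyHolder> keep_holder_alive() const {
    return holder.lock();
  }
};

// Safe pattern: pin the holder first, then fetch the mirror through it.
// Returns null when the holder is no longer reachable.
std::shared_ptr<ToyMirror> declaring_class_mirror(const ToyKlass& klass) {
  std::shared_ptr<ToyHolder> pinned = klass.keep_holder_alive();
  if (!pinned) {
    return nullptr;
  }
  return pinned->mirror;
}
```

The unsafe variant would cache a raw `ToyMirror*` obtained earlier and wrap it in a new owner later without going through `keep_holder_alive()`, which corresponds to peeking past the non-strong reference and then exposing one of its strong references.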
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338621683 From gziemski at openjdk.org Mon Sep 9 17:07:05 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:07:05 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> Message-ID: On Mon, 9 Sep 2024 00:04:58 GMT, David Holmes wrote: >> src/hotspot/share/gc/shared/taskqueue.hpp line 119: >> >>> 117: // TaskQueueSuper collects functionality common to all GenericTaskQueue instances. >>> 118: >>> 119: template >> >> MemTag parameter name should probably be changed here and elsewhere in taskqueue code. >> Suggest `mem_tag`. > > I was going to suggest just MT which is more in keeping with the short/terse names given to type parameters. Will be address in a followup issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750615114 From gziemski at openjdk.org Mon Sep 9 17:07:06 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:07:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> References: <1OaddXS4qsgvGkMJXBt8kZRa7J6vQVZH1tdQ22hpGeo=.06168001-0cde-423b-bebb-24bb45d1c155@github.com> Message-ID: On Sat, 7 Sep 2024 05:21:50 GMT, Kim Barrett wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... 
>> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > src/hotspot/share/runtime/lightweightSynchronizer.cpp line 63: > >> 61: static void* allocate_node(void* context, size_t size, Value const& value) { >> 62: ObjectMonitorTable::inc_items_count(); >> 63: return AllocateHeap(size, MemTag::mtObjectMonitor); > > pre-existing: Why the scope here and below? Good question, I will clean this up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1750614387 From gziemski at openjdk.org Mon Sep 9 17:16:06 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:16:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 16:10:05 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Thank you David and Kim for your feedback; as the very first reviewers, your job was the hardest. I implemented all your feedback. I was planning on doing the template parameter rename in a followup issue; however, if you really want, I can make the fix here too. It will increase the size of the changes, but I already accommodated Stefan's request to include parameters and local variables, so we can go this one last step further if you like.
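
The naming argument in the PR description (a "flags" type suggests combinable values, a "tag" is one exclusive value) can be illustrated with a minimal C++ sketch. The types and enumerators below are hypothetical stand-ins chosen for illustration, not HotSpot's actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tag type: every allocation carries exactly one of these values.
enum class MemTagSketch : std::uint8_t { mtNone, mtGC, mtThread, mtObjectMonitor };

// Hypothetical flag type, for contrast: values are bits meant to be OR-ed together.
enum AccessFlagsSketch : std::uint8_t {
  kRead  = 1u << 0,
  kWrite = 1u << 1,
  kExec  = 1u << 2
};

// An allocation record stores a single tag; two tags cannot be combined.
struct AllocationSketch {
  std::size_t size;
  MemTagSketch tag;
};

// Tags are compared for equality.
inline bool is_tagged(const AllocationSketch& a, MemTagSketch t) {
  return a.tag == t;
}

// Flags are tested with a bit mask.
inline bool has_flag(std::uint8_t flags, AccessFlagsSketch f) {
  return (flags & f) != 0;
}
```

Using a scoped enum for the tag means that an expression such as `mtGC | mtThread` does not compile, which matches the "exclusive values" reading that motivates the rename.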
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2338647692 From gziemski at openjdk.org Mon Sep 9 17:19:50 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:19:50 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v2] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix incorrect renames, missing type -> mem_tag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/983bb6e2..9ae36d57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=00-01 Stats: 28 lines in 9 files changed: 0 ins; 1 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From lmesnik at openjdk.org Mon Sep 9 17:20:08 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 9 Sep 2024 17:20:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. 
I think getting the oop from Klass::java_mirror() should use an ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac Can you please also add the regression test provided in the bug? ------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2290498068 From gziemski at openjdk.org Mon Sep 9 17:31:47 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 17:31:47 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v3] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending...
> > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix linux build issue ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/9ae36d57..998537bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From zzambers at openjdk.org Mon Sep 9 17:33:10 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Mon, 9 Sep 2024 17:33:10 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v9] In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 17:46:00 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). 
Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Adapt JDK-8339148 > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. > - Add root check for SystemdMemoryAwarenessTest.java > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - ... and 7 more: https://git.openjdk.org/jdk/compare/3d582133...30f32d22 I have done some testing on RHEL (builds with the changes from this PR plus the other 2 container PRs applied): **RHEL-8** (cgroup1/non-root) - the test was skipped correctly **RHEL-9** (cgroup2/non-root) - I saw a failure of the `active_processor_count` check. - After investigation, I found that the `cpu` cgroup controller is not delegated to `user@1000.service` (and its children) on RHEL-9 (unlike e.g. on Fedora); it only had `memory pids` (btw. the controllers available at a given "level" are listed in the `cgroup.controllers` file in cgroups v2). - When I modified `user@.service` to also delegate the cpu controller, the test passed. Apart from the issue with the `active_processor_count` check on RHEL-9/non-root, it looks good. However, I don't know how to easily fix the issue with the `active_processor_count` check. Maybe the check could be skipped for non-root. (A workaround is to modify the system configuration.)
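
The controller check described above can be sketched in C++. This is a minimal illustration, not code from the PR: the function name is hypothetical, and the caller is assumed to have already read the `cgroup.controllers` file of the cgroup in question into a string. Matching whole tokens matters because `cpuset` contains `cpu` as a substring.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole whitespace-separated token in
// `contents`, where `contents` is the text of a cgroup v2 `cgroup.controllers`
// file (a space-separated list of controller names).
inline bool has_controller(const std::string& contents, const std::string& name) {
  std::istringstream in(contents);
  std::string token;
  while (in >> token) {
    if (token == name) {
      return true;
    }
  }
  return false;
}
```

A test could use such a helper to decide whether a CPU-limit check is meaningful: with the `memory pids` contents reported above, `has_controller(contents, "cpu")` is false, so the check could be skipped rather than failed.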
------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2338674577 From rkennke at openjdk.org Mon Sep 9 17:45:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 9 Sep 2024 17:45:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: Message-ID: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: - Print as warning when UCOH doesn't match in CDS archive - Improve initialization of mark-word in CDS ArchiveHeapWriter - Simplify getKlass() in SA - Simplify oopDesc::init_mark() - Get rid of forward_safe_* methods - GCForwarding touch-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/70f492d3..2884499a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=07-08 Stats: 132 lines in 17 files changed: 26 ins; 73 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From aph at openjdk.org Mon Sep 9 18:04:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 9 Sep 2024 18:04:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich wrote: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. 
However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. > - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... Looks like good low-hanging fruit to me. 
Could we ask @veresov why he made this change? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2338755041 From iveresov at openjdk.org Mon Sep 9 18:08:05 2024 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 9 Sep 2024 18:08:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich wrote: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. 
> - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... I don't quite remember making this change... And I don't remember any reasons as to why it might have been needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2338766859 From cjplummer at openjdk.org Mon Sep 9 18:37:09 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 18:37:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 16:51:35 GMT, Chris Plummer wrote: >> There is no dedicated klass field anymore, the Klass* is encoded in the mark, and we would need to extract it. What is the purpose of the visitors? 
Do they need to see the klass/compressedKlass, or is it sufficient to visit the mark-word (which we already do, but as CInt). > > I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. Specifically, I need to know if the log includes the following: > > > hsdb> + inspect 0x00000007cff154b8 > instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) > _mark: 1 > _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject > firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 > lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 > this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 I pulled your changes and I see one slight difference in the output. The following line is missing: `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: _mark: 16294762323640321 So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that, when getValue() is called on it, returns the Klass mirror. Maybe we need a new CompactKlassField type, like the NarrowKlassField field type we currently have, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this.
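
To make the "Klass* is embedded in the _mark word" point concrete, here is a minimal C++ sketch of the bit extraction. It assumes the layout implied by the PR description ("upper 22 bits" of a 64-bit mark word hold the compressed class pointer) and that the two lowest bits are lock bits; the constants and function names are hypothetical, and turning the narrow value into an actual Klass* would additionally need the VM's encoding base and shift, which are not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the narrow Klass id occupies the upper 22 bits (bits 42..63)
// of the 64-bit mark word.
constexpr int kNarrowKlassShiftSketch = 64 - 22;

// Assumption: the two lowest bits of the mark word are the lock bits.
constexpr std::uint64_t kLockMaskSketch = 0x3;

// Extracts the narrow Klass id from a mark word under the assumed layout.
constexpr std::uint32_t narrow_klass_from_mark(std::uint64_t mark) {
  return static_cast<std::uint32_t>(mark >> kNarrowKlassShiftSketch);
}

// Extracts the lock bits from a mark word under the assumed layout.
constexpr std::uint64_t lock_bits_from_mark(std::uint64_t mark) {
  return mark & kLockMaskSketch;
}
```

Under these assumptions, the value `16294762323640321` shown in the output above equals `3705 * 2^42 + 1`, so it splits into a narrow Klass id of 3705 and lock bits of 1, while the legacy value `_mark: 1` carries no class bits at all. A field type along the lines of the suggested CompactKlassField would perform this kind of split before decoding the id.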
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750743693 From gziemski at openjdk.org Mon Sep 9 18:41:26 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 18:41:26 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v4] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: revert type->state change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/998537bd..bf062da9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=02-03 Stats: 21 lines in 3 files changed: 0 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Mon Sep 9 19:02:25 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Mon, 9 Sep 2024 19:02:25 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v5] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. 
> > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/bf062da9..7a4f6e01 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=03-04 Stats: 24 lines in 1 file changed: 0 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From cjplummer at openjdk.org Mon Sep 9 19:07:10 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Mon, 9 Sep 2024 19:07:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 18:34:10 GMT, Chris Plummer wrote: >> I've been looking into this. It's a bit hard to follow but I think you do need to do something more here. Can you run ClhsdbInspect.java and send me the output. 
Specifically, I need to know if the log includes the following: >> >> >> hsdb> + inspect 0x00000007cff154b8 >> instance of Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject @ 0x00000007cff154b8 (size = 24) >> _mark: 1 >> _metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject >> firstWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 >> lastWaiter: Oop for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionNode @ 0x00000007cfff5f80 >> this$0: Oop for java/util/concurrent/locks/ReentrantLock$NonfairSync @ 0x00000007cff15498 > > I pulled your changes and I see one slight difference in the output. The following line is missing: > > `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` > > I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: > > _mark: 16294762323640321 > > So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that, when getValue() is called on it, returns the Klass mirror. Maybe we need a new CompactKlassField type, like the NarrowKlassField field type we currently have, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this. Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*.
Possibly SA should treat _mark as two separate fields: one that remains a CInt as it currently is, and another that treats it as an encoded Klass*, like the NarrowKlassField case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750788243 From kvn at openjdk.org Mon Sep 9 19:15:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 9 Sep 2024 19:15:03 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich wrote: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me.
The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. > - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... As @bulasevich pointed the change was done by Albert to fix VM warning: https://bugs.openjdk.org/browse/JDK-8029799 Quote: "The freelist of the code cache exceeds 10'000 items, which results in a VM warning. The problem behind the warning is that the freelist is populated by a large number of small free blocks. For example, in failing test case (see header), the freelist grows up to more than 3500 items where the largest item on the list is 9 segments (one segment is 64 bytes). 
That experiment was done on my laptop. Such a large freelist can indeed be a performance problem, since we use a linear search to traverse the freelist." The warning is about a huge freelist, which is scanned linearly to find suitable free space in the CodeCache for the next allocation. It becomes big with tiered compilation because we generate a lot of C1-compiled code, which is later replaced with C2-compiled code. The fix for 8029799 optimized the freelist search for allocation by selecting the first block that has enough space. This reduces search time but, on the other hand, may increase fragmentation of the CodeCache space. Several optimizations were made to this code by @RealLucy: [JDK-8223444](https://bugs.openjdk.org/browse/JDK-8223444) and [JDK-8231460](https://bugs.openjdk.org/browse/JDK-8231460). But it still uses a `linked list` for free segments. Should we consider something more complex? Or is it not an issue? > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: Which of these two flag settings improved performance the most? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2338885078 PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2338886472 From coleenp at openjdk.org Mon Sep 9 19:40:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:40:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean.
>> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac Hm not sure how to fix GetMethodDeclaringKlass - is there a race if we call klass_holder() after that on the method->method_holder() ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338929379 From coleenp at openjdk.org Mon Sep 9 19:55:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 17:45:47 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: > > - Print as warning when UCOH doesn't match in CDS archive > - Improve initialization of mark-word in CDS ArchiveHeapWriter > - Simplify getKlass() in SA > - Simplify oopDesc::init_mark() > - Get rid of forward_safe_* methods > - GCForwarding touch-ups I reviewed the oops code so far. 
src/hotspot/share/oops/compressedKlass.cpp line 116: > 114: _range = end - _base; > 115: > 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) Can you refactor so the aarch64 path runs this same code without duplication? src/hotspot/share/oops/klass.hpp line 173: > 171: > 172: markWord _prototype_header; // Used to initialize objects' header > 173: I think you should move this up after ClassLoaderData, as there might be an alignment gap (you can run pahole to check). src/hotspot/share/oops/klass.hpp line 718: > 716: > 717: markWord prototype_header() const { > 718: assert(UseCompactObjectHeaders, "only use with compact object headers"); Should this unconditionally return _prototype_header since it's initialized to markWord::prototype_header(), or would that decrease performance for the non-compact headers case? src/hotspot/share/oops/klass.inline.hpp line 54: > 52: } > 53: > 54: inline void Klass::set_prototype_header(markWord header) { Can you put a comment that this is only used when dumping the archive? Because otherwise the Klass::_prototype_header field should always be initialized to the right thing (either with Klass encoded or as markWord::prototype_header()) and doesn't change. src/hotspot/share/oops/markWord.inline.hpp line 90: > 88: ShouldNotReachHere(); > 89: return markWord(); > 90: #endif Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? src/hotspot/share/oops/oop.inline.hpp line 90: > 88: } else { > 89: return markWord::prototype(); > 90: } Could this be unconditional since prototype_header is initialized for all Klasses? src/hotspot/share/oops/typeArrayKlass.cpp line 175: > 173: size_t TypeArrayKlass::oop_size(oop obj) const { > 174: // In this assert, we cannot safely access the Klass* with compact headers. > 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); Why not? I think I'm missing something.
Klass should be in the markWord and that should be ok (?) ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2290316150 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750529270 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750727211 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750730078 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750736547 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750739441 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750842383 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750721069 From coleenp at openjdk.org Mon Sep 9 19:55:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:55:52 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Try to avoid lea in loadNklass (aarch64) > - Fix release build error src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: > 145: #endif > 146: > 147: return true; This should only be in the compressedKlass.cpp file. src/hotspot/share/oops/compressedKlass.cpp line 214: > 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", > 213: len, max_encoding_range_size()); > 214: vm_exit_during_initialization(ss.base()); Why does this exit and not turn off compressed klass pointers and compact object headers? src/hotspot/share/oops/compressedKlass.cpp line 222: > 220: return; > 221: } > 222: #endif Why not add null pd_initialize to zero to remove this conditional code? src/hotspot/share/oops/compressedKlass.cpp line 224: > 222: #endif > 223: > 224: if (tiny_classpointer_mode()) { I kind of agree with Thomas Schatzl for this. Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny. src/hotspot/share/oops/compressedKlass.cpp line 234: > 232: _range = len; > 233: > 234: constexpr int log_cacheline = 6; Is 6 the log of DEFAULT_CACHE_LINE_SIZE? src/hotspot/share/oops/compressedKlass.cpp line 243: > 241: } else { > 242: > 243: // In legacy mode, we try, in order of preference: Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... src/hotspot/share/oops/compressedKlass.inline.hpp line 100: > 98: check_valid_klass(k, base(), shift()); > 99: // Also assert that k falls into what we know is the valid Klass range. This is usually smaller > 100: // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a 1G is the default CompressedClassSpaceSize but can be larger, right? So the comment isn't quite accurate. Or with tiny class pointers can it only be 1G? 
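On the question above of whether 6 is the log of DEFAULT_CACHE_LINE_SIZE: one way to avoid a hard-coded shift is to derive it at compile time from the size constant. The following is only a sketch under stated assumptions, not code from the PR. `kCacheLineSize` is a hypothetical stand-in fixed at 64 bytes (the real constant may differ per platform), and `is_power_of_two` and `log2_exact` are helpers invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the platform cache line size; 64 is an assumption.
constexpr std::size_t kCacheLineSize = 64;

// True when v has exactly one bit set.
constexpr bool is_power_of_two(std::size_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Base-2 logarithm of a power of two, computed at compile time (C++14 or later).
constexpr int log2_exact(std::size_t v) {
  int log = 0;
  while (v > 1) {
    v >>= 1;
    ++log;
  }
  return log;
}

static_assert(is_power_of_two(kCacheLineSize), "cache line size must be a power of two");

// The shift is derived from the constant instead of being written as a literal.
constexpr int log_cacheline = log2_exact(kCacheLineSize);
static_assert(log_cacheline == 6, "a 64-byte line gives a shift of 6");
```

With this shape, a platform that defines a 128-byte line would fail the last static_assert instead of silently keeping a shift of 6, which may or may not be the behaviour the author wants.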
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750527537 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750511912 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750513660 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750515923 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750520712 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750524690 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750662637 From coleenp at openjdk.org Mon Sep 9 19:55:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 19:55:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> On Mon, 9 Sep 2024 10:02:53 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/oops/compressedKlass.hpp line 43: > >> 41: >> 42: // Tiny-class-pointer mode >> 43: static int _tiny_cp; // -1, 0=true, 1=false > > Suggestion: > > static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false > > In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes (and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. I agree with this. 'cp' reads as ConstantPool for me even though this is a different context. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750531167 From jbhateja at openjdk.org Mon Sep 9 19:58:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Sep 2024 19:58:13 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction Message-ID: - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. - This saves emitting an explicit MOVZX instruction after setCC. - These new instructions are encoded using 4 byte Extended EVEX encoding. Validation performed over stand alone test point using Intel SDE. Best Regards, Jatin ------------- Commit messages: - 8339790: Support Intel APX setzucc instruction. Changes: https://git.openjdk.org/jdk/pull/20920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339790 Stats: 53 lines in 7 files changed: 26 ins; 13 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From stefank at openjdk.org Mon Sep 9 20:02:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 20:02:04 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. 
I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac When Erik, Axel, and I discussed this earlier today, we thought that the correct fix would be to add a klass_holder() call to GetMethodDeclaringKlass(). That works as long as there's no safepoint between that and the creation of the JNI handle, which there isn't. I also think that we should add a comment to GetFieldDeclaringClass, explaining why the klass is held alive. I'd also prefer if we added an `assert(k->is_loader_alive(), "Must be alive")` to `get_jni_class_non_null`. That assert will trigger if the function gets called just after ZGC marking, but before the jmethodID cleaning. A short window, but maybe enough to catch these kinds of errors earlier.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2338970316 From stefank at openjdk.org Mon Sep 9 20:07:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 9 Sep 2024 20:07:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> On Mon, 9 Sep 2024 18:15:38 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - Print as warning when UCOH doesn't match in CDS archive >> - Improve initialization of mark-word in CDS ArchiveHeapWriter >> - Simplify getKlass() in SA >> - Simplify oopDesc::init_mark() >> - Get rid of forward_safe_* methods >> - GCForwarding touch-ups > > src/hotspot/share/oops/typeArrayKlass.cpp line 175: > >> 173: size_t TypeArrayKlass::oop_size(oop obj) const { >> 174: // In this assert, we cannot safely access the Klass* with compact headers. >> 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); > > Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?) I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens another thread can racingly succeed to copy the object and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes. 
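The race described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not HotSpot code: `ToyObject`, the tag values, and all function names are invented for the example, and the layout (class bits in the upper 22 bits, low two bits `0b11` meaning "forwarded") is only loosely modelled on the PR description. It shows why a reader that takes the class bits straight from the old header can see garbage once another thread has installed a forwarding pointer, and how reading through the forwardee avoids that.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative layout (assumption): upper 22 bits = class id, low 2 bits = tag.
constexpr int      kKlassShift   = 42;
constexpr uint64_t kTagMask      = 0x3;
constexpr uint64_t kForwardedTag = 0x3;  // assumed tag for "header holds a forwarding pointer"
constexpr uint64_t kNormalTag    = 0x1;  // assumed tag for an ordinary header

struct ToyObject {
  std::atomic<uint64_t> header;
  explicit ToyObject(uint32_t klass_id)
      : header((static_cast<uint64_t>(klass_id) << kKlassShift) | kNormalTag) {}
};

inline bool is_forwarded(uint64_t h) { return (h & kTagMask) == kForwardedTag; }
inline uint32_t klass_bits(uint64_t h) { return static_cast<uint32_t>(h >> kKlassShift); }
inline ToyObject* forwardee(uint64_t h) {
  return reinterpret_cast<ToyObject*>(static_cast<uintptr_t>(h & ~kTagMask));
}

// Naive read: after forwarding, these bits belong to a pointer, not to a class id.
inline uint32_t klass_id_naive(const ToyObject* obj) {
  return klass_bits(obj->header.load(std::memory_order_relaxed));
}

// Forwarding-aware read: load the header once and, if it is a forwarding
// pointer, take the class bits from the copy instead.
inline uint32_t klass_id_safe(const ToyObject* obj) {
  uint64_t h = obj->header.load(std::memory_order_acquire);
  if (is_forwarded(h)) {
    h = forwardee(h)->header.load(std::memory_order_relaxed);
  }
  return klass_bits(h);
}

// "GC worker": publish the copy by CAS-ing a forwarding pointer over the old
// header. Returns false if another worker won the race.
inline bool try_forward(ToyObject* from, ToyObject* to) {
  uint64_t old_h = from->header.load(std::memory_order_relaxed);
  if (is_forwarded(old_h)) {
    return false;
  }
  to->header.store(old_h, std::memory_order_relaxed);  // the copy keeps the class bits
  uint64_t fwd = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(to)) | kForwardedTag;
  return from->header.compare_exchange_strong(old_h, fwd,
                                              std::memory_order_release,
                                              std::memory_order_relaxed);
}
```

In this model, once `try_forward(&from, &to)` succeeds, `klass_id_naive(&from)` returns bits of the address of `to`, while `klass_id_safe(&from)` still returns the original class id. The assert discussed above sidesteps the problem differently, by not touching the Klass* at all when compact headers are enabled.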
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750862842 From coleenp at openjdk.org Mon Sep 9 20:23:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 20:23:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <1vlOROPKL6eDagcb0xz0MGe6vA6vYBVa4UzVsCZ5Q8I=.d5408660-ee94-4118-b809-d13a3dc500b4@github.com> Message-ID: On Mon, 9 Sep 2024 20:04:22 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/typeArrayKlass.cpp line 175: >> >>> 173: size_t TypeArrayKlass::oop_size(oop obj) const { >>> 174: // In this assert, we cannot safely access the Klass* with compact headers. >>> 175: assert(UseCompactObjectHeaders || obj->is_typeArray(),"must be a type array"); >> >> Why not? I think I'm missing something. Klass should be in the markWord and that should be ok (?) > > I tracked this down to only (at least in my testing) happen from `size_given_klass` when called from the GC when it is about to copy an object. While that happens another thread can racingly succeed to copy the object and install a forwarding pointer over the old copy. When that happens the klass pointer is broken and the call to oopDesc::is_typeArray() crashes. I did miss something. I thought the markWord was never overwritten by the forwarding pointer. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1750882259 From bulasevich at openjdk.org Mon Sep 9 20:24:04 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 9 Sep 2024 20:24:04 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: <8G3mAhuSvOX9Ibpye1FDIzfko55KhHSoCvDPTMFDlqk=.a9633c21-bb8f-4fde-be14-84d94e5dc1e6@github.com> On Mon, 9 Sep 2024 19:12:35 GMT, Vladimir Kozlov wrote: > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > > Which of these two flag settings improved performance most? The benchmark is very noisy. With 1K iterations on a Neoverse V2 machine the result is:

vm options | ms/op | % | cache-misses (M) | group0-code_sparsity (K) | L1-icache-load-misses:u (M)
-- | -- | -- | -- | -- | --
default | 635.574 | | 7193 | 18152 | 10111
CodeEntryAlignment | 633.961 | -0.25 | 7177 | 18156 | 10334
CodeCacheSegmentSize | 633.041 | -0.15 | 7187 | 17919 | 10081
both | 631.687 | -0.21 | 7167 | 17796 | 10287

------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2339014367 From mdoerr at openjdk.org Mon Sep 9 20:32:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 9 Sep 2024 20:32:07 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v2] In-Reply-To: References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> Message-ID: <_YuT_5mjd8j-ApNPJNrS-9nKsZtty_q3hCq-kFvg0Ds=.ac5fba7e-8d0d-46b9-b2b7-1aeddc96e98e@github.com> On Mon, 9 Sep 2024 07:05:26 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6023: >> >>> 6021: >>> 6022: if (UseObjectMonitorTable) { >>> 6023: // Clear cache in case fast locking succeeds.
>> >> @xmas92: This comment sounds like it should only be done if fast locking succeeds. Why are we doing it regardless of that? > > The invariants surrounding the cache went back and forth a bit. > > The important part is that the cache slot in the `BasicLock` is valid after the enter is complete. There is now a RAII object in the runtime code which makes sure this is the case for all paths. So it should be the case that the C2 code only needs to handle the case where it is successful. (Clearing when fast lock succeeds and storing the monitor when inflated locking succeeds.) > > I think it went in in this state as a combination of it being something that once was required (because the runtime was less precise with how it handles the `BasicLock` cache) and that I wanted to take the C2 changes in as they were because most of the performance testing had been performed in that state. Maybe there was some technical issue somewhere with regards to register availability, cannot recall. Regardless if it can be only done in the none slow path case and it is more performant it should be fine to do so. > > I do not think I saw a measurable difference on x86_64 from having the two stores in the successful inflated case. And in all other cases the only time you can elide the store is when the slow path is taken, so it probably does not save much if anything. Thanks for explaining! I'm ok with it and I have implemented it the same way for PPC64: https://github.com/openjdk/jdk/pull/20922 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1750892559 From mdoerr at openjdk.org Mon Sep 9 20:33:14 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 9 Sep 2024 20:33:14 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation Message-ID: PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). 
------------- Commit messages: - 8338995: New Object to ObjectMonitor mapping: PPC64 implementation Changes: https://git.openjdk.org/jdk/pull/20922/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338995 Stats: 166 lines in 8 files changed: 76 ins; 27 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/20922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20922/head:pull/20922 PR: https://git.openjdk.org/jdk/pull/20922 From matsaave at openjdk.org Mon Sep 9 21:11:24 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 9 Sep 2024 21:11:24 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v3] In-Reply-To: References: Message-ID: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. 
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Added assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20874/files - new: https://git.openjdk.org/jdk/pull/20874/files/ba1cb1b8..bd1cc1e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20874/head:pull/20874 PR: https://git.openjdk.org/jdk/pull/20874 From dholmes at openjdk.org Mon Sep 9 21:43:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 21:43:06 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v4] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:53:47 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages(~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). 
So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. >> >> The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). >> >> I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > print values in assert Thanks LGTM. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20862#pullrequestreview-2291009061 From dholmes at openjdk.org Mon Sep 9 21:48:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 21:48:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> References: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> Message-ID: On Mon, 9 Sep 2024 16:03:26 GMT, Matias Saavedra Silva wrote: >> This redefine code is really complicated. Obsolete methods, which includes deleted methods get an incremented idnum in check_methods_and_mark_as_obsolete. >> >> // obsolete methods need a unique idnum so they become new entries in >> // the jmethodID cache in InstanceKlass >> assert(old_method->method_idnum() == new_method->method_idnum(), "must match"); >> u2 num = InstanceKlass::cast(_the_class)->next_method_idnum(); >> if (num != ConstMethod::UNSET_IDNUM) { >> old_method->set_method_idnum(num); >> } >> >> When get_new_method() is called it compares InstanceKlass::_methods[idnum], then compares the Method at that index to the idnum that the method has stored. For deleted methods, this will return nullptr because the idnum comparison in method_with_idnum will not match, or idnum for the deleted method is greater than InstanceKlass::_methods.length(). I think it is sufficient to check for nullptr in get_new_method() for deleted methods but also the explicit is_deleted() comparison is much easier to understand that it's the right answer. >> >> Yes, there could be an assert in get_new_method(), the method passed in is_old(). All redefined methods are marked as is_old(). I believe all callers test this before making this call. 
> > Coleen is correct, all the callers do indeed check that the method `is_old()` but I think it's fine if we assert that in `get_new_method()` to make it clear that it is a requirement. The method `method_with_idnum()` can return nullptr here: > > if (m == nullptr || m->method_idnum() != idnum) { > for (int index = 0; index < methods()->length(); ++index) { > m = methods()->at(index); > if (m->method_idnum() == idnum) { > return m; > } > } > // None found, return null for the caller to handle. > return nullptr; > > As the comment suggests, the caller should handle nullptr, which in this case is `get_new_method()`. The callers of get_new_method() try to handle this but I think it's cleaner to check inside the method. So is it the case that `nullptr` implies deleted, and deleted implies `nullptr`? If so checking for both is redundant and confusing because it makes it look like there are two distinct cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1750978607 From coleenp at openjdk.org Mon Sep 9 22:31:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 9 Sep 2024 22:31:04 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> Message-ID: On Mon, 9 Sep 2024 21:45:00 GMT, David Holmes wrote: >> Coleen is correct, all the callers do indeed check that the method `is_old()` but I think it's fine if we assert that in `get_new_method()` to make it clear that it is a requirement. 
The method `method_with_idnum()` can return nullptr here: >> >> if (m == nullptr || m->method_idnum() != idnum) { >> for (int index = 0; index < methods()->length(); ++index) { >> m = methods()->at(index); >> if (m->method_idnum() == idnum) { >> return m; >> } >> } >> // None found, return null for the caller to handle. >> return nullptr; >> >> As the comment suggests, the caller should handle nullptr, which in this case is `get_new_method()`. The callers of get_new_method() try to handle this but I think it's cleaner to check inside the method. > > So is it the case that `nullptr` implies deleted, and deleted implies `nullptr`? If so checking for both is redundant and confusing because it makes it look like there are two distinct cases. So that implies that you trust my reading of this code. It is complicated enough that testing both seems like a safe thing to do and somewhat clarifying, or else adding an assert like: assert(new_method != nullptr || old_method->is_deleted(), "this is the only way this happens"); return new_method == nullptr ? nsme : new_method; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1751012295 From bulasevich at openjdk.org Mon Sep 9 22:40:04 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 9 Sep 2024 22:40:04 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Mon, 9 Sep 2024 19:11:44 GMT, Vladimir Kozlov wrote: > The warning is about huge freelist which is scanned linearly to find corresponding free space in CodeCache for next allocation. It is become big with tiered compilation because we do a lot of C1 compiled code which is replaced with C2 compiled code. With Segmented Code Cache we have separate freelists for profiled and non-profiled heap. 
So I think there is no need to correct segment size for tiered compilation. > But it is still using linked list for free segments. Should we consider something more complex? Or it is not an issue? This can indeed be a problem. CodeHeap allocation takes about 1% of the total compilation time. Here is my statistic for 50K compiled methods: before: total_compilation_time(s): 370, total_allocation_time(s): 1.6 after: total_compilation_time(s): 378, total_allocation_time(s): 3.4 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2339273224 From dholmes at openjdk.org Mon Sep 9 23:27:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Sep 2024 23:27:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> On Mon, 9 Sep 2024 06:04:36 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error in windows/mac Maybe I'm misunderstanding the test case but isn't it using a jMethodID for a class that may have been unloaded? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2339321221

From iklam at openjdk.org Tue Sep 10 00:53:32 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Tue, 10 Sep 2024 00:53:32 GMT
Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v2]
In-Reply-To:
References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com>
Message-ID:

On Mon, 9 Sep 2024 05:15:20 GMT, David Holmes wrote:

>> I tried to separate the "types" from the "values". I think this makes it easy to see how many types there are.
>
> Sorry I don't follow. This is just like a printf call

I changed it according to your suggestion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1751101001

From iklam at openjdk.org Tue Sep 10 00:53:32 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Tue, 10 Sep 2024 00:53:32 GMT
Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3]
In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com>
References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com>
Message-ID:

> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
>
> **Overview**
>
> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility.
> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`.
> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`.
>   - The boot classes are loaded as part of `vmClasses::resolve_all()`
>   - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`).
> - Since all classes in an `AOTLinkedClassTable` are loaded before any user code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`.
>
> **All-or-nothing Loading**
>
> - Because AOT-linked classes can refer to each other using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures.
> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example:
>   - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions.
>   - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM.
> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`.
> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...

Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @dholmes-ora comments: logging indents ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/ac1ed798..0441aef0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From kvn at openjdk.org Tue Sep 10 01:58:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Sep 2024 01:58:06 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: <4_oh93TGk3nScisJdXZ0YqCcXvP0tbYjOS2c5BPira0=.668a8db0-7fbd-4297-a066-306bd583ebd9@github.com> On Mon, 9 Sep 2024 22:37:44 GMT, Boris Ulasevich wrote: > With Segmented Code Cache we have separate freelists for profiled and non-profiled heap. So I think there is no need to correct segment size for tiered compilation. It is indeed helped somewhat but as you said: > the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). Anyway, I agree to trade off 1% in compilation speed for 0.2-0.7% application performance improvement. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2339456528 From kvn at openjdk.org Tue Sep 10 02:01:03 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 10 Sep 2024 02:01:03 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: <-cCyBdiqtII0_9uS1wl8CEZ_7a-Q6Wf6tH2PwROp1y8=.b4822d51-a9f1-4d50-88f2-0ab8c94b37f5@github.com> On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich wrote: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. 
> - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... I will ask someone to do our performance testing to confirm your results. 
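
The segment-size argument quoted above comes down to rounding: each nmethod is rounded up to a whole number of segments, so the unused tail of its last segment is wasted. A minimal sketch of that arithmetic follows, assuming made-up blob sizes; `tail_waste` and `total_waste` are hypothetical helper names, and the real CodeHeap also has per-block headers and segment-map overhead that are not modelled here.

```cpp
#include <cstddef>
#include <vector>

// Bytes left unused in the last segment when a blob of `size` bytes
// is rounded up to a whole number of `segment_size`-byte segments.
constexpr std::size_t tail_waste(std::size_t size, std::size_t segment_size) {
    return (segment_size - size % segment_size) % segment_size;
}

// Sum of the tail waste over a set of (hypothetical) blob sizes.
inline std::size_t total_waste(const std::vector<std::size_t>& sizes,
                               std::size_t segment_size) {
    std::size_t waste = 0;
    for (std::size_t s : sizes) {
        waste += tail_waste(s, segment_size);
    }
    return waste;
}
```

For sizes spread evenly over the segment, the expected tail waste is roughly half a segment, so going from 128 to 64 bytes would be expected to halve it, at the cost of a larger segment map.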
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2339459229 From lmao at openjdk.org Tue Sep 10 02:21:04 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 02:21:04 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> References: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> Message-ID: <2xh9F1kC4pZ4yHEwiPxJzaqO5REJ7vRXs6m1v0ADRss=.3f34dd37-8785-4f6e-857b-47d9e2e2f6bb@github.com> On Mon, 9 Sep 2024 23:24:05 GMT, David Holmes wrote: > Maybe I'm misunderstanding the test case but isn't it using a jMethodID for a class that may have been unloaded? Yes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2339477484 From jkarthikeyan at openjdk.org Tue Sep 10 03:29:05 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 10 Sep 2024 03:29:05 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:36:51 GMT, Jatin Bhateja wrote: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. 
>
> Best Regards,
> Jatin

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10425:

> 10423: }
> 10424:
> 10425: void MacroAssembler::setCC(Assembler::Condition comparison, Register dst) {

Generally I think we use all lowercase for assembler functions, such as `Assembler::jcc`. I think it would be easier to read if this were named `setcc` (and similar for `esetzucc`).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1751195879

From rehn at openjdk.org Tue Sep 10 05:14:47 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 10 Sep 2024 05:14:47 GMT
Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2]
In-Reply-To:
References:
Message-ID:

> Hi please review,
>
> When calling a native function using integers narrower than 64 bits,
> they must be loaded from a Java stack slot and widened to 64 bits, sign-extended.
> In the interpreter case we only store 32 bits, which means the top 32 bits are 'random'.
> In the compiler case we do an ld and grab random top 32 bits.
> These should be loaded with a lw from the Java stack, so they are properly sign-extended, and then stored with sd into the native stack.
>
> I found the interpreter bug first, wrote a test case for it, which found the compiler bug.
>
> Here you can see the difference, both are legal for a compiler to do:
> https://godbolt.org/z/85aMhja5f
> Relevant specs:
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
>> integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits.
>
> I checked floats also, they seem fine, but please go ahead and do a check regarding floats.
>
> Passes ./test/hotspot/jtreg/compiler/calls/, running t1.

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Added comment - Merge branch 'master' into c_abi_error - Fixed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20912/files - new: https://git.openjdk.org/jdk/pull/20912/files/3afd24d9..6e738aa5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20912&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20912&range=00-01 Stats: 1281 lines in 67 files changed: 769 ins; 235 del; 277 mod Patch: https://git.openjdk.org/jdk/pull/20912.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20912/head:pull/20912 PR: https://git.openjdk.org/jdk/pull/20912 From rehn at openjdk.org Tue Sep 10 05:14:47 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 10 Sep 2024 05:14:47 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 14:54:32 GMT, Fei Yang wrote: > Nice catch! Thanks Robbin. I am trying it on my machines. BTW: Seems this fix deserves a small code comment. Added comment, is this what you had in mind? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20912#issuecomment-2339633321 From rehn at openjdk.org Tue Sep 10 05:29:08 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 10 Sep 2024 05:29:08 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 05:14:47 GMT, Robbin Ehn wrote: >> Hi please review, >> >> When calling a native function using integers smaller than 64, >> they must be loaded from a Java stack slot and widen to 64-bit, sign-extended. >> In the interpreter case we only store 32-bit, which means the top 32-bit are 'random'. >> In the compiler case we do an ld and grab random top 32-bit. >> These should be loaded with a lw from Java stack, thus proper sign extended and then stored with sd into the native stack. 
>> >> I found the intrepter bug first, wrote a test case for it, which found the compiler bug. >> >> Here you can see the difference, both are legal todo from a compiler: >> https://godbolt.org/z/85aMhja5f >> Relevant specs: >> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc >>> integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits. >> >> I checked floats also, they seems fine, but please go ahead and do a check regarding floats. >> >> Passes ./test/hotspot/jtreg/compiler/calls/, runnnig t1. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Added comment > - Merge branch 'master' into c_abi_error > - Fixed Oh, so I have question here: } else if (dst.first()->is_stack()) { // reg to stack sd(src.first()->as_Register(), Address(sp, reg2offset_out(dst.first()))); } else { if (dst.first() != src.first()) { sign_extend(dst.first()->as_Register(), src.first()->as_Register(), 32); } } The sd is done without any sign extension, hence we must have C ABI representation in the register. But that means the register to register case, where we have sign_extend(), is actually just a move. AFAIK this is the case, which I hope, otherwise we have third bug here. So I think we should use `mv` to be clear about what we are doing. ? 
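
To make the ld/lw distinction in this thread concrete, here is a minimal sketch in plain C++ rather than assembler. The helper names (`make_slot`, `load_ld`, `load_lw`, `sign_extend32`) are made up for illustration; they only model the behaviour described above, a 64-bit slot whose upper half was never written, read either whole or as a sign-extended 32-bit value. The narrowing casts rely on two's-complement wrap-around, which is guaranteed from C++20 and is what mainstream compilers do before that.

```cpp
#include <cstdint>

// Model a 64-bit stack slot where only the low 32 bits were stored;
// `stale_upper` stands for whatever was left in the upper half.
inline std::uint64_t make_slot(std::int32_t value, std::uint32_t stale_upper) {
    return (static_cast<std::uint64_t>(stale_upper) << 32) |
           static_cast<std::uint32_t>(value);
}

// Like `ld`: reads all 64 bits, stale upper half included.
inline std::int64_t load_ld(std::uint64_t slot) {
    return static_cast<std::int64_t>(slot);
}

// Like `lw`: reads the low 32 bits and sign-extends them to 64 bits.
inline std::int64_t load_lw(std::uint64_t slot) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(slot));
}

// Like sign_extend(dst, src, 32): keep the low 32 bits, sign-extend to 64.
inline std::int64_t sign_extend32(std::int64_t reg) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(reg));
}
```

If a register already holds a properly sign-extended value, `sign_extend32` returns it unchanged, which is the reasoning behind suggesting a plain move in the register-to-register case.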
------------- PR Comment: https://git.openjdk.org/jdk/pull/20912#issuecomment-2339650697 From dholmes at openjdk.org Tue Sep 10 06:00:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 10 Sep 2024 06:00:11 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:13:11 GMT, Gerard Ziemski wrote: > The template parameter rename I was planning on doing in a followup issue, however, if you really want, I can make the fix here too. Personally I'd be okay with doing it here as one final commit that can be viewed in isolation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2339696620 From dholmes at openjdk.org Tue Sep 10 06:07:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 10 Sep 2024 06:07:10 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 10 Sep 2024 00:53:32 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. 
>> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplfication, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments: logging indents Thanks for those adjustments. Nothing further from me, but I'm hardly an expert in this area. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2291436503 From dholmes at openjdk.org Tue Sep 10 06:53:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 10 Sep 2024 06:53:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> Message-ID: On Mon, 9 Sep 2024 22:28:20 GMT, Coleen Phillimore wrote: >> So is it the case that `nullptr` implies deleted, and deleted implies `nullptr`? If so checking for both is redundant and confusing because it makes it look like there are two distinct cases. > > So that implies that you trust my reading of this code. It is complicated enough that testing both seems like a safe thing to do and somewhat clarifying, or else adding an assert like: > > assert(new_method != nullptr || old_method->is_deleted(), "this is the only way this happens"); > return new_method == nullptr ? nsme : new_method; The assert works for me. 
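
The assert discussed above can be pictured with a small stand-alone sketch. The `Method` and `Holder` structs below are hypothetical stand-ins, not HotSpot's real classes, and `nsme` simply represents the method that throws NoSuchMethodError; only the final assert and return mirror the lines quoted in this thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for a method entry.
struct Method {
    int  idnum;
    bool is_old;
    bool is_deleted;
};

// Hypothetical stand-in for the class that holds the current methods.
struct Holder {
    std::vector<Method*> methods;

    // Mirrors the quoted lookup: nullptr when no method has this idnum.
    Method* method_with_idnum(int idnum) const {
        for (Method* m : methods) {
            if (m->idnum == idnum) {
                return m;
            }
        }
        return nullptr;  // None found, the caller has to handle it.
    }
};

// Returns the replacement for an old method, or `nsme` if it was deleted.
inline Method* get_new_method(const Holder& holder, Method* old_method, Method* nsme) {
    assert(old_method->is_old && "callers are expected to pass an old method");
    Method* new_method = holder.method_with_idnum(old_method->idnum);
    assert((new_method != nullptr || old_method->is_deleted) &&
           "this is the only way this happens");
    return new_method == nullptr ? nsme : new_method;
}
```

With the assert in place, the single `nullptr` check in the return statement covers the deleted case without making it look like two distinct conditions.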
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1751358981 From rkennke at openjdk.org Tue Sep 10 07:23:13 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 07:23:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Mon, 9 Sep 2024 10:16:24 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/gc/shared/gcForwarding.hpp line 41: > >> 39: * bits (to indicate 'forwarded' state as usual). >> 40: */ >> 41: class GCForwarding : public AllStatic { > > Since this class is only used for Full GCs, it may be useful to include that information, i.e. something like `FullGCForwarding` to avoid confusion why it is not used for other GCs too. > (Unless this has been discussed and even rejected by me before). I agree. In-fact, that has been my original name. It has been suggested that I change it to SlidingForwarding when that was the approach that we were going to take, but with the new implementation, FullGCForwarding makes most sense. I'll change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751400378 From luhenry at openjdk.org Tue Sep 10 07:26:10 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 10 Sep 2024 07:26:10 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 05:14:47 GMT, Robbin Ehn wrote: >> Hi please review, >> >> When calling a native function using integers smaller than 64, >> they must be loaded from a Java stack slot and widen to 64-bit, sign-extended. 
>> In the interpreter case we only store 32 bits, which means the top 32 bits are 'random'.
>> In the compiler case we do an ld and grab random top 32 bits.
>> These should be loaded with a lw from the Java stack, so they are properly sign-extended, and then stored with sd into the native stack.
>>
>> I found the interpreter bug first, wrote a test case for it, which found the compiler bug.
>>
>> Here you can see the difference, both are legal for a compiler to do:
>> https://godbolt.org/z/85aMhja5f
>> Relevant specs:
>> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
>>> integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits.
>>
>> I checked floats also, they seem fine, but please go ahead and do a check regarding floats.
>>
>> Passes ./test/hotspot/jtreg/compiler/calls/, running t1.
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
> - Added comment
> - Merge branch 'master' into c_abi_error
> - Fixed

Marked as reviewed by luhenry (Committer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/20912#pullrequestreview-2291581821

From luhenry at openjdk.org Tue Sep 10 07:30:05 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Tue, 10 Sep 2024 07:30:05 GMT
Subject: RFR: 8339771: RISC-V: Reduce icache flushes
In-Reply-To:
References:
Message-ID:

On Mon, 9 Sep 2024 12:33:01 GMT, Robbin Ehn wrote:

> Hey, please consider,
>
> All code which is offline (behind a barrier) does not need global icache flushes,
> as we can instead emit fence.i locally (thread and hart) in the slow path.
> But if we were to be context switched to a hart which has not had fence.i emitted, we could still fetch stale instructions.
> To handle this case we now have kernel support:
> https://docs.kernel.org/arch/riscv/cmodx.html
>
> It's not perfect, as with this patch we will be emitting fence.i on any context switch for any thread, even if that thread does not execute on the code heap (non-attached native thread), and even if there were no changes to the code heap.
> But this is in many cases much faster, as the global icache flush IPI is very intrusive.
> Particular cases are running a concurrent GC with small headroom.
> In such a scenario I measured 15% increased throughput on VF2.
> A large CPU or less headroom (faster GC cycles) will yield an even bigger performance boost.
>
> Note that this requires a 6.10 kernel.
>
> I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3. (With the setting on)
>
> Later we probably want this on by default, but as it's hard to test I'll leave the default off.

src/hotspot/os_cpu/linux_riscv/vm_version_linux_riscv.cpp line 126:

> 124: // Linux kernel require Zifencei
> 125: if (!ext_Zifencei.enabled()) {
> 126: ext_Zifencei.enable_feature();

That would deserve a `log_info` of some sort, to make sure we can debug such a state change in the wild

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1751410062

From fyang at openjdk.org Tue Sep 10 07:49:04 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 10 Sep 2024 07:49:04 GMT
Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2]
In-Reply-To: <2veWPlTXhpRIEqfHjOxebZgsSPr-C4ZgP_wmOBfnFoU=.378999c3-aea9-4904-8620-5f87bc449544@github.com>
References: <2veWPlTXhpRIEqfHjOxebZgsSPr-C4ZgP_wmOBfnFoU=.378999c3-aea9-4904-8620-5f87bc449544@github.com>
Message-ID: <5sc9442PvXL6s4_TUrFQdt32D1Iq_wLJEvGuhnZLF4o=.24d0d5f6-73d1-471f-947b-cfe457cd5f28@github.com>

On Tue, 10 Sep 2024 07:44:05 GMT, Fei Yang wrote:

> Oh, so I have a question here:
>
> ```
>   } else if (dst.first()->is_stack()) {
>     // reg to stack
>     sd(src.first()->as_Register(), Address(sp, reg2offset_out(dst.first())));
>   } else {
>     if
(dst.first() != src.first()) { > sign_extend(dst.first()->as_Register(), src.first()->as_Register(), 32); > } > } > ``` > > The sd is done without any sign extension, hence we must have C ABI representation in the register. But that means the register to register case, where we have sign_extend(), is actually just a move. AFAIK this is the case, which I hope, otherwise we have third bug here. So I think we should use `mv` to be clear about what we are doing. > > ? Yes. Thanks for the update! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20912#issuecomment-2339913882 From rkennke at openjdk.org Tue Sep 10 07:56:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 07:56:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Mon, 9 Sep 2024 10:21:54 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > >> 230: } >> 231: >> 232: // With compact headers, we can't safely access the class, due > > Suggestion: > > // With compact headers, we can't safely access the klass, due > > > This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? > Given this is used for verification only afaik, we should make an effort to provide that check. 
With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding. I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751448814 From fyang at openjdk.org Tue Sep 10 07:57:09 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 10 Sep 2024 07:57:09 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 05:14:47 GMT, Robbin Ehn wrote: >> Hi please review, >> >> When calling a native function using integers smaller than 64, >> they must be loaded from a Java stack slot and widen to 64-bit, sign-extended. >> In the interpreter case we only store 32-bit, which means the top 32-bit are 'random'. >> In the compiler case we do an ld and grab random top 32-bit. >> These should be loaded with a lw from Java stack, thus proper sign extended and then stored with sd into the native stack. >> >> I found the intrepter bug first, wrote a test case for it, which found the compiler bug. >> >> Here you can see the difference, both are legal todo from a compiler: >> https://godbolt.org/z/85aMhja5f >> Relevant specs: >> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc >>> integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits. >> >> I checked floats also, they seems fine, but please go ahead and do a check regarding floats. 
>> >> Passes ./test/hotspot/jtreg/compiler/calls/, running t1. > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Added comment > - Merge branch 'master' into c_abi_error > - Fixed Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20912#pullrequestreview-2291650466 From fbredberg at openjdk.org Tue Sep 10 07:59:10 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 10 Sep 2024 07:59:10 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x pass ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero don't need any changes as far as I can tell.
I've done basic testing on ppc64le, riscv64 and s390x using QEMU, but would appreciate if @TheRealMDoerr, @RealFYang and @offamitkumar could take it for a real test drive. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2339935480 From lmao at openjdk.org Tue Sep 10 08:18:55 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 08:18:55 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v3] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: Keep klass_holder alive and add the test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/da942579..751c29bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=01-02 Stats: 238 lines in 8 files changed: 231 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From lmao at openjdk.org Tue Sep 10 08:27:06 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 08:27:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:17:12 GMT, Leonid Mesnik wrote: > Can you please add regression test provided in the bug also. @lmesnik , could you please help add @D-D-H and @krk as authors to the test case? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2339997447 From lmao at openjdk.org Tue Sep 10 08:30:11 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 08:30:11 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 16:59:18 GMT, Erik Österlund wrote: >> Okay I agree that you can't use a Handle to reference this mirror if it's not already referenced by other code (already alive). Fetching out of the CLD::_handles doesn't keep it alive. >> >> // method - pre-checked for validity, but may be null meaning obsolete method >> // declaring_class_ptr - pre-checked for null >> jvmtiError >> JvmtiEnv::GetMethodDeclaringClass(Method* method, jclass* declaring_class_ptr) { >> NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); >> (*declaring_class_ptr) = get_jni_class_non_null(method->method_holder()); >> return JVMTI_ERROR_NONE; >> } /* end GetMethodDeclaringClass */ >> >> So here, I don't see anything holding the method_holder() mirror through the Method, unless it's in the caller (a global jobject or something). Same with the GetFieldDeclaringClass function. > >> Okay I agree that you can't use a Handle to reference this mirror if it's not already referenced by other code (already alive). Fetching out of the CLD::_handles doesn't keep it alive. >> >> >> >> // method - pre-checked for validity, but may be null meaning obsolete method >> >> // declaring_class_ptr - pre-checked for null >> >> jvmtiError >> >> JvmtiEnv::GetMethodDeclaringClass(Method* method, jclass* declaring_class_ptr) { >> >> NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); >> >> (*declaring_class_ptr) = get_jni_class_non_null(method->method_holder()); >> >> return JVMTI_ERROR_NONE; >> >> } /* end GetMethodDeclaringClass */ >> >> >> >> So here, I don't see anything holding the method_holder() mirror through the Method, unless it's in the caller (a global jobject or something).
Same with the GetFieldDeclaringClass function. > > Exactly. As for the GetFieldDeclaringClass method, the XSL generated C++ code that calls it has a jclass of the relevant class or a subclass of it, which is fine in terms of ensuring the holder is kept alive. So it's really GetMethodDeclaringClass that is missing something. Its caller (also XSL generated C++ code) checks that the Method* has not been cleared in the jmethodID handle, and bails if the CLD is not alive. But nowhere do we call klass_holder() which is what safely reads the holder and ensures it is made strongly reachable. @fisk @stefank , I used the klass_holder to keep it alive and check alive via is_loader_alive. Thanks for the suggestion! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2340003707 From rkennke at openjdk.org Tue Sep 10 08:36:13 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:36:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 14:58:07 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/oop.hpp line 134: >> >>> 132: inline Klass* forward_safe_klass(markWord m) const; >>> 133: inline size_t forward_safe_size(); >>> 134: inline void forward_safe_init_mark(); >> >> Given the comment these methods do not seem "safe" to me. Maybe use "raw" or something to better indicate that care must be taken to use them. >> >> Maybe the "safe" refers to use them only in "safe" contexts, but in Hotspot code iirc we use something like "raw" or "unsafe". > > Restating my earlier comment about this: These functions are mainly used by the GCs. In one of the patches I've cleaned away all usages except for those in Shenandoah. 
I would prefer to see these completely removed from the oops/ directory and let the GCs decide when and how to perform "safe" reads of these values. I've removed those methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751514466 From rkennke at openjdk.org Tue Sep 10 08:40:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:40:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 15:01:10 GMT, Thomas Schatzl wrote: >> Just to be clear, the second part of the quoted sentence is important: >>> could be any value *that is not a valid field offset* > >> could be any value that is not a valid field offset > > I understand that that "random value" needs to satisfy this condition. With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it has been rejected by somebody (don't remember), because it made the C2 diff a little messier, so I kept it like it is now. I would prefer to reinstate it, though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751522091 From rkennke at openjdk.org Tue Sep 10 08:44:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 08:44:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:12:23 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: > >> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >> 73: // In this assert, we cannot safely access the Klass* with compact headers. >> 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); > > If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl(). Could be a left-over from a time when we had to deal with OM and/or stack-locks in the header. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751527745 From mli at openjdk.org Tue Sep 10 08:54:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Sep 2024 08:54:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> References: <6I0T4rOjOTj-FZxpspatEo6j1_Num75bCAOBNxsrHI8=.097f731e-7c92-4eac-a379-c2df336cd412@github.com> Message-ID: On Mon, 9 Sep 2024 14:08:53 GMT, Roman Kennke wrote: >> Yes, I'm interested in it. Thanks for raising the discussion. :) > > If anybody is doing it, please send me a patch, or we can do it as a follow-up PR. Thanks. I'll send it to you if I finish it in time, otherwise I will do it in a separate pr. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751544394 From rehn at openjdk.org Tue Sep 10 08:59:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 10 Sep 2024 08:59:05 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 07:23:53 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Added comment >> - Merge branch 'master' into c_abi_error >> - Fixed > > Marked as reviewed by luhenry (Committer). Thanks @luhenry and @RealFYang ! > (PS: Having said that, a `mv` does seem more obvious. Your call) Thinking about it, I see sign/zero extensions bugs still pop-up in major oss, kernel, gcc, etc.., expecting us to have fixed all is a bit naive, and they are pretty easy to introduce. 
So my suggestion would be to go further; adding MASM methods to check register or an address for proper representation otherwise 'asserting' (in debug builds). ~Pseudo code for this case: expect_state_s32(dst.first()->is_stack() ? dst.first()->as_Register() : Address(sp, reg2offset_out(dst.first()))) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20912#issuecomment-2340067378 From dholmes at openjdk.org Tue Sep 10 09:05:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 10 Sep 2024 09:05:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: <2xh9F1kC4pZ4yHEwiPxJzaqO5REJ7vRXs6m1v0ADRss=.3f34dd37-8785-4f6e-857b-47d9e2e2f6bb@github.com> References: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> <2xh9F1kC4pZ4yHEwiPxJzaqO5REJ7vRXs6m1v0ADRss=.3f34dd37-8785-4f6e-857b-47d9e2e2f6bb@github.com> Message-ID: On Tue, 10 Sep 2024 02:18:35 GMT, Liang Mao wrote: > > Maybe I'm misunderstanding the test case but isn't it using a jMethodID for a class that may have been unloaded? > > Yes. Okay then that is a programming error not a VM error. It is up to the application to ensure that classes are kept alive if you have jMethodID's for them: https://docs.oracle.com/en/java/javase/22/docs/specs/jni/design.html#accessing-fields-and-methods any validation the VM attempts with jMethodId's (and field id's) is best-effort and not required. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2340086448 From lmao at openjdk.org Tue Sep 10 09:10:09 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 09:10:09 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> <2xh9F1kC4pZ4yHEwiPxJzaqO5REJ7vRXs6m1v0ADRss=.3f34dd37-8785-4f6e-857b-47d9e2e2f6bb@github.com> Message-ID: On Tue, 10 Sep 2024 09:02:35 GMT, David Holmes wrote: > Okay then that is a programming error not a VM error. It is up to the application to ensure that classes are kept alive if you have jMethodID's for them: > > https://docs.oracle.com/en/java/javase/22/docs/specs/jni/design.html#accessing-fields-and-methods > > any validation the VM attempts with jMethodId's (and field id's) is best-effort and not required. Thanks very much for the knowledge. Actually I'm not the original test creator and just try to fix the crash from a VM perspective. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2340097296 From fyang at openjdk.org Tue Sep 10 09:16:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 10 Sep 2024 09:16:07 GMT Subject: RFR: 8339741: RISC-V: C ABI breakage for integer on stack [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 07:23:53 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Added comment >> - Merge branch 'master' into c_abi_error >> - Fixed > > Marked as reviewed by luhenry (Committer). > Thanks @luhenry and @RealFYang ! > > > (PS: Having said that, a `mv` does seem more obvious. 
Your call) > > Thinking about it, I see sign/zero extensions bugs still pop-up in major oss, kernel, gcc, etc.., expecting us to have fixed all is a bit naive, and they are pretty easy to introduce. So my suggestion would be to go further; adding MASM methods to check register or an address for proper representation otherwise 'asserting' (in debug builds). > > ~Pseudo code for this case: > > ``` > expect_state_s32(dst.first()->is_stack() ? dst.first()->as_Register() : Address(sp, reg2offset_out(dst.first()))) > ``` Yeah, that makes sense to me especially for sign extension cases. And we already have simple checks for address like in movptr: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp#L1883. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20912#issuecomment-2340109762 From sgehwolf at openjdk.org Tue Sep 10 09:18:11 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 10 Sep 2024 09:18:11 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v9] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:28:16 GMT, Zdenek Zambersky wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Adapt JDK-8339148 >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comment of WB::host_cpus() >> - Handle non-root + CGv2 >> - Add nested hierarchy to test framework >> - Revert "Add root check for SystemdMemoryAwarenessTest.java" >> >> This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - ... 
and 7 more: https://git.openjdk.org/jdk/compare/79b21a7e...30f32d22 > > I have done some testing on RHELs (build with changes from this PR + other 2 container PRs applied): > **RHEL-8** (cgroup1/non-root) > - test was skipped correctly > > **RHEL-9** (cgroup2/non-root) > - I saw failure of `active_processor_count` check. > - after investigation, I have found, that `cpu` cgroup controller is not delegated to `user@1000.service` (and children) on rhel-9 (unlike in e.g. fedora) it only had `memory pids` (btw. available controllers at given "level" are listed in `cgroup.controllers` file in cgroups v2) > - when I modified `user@.service` to also delegate cpu controller, test passed > > Apart from issue with check for `active_processor_count` on RHEL-9/non-root, it looks good. However I don't know how to easily fix issue with `active_processor_count` check. Maybe check could be skipped for non-root. (Work-around is to modify system configuration.) @zzambers Thanks for taking a look. > I have done some testing on RHELs (build with changes from this PR + other 2 container PRs applied): **RHEL-8** (cgroup1/non-root) > > * test was skipped correctly > > > **RHEL-9** (cgroup2/non-root) > > * I saw failure of `active_processor_count` check. > > * after investigation, I have found, that `cpu` cgroup controller is not delegated to `user@1000.service` (and children) on rhel-9 (unlike in e.g. fedora) it only had `memory pids` (btw. available controllers at given "level" are listed in `cgroup.controllers` file in cgroups v2) > > * when I modified `user@.service` to also delegate cpu controller, test passed Could it be that the setup you've done to employ delegation is similar to this one? https://github.com/jerboaa/openjdk-cgroupv2-setup/blob/97690683af17b303276ea473fe44b3dde7ead327/config_cgroupv2.yml#L24-L32 > Apart from issue with check for `active_processor_count` on RHEL-9/non-root, it looks good.
However I don't know how to easily fix issue with `active_processor_count` check. Maybe check could be skipped for non-root. (Work-around is to modify system configuration.) Do existing podman container tests pass on that system? It seems fair to assume that that's the baseline config for container tests in general: systemd ones or podman/docker. I know that on cg v2 not all container tests pass out-of-the-box. In particular certain CPU awareness tests. Keeping that basic idea in terms of required config for those tests consistent with other container tests seem adequate to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2340114316 From lmao at openjdk.org Tue Sep 10 09:20:47 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 09:20:47 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v4] In-Reply-To: References: Message-ID: <51CiZ13xi8PSarGITqJO8JuiQXrKzqgYsfd2lzA22Fk=.2b784383-7af5-4fed-b207-5092076ff12c@github.com> > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: Fix windows build error of agent cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/751c29bb..385768df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From rkennke at openjdk.org Tue Sep 10 09:31:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 09:31:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 08:37:43 GMT, Roman Kennke wrote: >>> could be any value that is not a valid field offset >> >> I understand that that "random value" needs to satisfy this condition. > > With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it has been rejected by somebody (don't remember), because it made the C2 diff a little messier, so I kept it like it is now. I would prefer to reinstate it, though. > (Fwiw, the method is also used during Universe initialization). Yes, but only in the -UCOH branch. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751604467 From stefank at openjdk.org Tue Sep 10 10:05:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 10 Sep 2024 10:05:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 08:41:16 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/objArrayKlass.inline.hpp line 74: >> >>> 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { >>> 73: // In this assert, we cannot safely access the Klass* with compact headers. >>> 74: assert (UseCompactObjectHeaders || obj->is_array(), "obj must be array"); >> >> If we can't safely access the `Klass*` here, why is the call to `obj->klass()` below safe? > > Good question. This comment and assert can probably be removed (same for the similar comment/assert in TypeArrayKlass::oop_oop_iterate_impl(). Could be a left-over from a time when we had to deal with OM and/or stack-locks in the header. FWIW, I've been running tests with this assert restored (and the one in TypeArrayKlass) without hitting any problems. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751656595 From lucy at openjdk.org Tue Sep 10 10:08:03 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 10 Sep 2024 10:08:03 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Mon, 9 Sep 2024 19:11:44 GMT, Vladimir Kozlov wrote: > There were several optimization done for this code by @RealLucy [JDK-8223444](https://bugs.openjdk.org/browse/JDK-8223444) and [JDK-8231460](https://bugs.openjdk.org/browse/JDK-8231460). 
But it is still using `linked list` for free segments. Should we consider something more complex? Or it is not an issue? During my testing of the mentioned fixes, I never saw such long freelists. They can only appear with really severe fragmentation in the code cache. Artificial creation would work like allocating many (small) code blobs and then freeing every other. Adjacent free slots are fused (combined together) during free processing. During code blob allocation, the free list is searched first for a match. That usually helps well against a growing free list. To gain more insight, you may want to try `-XX:+PrintCodeHeapAnalytics`. The same information (with more options) is also available via jcmd. CodeCacheSegmentSize should not be chosen too small to keep memory and processing overhead in check. 64 bytes appears to be a feasible choice. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2340227771 From lmao at openjdk.org Tue Sep 10 10:15:43 2024 From: lmao at openjdk.org (Liang Mao) Date: Tue, 10 Sep 2024 10:15:43 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: Exclude libagent8339725.cpp compiling for windows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/385768df..d08bdeb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From stuefe at openjdk.org Tue Sep 10 10:22:10 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 10:22:10 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v6] In-Reply-To: <6SbHbHK4n6vHaDLeC-X1oFBcoGE1osgeSXV7gq36xP8=.6f7e9fc4-ff7d-412f-9e14-5650dfa6f5d9@github.com> References: <6SbHbHK4n6vHaDLeC-X1oFBcoGE1osgeSXV7gq36xP8=.6f7e9fc4-ff7d-412f-9e14-5650dfa6f5d9@github.com> Message-ID: On Fri, 6 Sep 2024 16:20:52 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Replace Metaspace::is_compressed_klass_ptr with CompressedKlassPointers::is_in_encoding_range. This looks good to me. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19157#pullrequestreview-2292015058 From bulasevich at openjdk.org Tue Sep 10 11:05:07 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 10 Sep 2024 11:05:07 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <-cCyBdiqtII0_9uS1wl8CEZ_7a-Q6Wf6tH2PwROp1y8=.b4822d51-a9f1-4d50-88f2-0ab8c94b37f5@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> <-cCyBdiqtII0_9uS1wl8CEZ_7a-Q6Wf6tH2PwROp1y8=.b4822d51-a9f1-4d50-88f2-0ab8c94b37f5@github.com> Message-ID: On Tue, 10 Sep 2024 01:58:05 GMT, Vladimir Kozlov wrote: > I will ask someone to do our performance testing to confirm your results. Good. Thank you very much. I am focusing mainly on memory optimization, the performance improvement is a side effect. Anyway, more testing would be good. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2340346144 From bulasevich at openjdk.org Tue Sep 10 11:05:07 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 10 Sep 2024 11:05:07 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: <8YanHZxslhO-8HUEwO9VLlFETrk1W-rsy2EnLZYiLNE=.db2331a1-b433-45d5-9645-1620f5f4c76b@github.com> On Tue, 10 Sep 2024 10:05:09 GMT, Lutz Schmidt wrote: > During my testing of the mentioned fixes, I never saw such long freelists. I see the warning every time I run Renaissance benchmarks with the VerifyCodeCache option.
Is this a problem and not just a reflection of the fact that the codeheap contains 20K of methods with some holes in between? $ ./build/linux-aarch64-server-fastdebug/jdk/bin/java -XX:+VerifyCodeCache -XX:+PrintCodeCache -jar ~/renaissance-jmh-0.14.2-95-gae3b5ce.jar Dotty # Run progress: 0.00% complete, ETA 00:00:00 # Fork: 1 of 5 # Warmup Iteration 1: 86079.453 ms/op # Warmup Iteration 2: 8212.947 ms/op # Warmup Iteration 3: 5416.496 ms/op # Warmup Iteration 4: 6841.073 ms/op # Warmup Iteration 5: 4248.337 ms/op # Warmup Iteration 6: OpenJDK 64-Bit Server VM warning: CodeHeap: # of free blocks > 10000 3530.697 ms/op # Warmup Iteration 7: 3186.153 ms/op > CodeCacheSegmentSize should not be chosen too small to keep memory and processing overhead in check. 64 bytes appears to be a feasible choice. OK. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2340350586 From rkennke at openjdk.org Tue Sep 10 11:29:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 11:29:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> Message-ID: On Tue, 10 Sep 2024 07:53:23 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shared/collectedHeap.cpp line 232: >> >>> 230: } >>> 231: >>> 232: // With compact headers, we can't safely access the class, due >> >> Suggestion: >> >> // With compact headers, we can't safely access the klass, due >> >> >> This is the case why? Because we might not have copied the header yet? Is this method actually ever used while the forwarded object is unstable? >> Given this is used for verification only afaik, we should make an effort to provide that check. 
> > With compact headers, we can't safely access the Klass* when the object has been forwarded, because non-full-GC-forwarding temporarily overwrites the mark-word, and thus the Klass*, with the forwarding pointer, and here we have no way to make a distinction between Full-GC and regular GC forwarding. > > I improved the code to make the check when the object is not forwarded. Not sure if we could/should do more (e.g. pass around is_full argument to make the distinction, or find the - possibly few - places where we might call is_oop() on from-space objects in regular GC and do the check in a forwardee-safe way?). Ah, I found it! It seems only the ShenandoahVerifier calls oop_iterate() on from_space objects, which can have a forwarding, which would mess with the object's Klass*. We're lucky because that iterator doesn't visit the Klass*. I see the following ways out: - The caller must ensure that the oop is ok and Klass* is accessible. I could do that in the ShenandoahVerifier. It kinda defeats the point, though, we want the verifier operate on the 'raw' object, not necessarily the forwardee. - Next easy way out would be to use 'this' instead of obj->klass(). Should makes sense, because it should always be the same. Using 'this' in the assert (this->is_array_klass()) is kinda bogus, though. And asserting (this == obj->klass()) would be nice, but would have the same problem as before where we would need to exclude UCOH for the case where Shenandoah needs it. In-fact, this is done already in oopDesc::oop_iterate_backwards(), but also excluding UCOH. - We could add a hook in the iterator that gives the Klass* for a given oop, which can then be overridden by the actual iterator to do the right thing, e.g. load the Klass* from the forwardee. WDYT? 
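The third option above (a hook in the iterator that supplies the Klass* for a given oop) could be sketched roughly as follows. This is a toy model, not HotSpot code: `Klass`, `Obj`, `KlassResolver`, `ForwardeeKlassResolver`, and `check_resolvers` are hypothetical names made up for illustration, and a null `klass` field merely stands in for a mark-word that has been overwritten by a forwarding pointer.

```cpp
#include <cassert>

// Toy stand-ins, NOT HotSpot types: every name in this block is hypothetical.
struct Klass { const char* name; };

struct Obj {
  const Klass* klass;      // nullptr models a header overwritten by a forwarding pointer
  const Obj*   forwardee;  // non-null when the object has been forwarded
};

// Default hook: read the Klass straight from the object itself.
struct KlassResolver {
  virtual ~KlassResolver() = default;
  virtual const Klass* klass_for(const Obj* obj) const { return obj->klass; }
};

// Override for iterators that may visit from-space objects: if the object
// is forwarded, take the Klass from the forwardee instead.
struct ForwardeeKlassResolver : KlassResolver {
  const Klass* klass_for(const Obj* obj) const override {
    return obj->forwardee != nullptr ? obj->forwardee->klass : obj->klass;
  }
};

// Small self-check: a forwarded from-space object whose own header no
// longer holds a usable Klass.
inline void check_resolvers() {
  Klass k{"Example"};
  Obj to_space{&k, nullptr};
  Obj from_space{nullptr, &to_space};

  KlassResolver plain;
  ForwardeeKlassResolver aware;
  assert(plain.klass_for(&from_space) == nullptr);  // default hook sees the clobbered header
  assert(aware.klass_for(&from_space) == &k);       // override follows the forwardee
  assert(aware.klass_for(&to_space) == &k);         // non-forwarded objects are unaffected
}
```

In this shape only the iterator that actually walks from-space objects pays for the extra forwarding check; all other callers keep the direct load.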
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751770293 From jbhateja at openjdk.org Tue Sep 10 11:45:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Sep 2024 11:45:25 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v2] In-Reply-To: References: Message-ID: <-gmI8CWVHCk3ebpH6M3IaB4avuiG7QpzykKxYzGso2o=.b8072269-55e9-4c57-85e9-9f05fba5d934@github.com> > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/bfe6f206..1488b588 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=00-01 Stats: 18 lines in 7 files changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From coleenp at openjdk.org Tue Sep 10 11:48:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 11:48:13 GMT Subject: RFR: 8338526: Don't store abstract and interface Klasses in class metaspace [v6] In-Reply-To: <6SbHbHK4n6vHaDLeC-X1oFBcoGE1osgeSXV7gq36xP8=.6f7e9fc4-ff7d-412f-9e14-5650dfa6f5d9@github.com> References: <6SbHbHK4n6vHaDLeC-X1oFBcoGE1osgeSXV7gq36xP8=.6f7e9fc4-ff7d-412f-9e14-5650dfa6f5d9@github.com> Message-ID: On Fri, 6 Sep 2024 16:20:52 GMT, Coleen Phillimore wrote: >> This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. >> >> Tested with tier1-8. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Replace Metaspace::is_compressed_klass_ptr with CompressedKlassPointers::is_in_encoding_range. Thanks for reviewing Ioi and Thomas, and thank you Thomas for the suggested changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19157#issuecomment-2340431106 From coleenp at openjdk.org Tue Sep 10 11:48:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 11:48:13 GMT Subject: Integrated: 8338526: Don't store abstract and interface Klasses in class metaspace In-Reply-To: References: Message-ID: On Thu, 9 May 2024 13:51:09 GMT, Coleen Phillimore wrote: > This change stores InstanceKlass for interface and abstract classes in the non-class metaspace, since class metaspace will have limits on number of classes that can be represented when Lilliput changes go in. Classes that have no instances created for them don't require compressed class pointers. The generated LambdaForm classes are also AllStatic, and changing them to abstract moves them to non-class metaspace too. It's not technically great to make them abstract and not final but you can't have both. Java classfile access flags have no way of specifying something like AllStatic. > > Tested with tier1-8. This pull request has now been integrated. Changeset: ad104932 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/ad104932e6c26806c353ad048ce5cff7d2b4c29a Stats: 92 lines in 19 files changed: 42 ins; 12 del; 38 mod 8338526: Don't store abstract and interface Klasses in class metaspace Reviewed-by: stuefe, iklam ------------- PR: https://git.openjdk.org/jdk/pull/19157 From tschatzl at openjdk.org Tue Sep 10 12:02:11 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Sep 2024 12:02:11 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. 
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port for JEP 475 src/hotspot/cpu/aarch64/gc/g1/g1BarrierSetAssembler_aarch64.cpp line 210: > 208: Label& done, > 209: bool new_val_may_be_null) { > 210: // Does store cross heap regions? Suggestion: // Does store cross heap regions? Indentation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1751721626 From stuefe at openjdk.org Tue Sep 10 12:07:09 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:07:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:49:57 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.cpp line 214: > >> 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", >> 213: len, max_encoding_range_size()); >> 214: vm_exit_during_initialization(ss.base()); > > Why does this exit and not turn off compressed klass pointers and compact object headers? This is tricky. 
We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751819814 From zzambers at openjdk.org Tue Sep 10 12:15:16 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 10 Sep 2024 12:15:16 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v9] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:28:16 GMT, Zdenek Zambersky wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Adapt JDK-8339148 >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comment of WB::host_cpus() >> - Handle non-root + CGv2 >> - Add nested hierarchy to test framework >> - Revert "Add root check for SystemdMemoryAwarenessTest.java" >> >> This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - ... and 7 more: https://git.openjdk.org/jdk/compare/dd1b7120...30f32d22 > > I have done some testing on RHELs (build with changes from this PR + other 2 container PRs applied): > **RHEL-8** (cgroup1/non-root) > - test was skipped correctly > > **RHEL-9** (cgroup2/non-root) > - I saw failure of `active_processor_count` check. 
> - after investigation, I have found, that `cpu` cgroup controller is not delegated to `user at 1000.service` (and children) on rhel-9 (unlike in e.g. fedora) it only had `memory pids` (btw. available controllers at given "level" are listed in `cgroup.controllers` file in cgroups v2) > - when I modified `user at .service` to also delegate cpu controller, test passed > > Apart from issue with check for `active_processor_count` on RHEL-9/non-root, it looks good. However I don't know how to easily fix issue with `active_processor_count` check. Maybe check could be skipped for non-root. (Work-around is to modify system configuration.) > @zzambers Thanks for taking a look. > > > I have done some testing on RHELs (build with changes from this PR + other 2 container PRs applied): **RHEL-8** (cgroup1/non-root) > > ``` > > * test was skipped correctly > > ``` > > > > > > > > > > > > > > > > > > > > > > > > **RHEL-9** (cgroup2/non-root) > > ``` > > * I saw failure of `active_processor_count` check. > > > > * after investigation, I have found, that `cpu` cgroup controller is not delegated to `user at 1000.service` (and children) on rhel-9 (unlike in e.g. fedora) it only had `memory pids` (btw. available controllers at given "level" are listed in `cgroup.controllers` file in cgroups v2) > > > > * when I modified `user at .service` to also delegate cpu controller, test passed > > ``` > > Could it be that the setup you've done to employ delegation is similar to this one? https://github.com/jerboaa/openjdk-cgroupv2-setup/blob/97690683af17b303276ea473fe44b3dde7ead327/config_cgroupv2.yml#L24-L32 I have just added `cpu` to Delegate list of `user at .service`, looks similar, to what is done there. I see use of `Delegate=yes` in your link, that probably delegates all. Thanks for this link. > > > Apart from issue with check for `active_processor_count` on RHEL-9/non-root, it looks good. However I don't know how to easily fix issue with `active_processor_count` check. 
Maybe check could be skipped for non-root. (Work-around is to modify system configuration.) > > Do existing podman container tests pass on that system? It seems fair to assume that that's the baseline config for container tests in general: systemd ones or podman/docker. I know that on cg v2 not all container tests pass out-of-the-box. In particular certain CPU awareness tests. Keeping that basic idea in terms of required config for those tests consistent with other container tests seem adequate to me. You are right. I have ran container tests in my VM and indeed faced issue with missing cpuset controller. (yesterday I forgot to set required properties, so most of them got skipped) Interesting that we have not faced this issue in our testing (container tests are passing). However that is probably because we run containers tests in different way (we don't use VMs for it, but rather run them in beaker). I would need to investigate. Anyway good to know, there can be this issue with cgroup controllers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2340526280 From stuefe at openjdk.org Tue Sep 10 12:16:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:16:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:50:50 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.cpp line 222: > >> 220: return; >> 221: } >> 222: #endif > > Why not add null pd_initialize to zero to remove this conditional code? I can do that. 
Added to backlist (https://wiki.openjdk.org/display/lilliput/JEP-450+Review+Todo) > src/hotspot/share/oops/compressedKlass.cpp line 224: > >> 222: #endif >> 223: >> 224: if (tiny_classpointer_mode()) { > > I kind of agree with Thomas Schatzl for this. Maybe it should be compact_classpointer_mode(). It's nice to have a new string for grep, but they're not really that tiny. Yes, makes sense. Added to backlist. This coding was developed somewhat independently from +COH at the beginning, but now the two parts (tinycp and the rest of COH) depend on each other anyway. I should just use UseCompactObjectHeaders or a flag directly derived from it. > src/hotspot/share/oops/compressedKlass.cpp line 234: > >> 232: _range = len; >> 233: >> 234: constexpr int log_cacheline = 6; > > Is 6 the log of DEFAULT_CACHE_LINE_SIZE? 64, yes > src/hotspot/share/oops/compressedKlass.cpp line 243: > >> 241: } else { >> 242: >> 243: // In legacy mode, we try, in order of preference: > > Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... okay. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751828214 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831035 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751831994 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751833034 From coleenp at openjdk.org Tue Sep 10 12:22:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 12:22:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:03:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 214: >> >>> 212: ss.print("Class space size (%zu) exceeds the maximum possible size (%zu)", >>> 213: len, max_encoding_range_size()); >>> 214: vm_exit_during_initialization(ss.base()); >> >> Why does this exit and not turn off compressed klass pointers and compact object headers? > > This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. > > Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too. Ok, in this case, that's fine if we already asserted. A fatal error is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751840556 From sgehwolf at openjdk.org Tue Sep 10 12:29:08 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 10 Sep 2024 12:29:08 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v9] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:12:50 GMT, Zdenek Zambersky wrote: > I have just added `cpu` to Delegate list of `user at .service`, looks similar, to what is done there. 
I see use of `Delegate=yes` in your link, that probably delegates all. > Thanks for this link. FWIW, I was able to reproduce what you said on RHEL 9 when run as user. The mentioned config fixes it. I'll add a hint when the assertion fails. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2340572838 From lucy at openjdk.org Tue Sep 10 12:41:06 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 10 Sep 2024 12:41:06 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich wrote: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. 
The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. > - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... The warning can be understood as a hint to a potential efficiency issue. There is no hard limit for #free blocks. I do not have the Renaissance suite readily runnable at hand. Would you mind sending me the `-XX:+PrintCodeHeapAnalytics` output from such a run? You could use lutz.schmidt at sap.com to send it (not many people will be interested in it). 
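The segment-size argument quoted above can be checked with a little arithmetic. The sketch below is only an illustration under simplifying assumptions: it ignores any per-block header and the segment-map bookkeeping, the helper names `segments_needed` and `tail_waste` are made up here, and the 1050-byte nmethod size is an arbitrary example value, not a measurement.

```cpp
#include <cassert>
#include <cstddef>

// Number of fixed-size segments needed to hold `size` bytes (rounding up).
constexpr std::size_t segments_needed(std::size_t size, std::size_t segment_size) {
  return (size + segment_size - 1) / segment_size;
}

// Bytes left unused in the last occupied segment.
constexpr std::size_t tail_waste(std::size_t size, std::size_t segment_size) {
  return segments_needed(size, segment_size) * segment_size - size;
}

// Hypothetical 1050-byte nmethod:
//   128-byte segments:  9 segments = 1152 bytes, 102 bytes unused
//    64-byte segments: 17 segments = 1088 bytes,  38 bytes unused
static_assert(segments_needed(1050, 128) == 9,  "9 segments of 128 bytes");
static_assert(tail_waste(1050, 128) == 102,     "102 bytes unused at 128");
static_assert(segments_needed(1050, 64) == 17,  "17 segments of 64 bytes");
static_assert(tail_waste(1050, 64) == 38,       "38 bytes unused at 64");
```

For sizes spread evenly over a segment, the expected tail waste is about half a segment per block, so halving the segment size roughly halves the waste while doubling the number of segments that have to be tracked. That is the trade-off behind not going below 64 bytes.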
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2340613646 From rkennke at openjdk.org Tue Sep 10 12:42:48 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 12:42:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: - More touch-ups, fix Shenandoah oop iterator - Remove asserts in XArrayKlass::oop_oop_iterate() - Various touch-ups - Improve is_oop() - Rename GCForwarding -> FullGCForwarding; some touch-ups - Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/2884499a..5da250cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=08-09 Stats: 238 lines in 36 files changed: 74 ins; 65 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 10 12:42:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:42:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: 
<6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 15:59:43 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - More touch-ups, fix Shenandoah oop iterator >> - Remove asserts in XArrayKlass::oop_oop_iterate() >> - Various touch-ups >> - Improve is_oop() >> - Rename GCForwarding -> FullGCForwarding; some touch-ups >> - Fix comment > > src/hotspot/share/oops/compressedKlass.cpp line 116: > >> 114: _range = end - _base; >> 115: >> 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) > > Can you refactor so the aarch64 path runs this same code without duplication? In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751866773 From stuefe at openjdk.org Tue Sep 10 12:42:49 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 12:42:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> References: <86SISXYdKqHq5_nSqeVVNgmxplVK6QuHvOjCmiCKkzQ=.92ac6af1-9a94-4068-b625-1e331314826e@github.com> <4cTKmlYUEtFpr2TURf25gd7-_eSb-uF0cC0BmLl6wd0=.b9f0482d-5439-421b-9a29-7a014fb72558@github.com> Message-ID: On Mon, 9 Sep 2024 16:01:10 GMT, Coleen Phillimore wrote: >> src/hotspot/share/oops/compressedKlass.hpp line 43: >> >>> 41: >>> 42: // Tiny-class-pointer mode >>> 43: static int _tiny_cp; // -1, 0=true, 1=false >> >> Suggestion: >> >> static int _tiny_cp; // -1 = uninitialized, 0 = true, 1 = false >> >> In addition to that, I am not sure if introducing a new term ("tiny") for compact class header related changes 
(and just here) makes the code more clear; I would have expected a "_compact_" prefix. Also all other members use "k"-klass and spell out "klass pointer", so I would prefer to keep that style. > > I agree with this. 'cp' reads as ConstantPool for me even though this is a different context. Okay, I will change that ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751867998 From rehn at openjdk.org Tue Sep 10 12:53:18 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 10 Sep 2024 12:53:18 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: > Hey, please consider, > > Code which is offline (behind a barrier) does not need global icache flushes, > as we can instead emit fence.i locally (per thread and hart) in the slow path. > But if we were to be context-switched to a hart which has not had fence.i emitted, we could still fetch stale instructions. > To handle this case we now have kernel support: > https://docs.kernel.org/arch/riscv/cmodx.html > > It's not perfect, as with this patch we will be emitting fence.i on any context switch for any thread, even if that thread does not execute on the code heap (a non-attached native thread), and even if there were no changes to the code heap. > But this is in many cases much faster, as the global icache flush IPI is very intrusive. > Particular cases are running a concurrent GC with small headroom. > In such a scenario I measured 15% increased throughput on VF2. > A large CPU or less headroom (faster GC cycles) will yield an even bigger performance boost. > > Note that this requires a 6.10 kernel. > > I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3 (with the setting on). > > Later we probably want this on by default, but as it's hard to test I'll leave the default off. 
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Comment, moved init after feature enabling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20913/files - new: https://git.openjdk.org/jdk/pull/20913/files/40d3e1f3..8411301b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=00-01 Stats: 38 lines in 1 file changed: 18 ins; 20 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20913/head:pull/20913 PR: https://git.openjdk.org/jdk/pull/20913 From aph at openjdk.org Tue Sep 10 12:58:03 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 10 Sep 2024 12:58:03 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Mon, 9 Sep 2024 18:05:22 GMT, Igor Veresov wrote: > I don't quite remember making this change... And I don't remember any reasons as to why it might have been needed. OK, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2340670748 From tschatzl at openjdk.org Tue Sep 10 13:03:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 10 Sep 2024 13:03:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: References: Message-ID: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> On Mon, 9 Sep 2024 11:15:47 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. 
The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port for JEP 475 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2292405233 From sgehwolf at openjdk.org Tue Sep 10 13:23:45 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 10 Sep 2024 13:23:45 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v10] In-Reply-To: References: Message-ID: > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 (passes).
Fails on cg v2 (due to JDK-8322420) > - [x] GHA Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision: Improve reliability of cpu quota test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/30f32d22..0e52e004 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=08-09 Stats: 30 lines in 2 files changed: 15 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From zzambers at openjdk.org Tue Sep 10 13:28:13 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 10 Sep 2024 13:28:13 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: <_PBSCDW3EI69FBqycC9IM-GYWlcgpM7I3qTsEiRJp6g=.fcf693ac-501a-4084-ab40-58d1fe895c0b@github.com> References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> <_PBSCDW3EI69FBqycC9IM-GYWlcgpM7I3qTsEiRJp6g=.fcf693ac-501a-4084-ab40-58d1fe895c0b@github.com> Message-ID: <3xrkC7uUJpfmJrcERIl69CM8G5fbKONia5lCFzsiMK4=.2ea93ff8-1721-4f3c-90b1-70b86f98a03c@github.com> On Tue, 10 Sep 2024 13:23:37 GMT, Severin Gehwolf wrote: >> Looking through the coding it looks more or less okay to me; but if you really need to run it under user 'root' I think we will not have so much use for this in our test environments because we use other test users. >> Not saying that this is a very bad thing, maybe it is just the way it is, that 'root' is needed ? > > @MBaesken @zzambers I've updated this patch which should cover the config issue as pointed out by @zzambers. The latest version throws `SkippedException` with a hint should the match fail. 
Done in https://github.com/openjdk/jdk/pull/19530/commits/0e52e004e9766301294743aa42c8306d7a25a34f and added this info to the bug as well. > > Good to go now? @jerboaa Looks good, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2340777251 From sgehwolf at openjdk.org Tue Sep 10 13:28:13 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Tue, 10 Sep 2024 13:28:13 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v6] In-Reply-To: References: <-Ff0X6wkJWy78vOGT8F1m939z9Aoq8VjbUi_OTNoxko=.9447519f-8e98-4d9d-9c94-86cdbbbe3ae1@github.com> Message-ID: <_PBSCDW3EI69FBqycC9IM-GYWlcgpM7I3qTsEiRJp6g=.fcf693ac-501a-4084-ab40-58d1fe895c0b@github.com> On Fri, 30 Aug 2024 11:05:24 GMT, Matthias Baesken wrote: >> Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Add root check for SystemdMemoryAwarenessTest.java >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Add Whitebox check for host cpu >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Merge branch 'master' into jdk-8333446-systemd-slice-tests >> - Fix comments >> - 8333446: Add tests for hierarchical container support > > Looking through the coding it looks more or less okay to me; but if you really need to run it under user 'root' I think we will not have so much use for this in our test environments because we use other test users. > Not saying that this is a very bad thing, maybe it is just the way it is, that 'root' is needed ? 
@MBaesken @zzambers I've updated this patch which should cover the config issue as pointed out by @zzambers. The latest version throws `SkippedException` with a hint should the match fail. Done in https://github.com/openjdk/jdk/pull/19530/commits/0e52e004e9766301294743aa42c8306d7a25a34f and added this info to the bug as well. Good to go now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2340771546 From zzambers at openjdk.org Tue Sep 10 13:32:09 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 10 Sep 2024 13:32:09 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v10] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 13:23:45 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and fails on cgroups v2 due to the way how [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) was implemented when JDK 13 was a thing. Therefore immediately problem-listed. It should get unlisted once [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) merges. >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 (passes). Fails on cg v2 (due to JDK-8322420) >> - [x] GHA > > Severin Gehwolf has updated the pull request incrementally with one additional commit since the last revision: > > Improve reliability of cpu quota test Marked as reviewed by zzambers (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19530#pullrequestreview-2292534579 From jeisl at openjdk.org Tue Sep 10 13:54:23 2024 From: jeisl at openjdk.org (Josef Eisl) Date: Tue, 10 Sep 2024 13:54:23 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> Message-ID: <1xFVXOUr023IxIHucx0v8dket_QOWsIV0G3wVHnF5aE=.23e8334b-a4ed-4d94-8ad9-376ffe043882@github.com> On Wed, 21 Aug 2024 22:14:40 GMT, Magnus Ihse Bursie wrote: >> As a preparation for Hermetic Java, we need to have a way to look up during runtime if we are using a statically linked library or not. >> >> This change will be the first step needed towards compiling the object files only once, and then link them into either dynamic or static libraries. (The only exception will be the linktype.c[pp] files, which needs to be compiled twice, once for the dynamic libraries and once for the static libraries.) Getting there will require further work though. >> >> This is part of the changes that make up the draft PR https://github.com/openjdk/jdk/pull/19478, which I have broken out. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Also update build to link properly src/java.desktop/unix/native/libawt/awt/awt_LoadLibrary.c line 135: > 133: #endif > 134: > 135: if (!JVM_IsStaticallyLinked()) { These "cross-library" link-type probes are a challenge for GraalVM native image. In native image, we statically link most standard JNI libraries into the final executable. However in some cases, for example AWT, static linking is not feasible due to their dependencies. Thus, we resort to dynamic linking. 
Having a mix of static and dynamic linking means that `JVM_IsStaticallyLinked` should give different answers based on the caller. Example: let's assume a follow-up change introduces a call to `JVM_IsStaticallyLinked` in libnet (which native image statically links): * if `JVM_IsStaticallyLinked` is called from libawt.so, we want to answer `false` * if `JVM_IsStaticallyLinked` is called from libnet.a, we want to answer `true` Is the mixed linking use case something that the Hermetic Java effort has on its radar? For this particular case, one solution could be to avoid cross-library calls. I.e., instead of calling `JVM_IsStaticallyLinked`, have an `AWT_IsStaticallyLinked` that is local to libawt. This is similar to the pattern also used for JLI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20666#discussion_r1752023584 From adinn at openjdk.org Tue Sep 10 13:57:11 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 10 Sep 2024 13:57:11 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 10 Sep 2024 00:53:32 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`.
>> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa...
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments: logging indents Looks good as far as I can see -- although I only have a limited familiarity with the CDS archive code. src/hotspot/share/cds/aotClassLinker.hpp line 71: > 69: // > 70: class AOTClassLinker : AllStatic { > 71: using ClassesTable = ResourceHashtable; Can we have a symbolic name for this (prime) magic number here and in other places in this patch? I realise there is existing code which uses the raw number but it is also consumed symbolically (e.g. in archiveBuilder.hpp, metaspaceClosure.hpp). src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 191: > 189: } > 190: > 191: ClassLoaderData* loader_data = ClassLoaderData::class_loader_data(loader()); Can we assert here that loader() != nullptr? src/hotspot/share/cds/archiveBuilder.cpp line 433: > 431: } > 432: > 433: remember_embedded_pointer_in_enclosing_obj(ref); I'm not clear why this was moved up. Was this just an omission (bug) in the earlier version or do we now need to remember a reference location that we could previously safely ignore? src/hotspot/share/cds/cdsConfig.cpp line 551: > 549: } > 550: > 551: void CDSConfig::set_has_aot_linked_classes(bool is_static_archive, bool has_aot_linked_classes) { Why does this need to take `is_static_archive` as an argument? src/hotspot/share/cds/dynamicArchive.cpp line 138: > 136: verify_estimate_size(_estimated_metaspaceobj_bytes, "MetaspaceObjs"); > 137: > 138: sort_methods(); Could we have a comment to note that sorting and making shareable need to be done before calling `AOTClassLinker::write_to_archive();`? ------------- Marked as reviewed by adinn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2291991335 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1751666092 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1751745625 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1751916420 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1751952246 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1751964685 From gziemski at openjdk.org Tue Sep 10 14:18:08 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 10 Sep 2024 14:18:08 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 05:57:08 GMT, David Holmes wrote: > > The template parameter rename I was planning on doing in a followup issue, however, if you really want, I can make the fix here too. > > Personally I'd be okay with doing it here as one final commit that can be viewed in isolation. Sure, I can do it. Is everyone OK with MT as the template parameter name? Another obvious choice is MEM_TAG Side note: why are template parameter names all capitals? To help distinguish them from "regular" parameters? Do we still want that naming scheme, or can we switch here to using mem_tag? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2340957192 From aph at openjdk.org Tue Sep 10 15:03:09 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 10 Sep 2024 15:03:09 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v2] In-Reply-To: <-gmI8CWVHCk3ebpH6M3IaB4avuiG7QpzykKxYzGso2o=.b8072269-55e9-4c57-85e9-9f05fba5d934@github.com> References: <-gmI8CWVHCk3ebpH6M3IaB4avuiG7QpzykKxYzGso2o=.b8072269-55e9-4c57-85e9-9f05fba5d934@github.com> Message-ID: On Tue, 10 Sep 2024 11:45:25 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). 
Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions. src/hotspot/cpu/x86/x86_64.ad line 7095: > 7093: "sete $res\n\t" > 7094: "movzbl $res, $res" %} > 7095: ins_encode %{ Maybe change the format statement to match. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1752157139 From lucy at openjdk.org Tue Sep 10 15:11:05 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 10 Sep 2024 15:11:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <8YanHZxslhO-8HUEwO9VLlFETrk1W-rsy2EnLZYiLNE=.db2331a1-b433-45d5-9645-1620f5f4c76b@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> <8YanHZxslhO-8HUEwO9VLlFETrk1W-rsy2EnLZYiLNE=.db2331a1-b433-45d5-9645-1620f5f4c76b@github.com> Message-ID: On Tue, 10 Sep 2024 11:02:38 GMT, Boris Ulasevich wrote: >>> There were several optimization done for this code by @RealLucy [JDK-8223444](https://bugs.openjdk.org/browse/JDK-8223444) and [JDK-8231460](https://bugs.openjdk.org/browse/JDK-8231460). But it is still using `linked list` for free segments. Should we consider something more complex? Or it is not an issue? 
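For readers following the free-list question in this thread: the behaviour at issue (free blocks tracked by address, with adjacent free blocks merged when a block is released) can be illustrated with a toy model. This is only a sketch under stated assumptions. `ToyFreeList`, `release` and `block_count` are hypothetical names, and a sorted `std::map` stands in for the linked list that the thread says the CodeHeap uses, so this is not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Toy free list (hypothetical, not the HotSpot CodeHeap): free blocks are
// kept sorted by start offset so that neighbours are easy to find.
class ToyFreeList {
public:
    // Return the block [offset, offset + size) to the free list and merge
    // it with any free block that touches it on either side.
    void release(std::size_t offset, std::size_t size) {
        auto next = free_.lower_bound(offset);
        // Merge with the predecessor if it ends exactly where we start.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                size += prev->second;
                free_.erase(prev);  // erasing prev leaves 'next' valid
            }
        }
        // Merge with the successor if it starts exactly where we end.
        if (next != free_.end() && offset + size == next->first) {
            size += next->second;
            free_.erase(next);
        }
        free_[offset] = size;
    }

    // Number of distinct free blocks currently tracked.
    std::size_t block_count() const { return free_.size(); }

private:
    std::map<std::size_t, std::size_t> free_;  // start offset -> size
};
```

In this model, releasing every other 64-byte block leaves one free-list entry per released block, which is the fragmentation pattern that makes a free list long. Releasing the blocks in between collapses the entries back into a single larger one.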
>> >> During my testing of the mentioned fixes, I never saw such long freelists. They can only appear with really severe fragmentation in the code cache. Artificial creation would work like allocating many (small) code blobs and then freeing every other. >> >> Adjacent free slots are fused (combined together) during free processing. During code blob allocation, the free list is searched first for a match. That usually helps well against a growing free list. To gain more insight, you may want to try `-XX:+PrintCodeHeapAnalytics`. The same information (with more options) is also available via jcmd. >> >> CodeCacheSegmentSize should not be chosen too small to keep memory and processing overhead in check. 64 bytes appears to be a feasible choice. > >> During my testing of the mentioned fixes, I never saw such long freelists. > > I see the warning every time I run Renaissance benchmarks with the VerifyCodeCache option. Is this a problem and not just a reflection of the fact that the codeheap contains 20K of methods with some holes in between? > > > $ ./build/linux-aarch64-server-fastdebug/jdk/bin/java -XX:+VerifyCodeCache -XX:+PrintCodeCache -jar ~/renaissance-jmh-0.14.2-95-gae3b5ce.jar Dotty > # Run progress: 0.00% complete, ETA 00:00:00 > # Fork: 1 of 5 > # Warmup Iteration 1: 86079.453 ms/op > # Warmup Iteration 2: 8212.947 ms/op > # Warmup Iteration 3: 5416.496 ms/op > # Warmup Iteration 4: 6841.073 ms/op > # Warmup Iteration 5: 4248.337 ms/op > # Warmup Iteration 6: OpenJDK 64-Bit Server VM warning: CodeHeap: # of free blocks > 10000 > 3530.697 ms/op > # Warmup Iteration 7: 3186.153 ms/op > > >> CodeCacheSegmentSize should not be chosen too small to keep memory and processing overhead in check. 64 bytes appears to be a feasible choice. > > OK. Thanks! @bulasevich Thanks for providing the Code Heap Stats. At vm shutdown time, when the stats were taken, there is no pathological state in the code heap. 
- non-profiled nmethods has 70 free blocks, occupying 1033k in total. - profiled nmethods has 525 free blocks, occupying 5550k in total. - non-nmethods has 5 free blocks, occupying 263k in total. The warning must be triggered by a transient situation, when many methods are removed from the code heap in a short period of time, without significant new method compilations. Eventually, this large number of free blocks will either be reallocated for new compilations or collapse into less, but larger, blocks when even more methods are removed from the code heap. In short: no need to worry. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2341196975 From luhenry at openjdk.org Tue Sep 10 15:18:10 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 10 Sep 2024 15:18:10 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v2] In-Reply-To: <361CNPQYcSo_A4BmHd1RrYEVheSFNfc-WB3wXEEPUL4=.9fc78191-1052-4b26-852a-147e7f1dddd4@github.com> References: <361CNPQYcSo_A4BmHd1RrYEVheSFNfc-WB3wXEEPUL4=.9fc78191-1052-4b26-852a-147e7f1dddd4@github.com> Message-ID: On Mon, 9 Sep 2024 11:13:47 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. >> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
>>
>> ## Test
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java,
>> test/jdk/java/util/zip/TestCRC32.java
>>
>> ## Performance
>>
>> ### on bananapi
>>
>> with patch
>>
>> Benchmark -with patch | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.884 | 0.03 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 401.122 | 0.309 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 680.168 | 0.032 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1062.426 | 0.401 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3308.361 | 0.176 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24403.231 | 20.248 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103463.735 | 4.245 | ns/op
>>
>> without patch
>>
>> Benchmark -without patch | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.942 | 0.224 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.159 | 0.019 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 686.106 | 0.1 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1328.962 | 0.073 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5191.116 | 0.189 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41286.858 | 4.53 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 172340.099 | 11.004 | ns/op
>>
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> zext_w

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1469:

> 1467:
> 1468: // prepare
> 1469: add(tableN16, table3, 1*256*sizeof(juint), tmp1);

where is that `1*256` coming from? It would be worth a comment

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1481:

> 1479: }
> 1480: vmv_v_x(vcrc, zr);
> 1481: slli(crc, crc, 32);

You can use zero extend here.
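As background to the `1*256*sizeof(juint)` question: table-driven CRC32 code commonly stores several 256-entry sub-tables back to back, so that sub-table k starts k * 256 entries after the base. The sketch below shows that general technique (slicing by 4) and is not taken from the patch or from zcrc32.c. The names `CrcTables`, `sub_table`, `make_tables`, `crc32_bytewise` and `crc32_slice4` are hypothetical, and the table layout used by the actual intrinsic is an assumption that may not match.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed layout: kTables sub-tables of 256 entries, stored contiguously.
constexpr std::size_t kEntries = 256;
constexpr std::size_t kTables  = 4;
using CrcTables = std::array<std::uint32_t, kTables * kEntries>;

// Sub-table k begins k * 256 elements, i.e. k * 256 * sizeof(uint32_t)
// bytes, past the base address.
inline const std::uint32_t* sub_table(const CrcTables& t, std::size_t k) {
    return t.data() + k * kEntries;
}

// Build the standard reflected CRC-32 table (polynomial 0xEDB88320) and
// derive each further sub-table from the previous one.
CrcTables make_tables() {
    CrcTables t{};
    for (std::uint32_t i = 0; i < kEntries; ++i) {
        std::uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        t[i] = c;
    }
    for (std::size_t k = 1; k < kTables; ++k) {
        for (std::size_t i = 0; i < kEntries; ++i) {
            const std::uint32_t prev = t[(k - 1) * kEntries + i];
            t[k * kEntries + i] = t[prev & 0xFFu] ^ (prev >> 8);
        }
    }
    return t;
}

// Reference implementation: one table lookup per input byte.
std::uint32_t crc32_bytewise(const CrcTables& t, const unsigned char* p,
                             std::size_t n) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        crc = t[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}

// Slicing by 4: consume four bytes per step, one lookup in each sub-table.
std::uint32_t crc32_slice4(const CrcTables& t, const unsigned char* p,
                           std::size_t n) {
    const std::uint32_t* t0 = sub_table(t, 0);
    const std::uint32_t* t1 = sub_table(t, 1);
    const std::uint32_t* t2 = sub_table(t, 2);
    const std::uint32_t* t3 = sub_table(t, 3);
    std::uint32_t crc = 0xFFFFFFFFu;
    while (n >= 4) {
        // Assemble the word byte by byte so host endianness does not matter.
        crc ^= std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
               (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
        crc = t3[crc & 0xFFu] ^ t2[(crc >> 8) & 0xFFu] ^
              t1[(crc >> 16) & 0xFFu] ^ t0[crc >> 24];
        p += 4;
        n -= 4;
    }
    while (n > 0) {  // leftover tail bytes
        crc = t0[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
        --n;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With this layout an expression such as `base + 1 * 256 * sizeof(uint32_t)` is simply the byte address of sub-table 1, which is one plausible reading of the constant questioned in the review.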
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1752180965 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1752177020 From mdoerr at openjdk.org Tue Sep 10 15:26:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 10 Sep 2024 15:26:07 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero doesn't need any changes as far as I can tell. Works with `micro:LockUnlock` on real PPC64 hardware, too. However, we need to run more tests and also check performance. Please note that this PR has conflicts with other changes (https://github.com/openjdk/jdk/pull/20922 and recent developments in the loom repo). The JBS issue refers to "memory barriers (not a fence)", but you're using `StoreLoad` barriers which are nothing else than a "fence". I don't agree with the general statement that they have become significantly cheaper. 
That may be true for single chip designs, but not for large server systems (multi-socket). Did you run benchmarks which stress monitors on any large multi-socket system? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2341245318 From mli at openjdk.org Tue Sep 10 15:30:07 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Sep 2024 15:30:07 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v2] In-Reply-To: References: <361CNPQYcSo_A4BmHd1RrYEVheSFNfc-WB3wXEEPUL4=.9fc78191-1052-4b26-852a-147e7f1dddd4@github.com> Message-ID: On Tue, 10 Sep 2024 15:15:02 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> zext_w > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1469: > >> 1467: >> 1468: // prepare >> 1469: add(tableN16, table3, 1*256*sizeof(juint), tmp1); > > where is that `1*256` coming from? It would be worth a comment I copied it from the original code. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1481: > >> 1479: } >> 1480: vmv_v_x(vcrc, zr); >> 1481: slli(crc, crc, 32); > > You can use zero extend here. Thanks, I'll fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1752200820 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1752201230 From stefank at openjdk.org Tue Sep 10 15:40:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 10 Sep 2024 15:40:10 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v5] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:02:25 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... 
>> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > fix test At least in the GC code we sometimes spell out the template parameter name with CamelCase: Iterator, Function, ClosureType, OopClosureT, Parallel, ... I'm OK with using MT as the template parameter name. I'm also OK with using mem_tag, but you might hit problems when classes have both a template parameter with that name and functions that take a MemTag. I see that GrowableArrayCHeap has this situation, but it looks like that is accidental code and isn't really needed, so maybe this isn't a big concern. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2341288588 From szaldana at openjdk.org Tue Sep 10 15:42:57 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 10 Sep 2024 15:42:57 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename [v2] In-Reply-To: References: Message-ID: > Hi all, > > This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681) enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. > > As mentioned in the comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492) where we enabled support for `%p` for filenames specified in jcmd. > > With this patch, I propose: > - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. > - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands.
> - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion to support `%t` for jcmd as well. > > Testing: > - [x] Added test cases pass with all platforms (verified with a GHA job). > - [x] Tier 1 passes with GHA. > > Looking forward to hearing your thoughts! > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8204681 - 8204681: Option to include timestamp in hprof filename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20568/files - new: https://git.openjdk.org/jdk/pull/20568/files/c550c9fd..165a77b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20568&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20568&range=00-01 Stats: 44121 lines in 1412 files changed: 27993 ins; 8898 del; 7230 mod Patch: https://git.openjdk.org/jdk/pull/20568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20568/head:pull/20568 PR: https://git.openjdk.org/jdk/pull/20568 From mli at openjdk.org Tue Sep 10 15:52:47 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 10 Sep 2024 15:52:47 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark -with patch | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.884 | 0.03 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 401.122 | 0.309 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 680.168 | 0.032 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1062.426 | 0.401 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3308.361 | 0.176 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24403.231 | 20.248 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103463.735 | 4.245 | ns/op > > > > without patch > > Benchmark -without patch | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.942 | 0.224 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.159 | 0.019 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 686.106 | 0.1 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1328.962 | 0.073 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5191.116 | 0.189 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41286.858 | 4.53 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 172340.099 | 11.004 | ns/op > > > > ### on K230 > > with patch > References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> <8YanHZxslhO-8HUEwO9VLlFETrk1W-rsy2EnLZYiLNE=.db2331a1-b433-45d5-9645-1620f5f4c76b@github.com> Message-ID: On Tue, 10 Sep 2024 15:08:40 GMT, Lutz Schmidt wrote: >>> During my testing of the mentioned fixes, I never saw such long freelists. >> >> I see the warning every time I run Renaissance benchmarks with the VerifyCodeCache option.
Is this a problem and not just a reflection of the fact that the codeheap contains 20K of methods with some holes in between? >> >> >> $ ./build/linux-aarch64-server-fastdebug/jdk/bin/java -XX:+VerifyCodeCache -XX:+PrintCodeCache -jar ~/renaissance-jmh-0.14.2-95-gae3b5ce.jar Dotty >> # Run progress: 0.00% complete, ETA 00:00:00 >> # Fork: 1 of 5 >> # Warmup Iteration 1: 86079.453 ms/op >> # Warmup Iteration 2: 8212.947 ms/op >> # Warmup Iteration 3: 5416.496 ms/op >> # Warmup Iteration 4: 6841.073 ms/op >> # Warmup Iteration 5: 4248.337 ms/op >> # Warmup Iteration 6: OpenJDK 64-Bit Server VM warning: CodeHeap: # of free blocks > 10000 >> 3530.697 ms/op >> # Warmup Iteration 7: 3186.153 ms/op >> >> >>> CodeCacheSegmentSize should not be chosen too small to keep memory and processing overhead in check. 64 bytes appears to be a feasible choice. >> >> OK. Thanks! > > @bulasevich Thanks for providing the Code Heap Stats. > > At vm shutdown time, when the stats were taken, there is no pathological state in the code heap. > > - non-profiled nmethods has 70 free blocks, occupying 1033k in total. > - profiled nmethods has 525 free blocks, occupying 5550k in total. > - non-nmethods has 5 free blocks, occupying 263k in total. > > The warning must be triggered by a transient situation, when many methods are removed from the code heap in a short period of time, without significant new method compilations. Eventually, this large number of free blocks will either be reallocated for new compilations or collapse into less, but larger, blocks when even more methods are removed from the code heap. > > In short: no need to worry. @RealLucy Good! Thanks for checking! With this change I do not make things worse. Code Heap Stats numbers fluctuate wildly, but still look good with tuned CodeCacheSegmentSize and CodeEntryAlignment: - non-profiled nmethods has 53 free blocks, occupying 23k in total. - profiled nmethods has 199 free blocks, occupying 935k in total. 
- non-nmethods has 13 free blocks, occupying 296k in total. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2341389827 From rcastanedalo at openjdk.org Tue Sep 10 16:26:58 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 10 Sep 2024 16:26:58 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: Message-ID: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix indentation in generate_post_barrier_fast_path Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/94145917..0979e41e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 10 16:26:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 10 Sep 2024 16:26:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v18] In-Reply-To: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> References: <6oxoxxR5yGICT3d_4Wip3egvyFiQLWb3HWMxpL41jwY=.9ca3bb60-1e86-4bd2-a3b4-26eadc409eb5@github.com> Message-ID: <7epSurWH76D6t-eSs3neVvSHYRdhdGanYobPU0Y_-SM=.5068c4a5-d220-417d-9d8a-0518bfdc61d8@github.com> On Tue, 10 Sep 2024 13:00:05 GMT, Thomas Schatzl wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion >> - riscv port for JEP 475 > > Marked as reviewed by tschatzl (Reviewer). Thanks for reviewing, @tschatzl!
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2341418514 From rkennke at openjdk.org Tue Sep 10 19:11:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 10 Sep 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix FullGCForwarding initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/5da250cf..6abda7bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=09-10 Stats: 8 lines in 7 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Tue Sep 10 19:11:30 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 10 Sep 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 17:40:03 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last 
revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/compressedKlass.inline.hpp line 100: > >> 98: check_valid_klass(k, base(), shift()); >> 99: // Also assert that k falls into what we know is the valid Klass range. This is usually smaller >> 100: // than the encoding range (e.g. encoding range covers 4G, but we only have 1G class space and a > > 1G is the default CompressedClassSpaceSize but can be larger, right? So the comment isn't quite accurate. Or with tiny class pointers can it only be 1G? The comment was misleading, it referred to the 1g default class space. I recently changed class space (in mainline) to be max. 4GB (minus whatever little CDS needs), and for +COH, this is still true. 22 bit class pointer and 10 bit shift still gives us a max encoding range size of 4GB. I will update the comment. (->backlist) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1751872461 From pbk at openjdk.org Tue Sep 10 19:36:03 2024 From: pbk at openjdk.org (Peter B. Kessler) Date: Tue, 10 Sep 2024 19:36:03 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Thu, 5 Sep 2024 00:58:10 GMT, Boris Ulasevich wrote: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). 
Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. > - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. 
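The rounding argument above can be made concrete with a small sketch. This is not HotSpot code: `align_up_to_segment` and `wasted_bytes` are hypothetical helpers, the model ignores any per-block header and the segment map overhead mentioned above, and the sizes used in the usage note are invented for illustration rather than measured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: round a size up to the next multiple of the
// segment size (the minimum allocation granule in this simplified model).
size_t align_up_to_segment(size_t size, size_t segment_size) {
  return (size + segment_size - 1) / segment_size * segment_size;
}

// Hypothetical helper: total number of bytes lost to rounding when each
// blob is allocated in whole segments.
size_t wasted_bytes(const std::vector<size_t>& sizes, size_t segment_size) {
  size_t waste = 0;
  for (size_t s : sizes) {
    waste += align_up_to_segment(s, segment_size) - s;
  }
  return waste;
}
```

For three made-up blob sizes of 200, 520 and 1000 bytes, `wasted_bytes` gives 200 bytes with 128-byte segments and 136 bytes with 64-byte segments. With sizes spread uniformly, the expected loss per blob is roughly half a segment, which is why halving the segment size roughly halves the rounding waste, at the cost of a larger segment map.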
> > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2341862790 From coleenp at openjdk.org Tue Sep 10 20:06:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 20:06:13 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero doesn't need any changes as far as I can tell. This is such a nice simplifying change. I have some more suggestions. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 482: > 480: // This is faster on Nehalem and AMD Shanghai/Barcelona. 
> 481: // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences > 482: lock(); addl(Address(rsp, 0), 0); Since there's a membar above, do you need this lock/addl instructions? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 510: > 508: > 509: // Memory barrier/fence > 510: // Dekker pivot point -- fulcrum : ST Owner; MEMBAR; LD Succ I think you should delete this whole comment block. The source code control system will remember this comment about Nehalem and AMD Shanghai/Barcelona. src/hotspot/share/runtime/javaThread.hpp line 620: > 618: > 619: // Support for SharedRuntime::monitor_exit_helper() > 620: ObjectMonitor* unlocked_inflated_monitor() { return _unlocked_inflated_monitor; } Can you make this a const method? src/hotspot/share/runtime/objectMonitor.cpp line 353: > 351: > 352: void ObjectMonitor::enter_for_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { > 353: DEBUG_ONLY(bool success = ) ObjectMonitor::enterI_with_contention_mark(locking_thread, contention_mark); This is kind of noisy with DEBUG_ONLY. If you remove DEBUG_ONLY, does the windows compiler complain that you're not using the variable success in the product build? src/hotspot/share/runtime/objectMonitor.cpp line 574: > 572: for (;;) { > 573: if (own == DEFLATER_MARKER) { > 574: if (TryLockI(current)) { I can't tell the difference between TryLockI and enter_for(). Did I previously object to enter_for() here? Maybe I should take that back, and there should be a comment above enter_for() like // Enters a lock in behalf of a non-current thread, or a thread that is exiting and has previously given up the lock. // and it handles deflation. You could add a boolean that you expect success for the enter_for() caller from deoptimization (ie. must_succeed). This code is getting repetitive - it looks the same in all these places only a little bit different and hard to know why. 
src/hotspot/share/runtime/objectMonitor.cpp line 588: > 586: } else { > 587: // The lock had been free momentarily, but we lost the race to the lock. > 588: own = prev_own; So this retries now and doesn't break. Is it because it could be the DEFLATER_MARKER ? src/hotspot/share/runtime/objectMonitor.cpp line 904: > 902: } > 903: > 904: assert(_succ != current, "invariant"); This assert seems unnecessary since it's just reset above. src/hotspot/share/runtime/objectMonitor.cpp line 1104: > 1102: // 1. A release barrier ensures that changes to monitor meta-data > 1103: // (_succ, _EntryList, _cxq) and data protected by the lock will be > 1104: // visible before we release the lock. Where is this barrier? src/hotspot/share/runtime/objectMonitor.cpp line 1243: > 1241: ObjectMonitorContentionMark contention_mark(this); > 1242: > 1243: if (contentions() < 0) { You should use is_being_async_deflated() here instead of contentions() < 0. src/hotspot/share/runtime/objectMonitor.cpp line 1244: > 1242: > 1243: if (contentions() < 0) { > 1244: assert((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != nullptr, ""); Please add a space between | in this expression. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2292693079 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752089574 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752087824 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752384806 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752639893 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752655374 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752397722 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752403154 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752408986 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752432910 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752435904 From coleenp at openjdk.org Tue Sep 10 20:39:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 10 Sep 2024 20:39:09 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v5] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:02:25 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > fix test So many mem_tags. It looks fine to me. I didn't see that many "MemTag F"'s except in the ConcurrentHashTable. 
You could change that in a further patch. I would prefer a one letter template parameter name like maybe M (since T seems too generic). But F doesn't really bother me that much. It's not like typename E means that much either. Update: if people want the template parameter to be MemTag MT, that's fine too. src/hotspot/share/nmt/memTracker.hpp line 265: > 263: > 264: // MallocLimt: Given an allocation size s, check if mallocing this much > 265: // under category f would hit either the global limit or the limit for mem_tag. I don't know what "category f" is. Maybe reword as // MallocLimit: Given an allocation size s, check if allocating this much memory would hit the global limit or the // limit tagged with mem_tag. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2293706394 PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2341965524 PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1752681558 From gziemski at openjdk.org Tue Sep 10 20:47:46 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 10 Sep 2024 20:47:46 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v6] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... 
> > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: rename MemoryTag template parameter names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/7a4f6e01..c779754c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=04-05 Stats: 400 lines in 19 files changed: 2 ins; 0 del; 398 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Tue Sep 10 20:53:46 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 10 Sep 2024 20:53:46 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related parameter names and local variable names. > > Testing is pending... 
> > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Coleen's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/c779754c..f1faba35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Tue Sep 10 20:53:47 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 10 Sep 2024 20:53:47 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:28:01 GMT, Coleen Phillimore wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> fix test > > src/hotspot/share/nmt/memTracker.hpp line 265: > >> 263: >> 264: // MallocLimt: Given an allocation size s, check if mallocing this much >> 265: // under category f would hit either the global limit or the limit for mem_tag. > > I don't know what "category f" is. Maybe reword as > > // MallocLimit: Given an allocation size s, check if allocating this much memory would hit the global limit or the > // limit tagged with mem_tag. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1752704620 From gziemski at openjdk.org Tue Sep 10 20:57:07 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 10 Sep 2024 20:57:07 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: <_TKagUmXD-PDlDcJuMltsaJo_MVX4ATgobge0V-RfGE=.28b3d5d3-8a09-444c-8383-54606627d956@github.com> On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback I went with `MT` throughout the code for all MemTag template parameter names. It will be a nice mental shortcut once we get used to the change... I did not want to rock the boat too much and decided not to use `mem_tag` for template parameter names, it just did not look good next to the other names, with all capital letters. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2341992426 From gziemski at openjdk.org Tue Sep 10 21:01:08 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 10 Sep 2024 21:01:08 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. 
>> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback Thank you for the feedback, Coleen! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2341999055 From apangin at openjdk.org Tue Sep 10 21:09:08 2024 From: apangin at openjdk.org (Andrei Pangin) Date: Tue, 10 Sep 2024 21:09:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> <2xh9F1kC4pZ4yHEwiPxJzaqO5REJ7vRXs6m1v0ADRss=.3f34dd37-8785-4f6e-857b-47d9e2e2f6bb@github.com> Message-ID: On Tue, 10 Sep 2024 09:07:23 GMT, Liang Mao wrote: > Okay then that is a programming error not a VM error. It is up to the application to ensure that classes are kept alive if you have jMethodID's for them @dholmes-ora I'm afraid this is an impossible requirement. The problem has been discussed multiple times previously. E.g., the standard `GetAllStackTraces` API returns an array of raw jmethodIDs, and there is no guaranteed way for an application to obtain the corresponding strong jclass handles before the classes can be unloaded. It's even worse with the `AsyncGetCallTrace` function, which is typically called inside a signal handler where JVM TI functions are not allowed.
It's OK for JVM TI functions to return an error in the case of class unloading, but a crash does not seem to be valid JVM behavior. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342011004 From sviswanathan at openjdk.org Tue Sep 10 21:36:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 10 Sep 2024 21:36:09 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 19:10:34 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > update libm tanh reference test with code review suggestions src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 810: > 808: x->id() == vmIntrinsics::_dpow || x->id() == vmIntrinsics::_dcos || > 809: x->id() == vmIntrinsics::_dsin || x->id() == vmIntrinsics::_dtan || > 810: x->id() == vmIntrinsics::_dlog10 || x->id() == vmIntrinsics::_dtanh) { The tanh case needs to be under `#ifdef _LP64`, as we are generating the stub only for 64-bit. src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1000: > 998: if (StubRoutines::dtanh() != nullptr) { > 999: __ call_runtime_leaf(StubRoutines::dtanh(), getThreadTemp(), result_reg, cc->args()); > 1000: } // TODO: else clause? You could instead have an assert here that StubRoutines::dtanh() is not null. That removes the need for the else clause. src/hotspot/cpu/x86/templateInterpreterGenerator_x86_32.cpp line 376: > 374: // [ hi(arg) ] > 375: // > 376: if (kind == Interpreter::java_lang_math_tanh) { Need to update the copyright year to 2024 in this file.
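For readers following the assert suggestion above, here is a minimal standalone sketch of the pattern (not HotSpot code). The names `StubFn`, `fake_tanh_stub`, `lookup_stub` and `call_intrinsic` are hypothetical stand-ins; in the PR the real accessor is `StubRoutines::dtanh()` and the call is emitted through the C1 LIR generator.

```cpp
#include <cassert>

// Hypothetical stand-in for a generated stub entry point; not a HotSpot type.
using StubFn = double (*)(double);

// Placeholder body for illustration only; it does not compute tanh.
double fake_tanh_stub(double x) { return x; }

// Hypothetical lookup: the stub exists only when it was generated (64-bit).
StubFn lookup_stub(bool is_64bit) {
  return is_64bit ? &fake_tanh_stub : nullptr;
}

// Once the call site is restricted to 64-bit builds, the stub must exist,
// so an assert documents the invariant and no else branch is needed.
double call_intrinsic(double x) {
  StubFn stub = lookup_stub(true);
  assert(stub != nullptr && "tanh stub must be generated on this platform");
  return stub(x);
}
```

The assert turns a silent "do nothing" path into a loud failure in debug builds, which is presumably the intent of the review comment.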
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752286133 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752289575 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752304061 From ihse at openjdk.org Tue Sep 10 21:50:14 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 10 Sep 2024 21:50:14 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: <1xFVXOUr023IxIHucx0v8dket_QOWsIV0G3wVHnF5aE=.23e8334b-a4ed-4d94-8ad9-376ffe043882@github.com> References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> <1xFVXOUr023IxIHucx0v8dket_QOWsIV0G3wVHnF5aE=.23e8334b-a4ed-4d94-8ad9-376ffe043882@github.com> Message-ID: On Tue, 10 Sep 2024 13:51:55 GMT, Josef Eisl wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Also update build to link properly > > src/java.desktop/unix/native/libawt/awt/awt_LoadLibrary.c line 135: > >> 133: #endif >> 134: >> 135: if (!JVM_IsStaticallyLinked()) { > > These "cross-library" link-type probes are a challenge for GraalVM native image. In native image, we statically link most standard JNI libraries into the final executable. However in some cases, for example AWT, static linking is not feasible due to their dependencies. Thus, we resort to dynamic linking. Have a mix of static and dynamic linking means that `JVM_IsStaticallyLinked` should give different answers based on the caller. Example: > lets assume a follow-up change introduces a call to `JVM_IsStaticallyLinked` in libnet (which native image statically links): > * if `JVM_IsStaticallyLinked` is called from libawt.so, we want to answer `false` > * if `JVM_IsStaticallyLinked` is called from libnet.a, we want to answer `true` > > Is the mixed linking use case is something that the Hermetic Java effort is having on its radar? 
> > For this particular case, one solution could be to avoid cross-library calls. I.e., instead of calling `JVM_IsStaticallyLinked`, have an `AWT_IsStaticallyLinked` that is local to libawt. This is similar to the pattern also used for JLI. Mixing static and dynamic libraries willy-nilly sounds like a situation that is hard to resolve in a safe and sound way. The basic assumption here has been that you either have all libraries dynamic, or all libraries static. In the particular case you mention about libawt, I think the proper solution is to sort out the mess that is libawt_headless/libawt_xawt. It seems to me that the X support should be loaded using dlopen, not by hard dependencies on the X libraries. I think that should solve your problem (as well as many other problems). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20666#discussion_r1752767100 From eosterlund at openjdk.org Tue Sep 10 21:51:06 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 10 Sep 2024 21:51:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: <6x5TarM6X1KVJyDA1XnQdtO3-rsl-nOFtWt4sGA6ZWE=.8210cbf2-3fe2-45db-adc5-369782703d6d@github.com> On Tue, 10 Sep 2024 10:15:43 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use an ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Exclude libagent8339725.cpp compiling for windows Some minor changes...
src/hotspot/share/prims/jvmtiEnv.cpp line 3218: > 3216: Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive > 3217: // Cannot check klass_holder == nullptr because klass could have null loader holder > 3218: (*declaring_class_ptr) = k->is_loader_alive() ? get_jni_class_non_null(k) : nullptr; This condition is already checked by the caller, so the !k->is_loader_alive() case should be dead code. I would prefer an assert that it is alive; otherwise we shouldn't be here. ------------- Changes requested by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2293838889 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752765549 From eosterlund at openjdk.org Tue Sep 10 21:51:06 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 10 Sep 2024 21:51:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v2] In-Reply-To: References: <2SvbUTqgsR8KlKSWLgNzh2RNo-uVtJm2xopck9yOmZs=.7cf5ed4f-dee6-434d-b33a-301c4bfc3fcc@github.com> <2xh9F1kC4pZ4yHEwiPxJzaqO5REJ7vRXs6m1v0ADRss=.3f34dd37-8785-4f6e-857b-47d9e2e2f6bb@github.com> Message-ID: On Tue, 10 Sep 2024 09:02:35 GMT, David Holmes wrote: >>> Maybe I'm misunderstanding the test case but isn't it using a jMethodID for a class that may have been unloaded? >> >> Yes. > >> > Maybe I'm misunderstanding the test case but isn't it using a jMethodID for a class that may have been unloaded? >> >> Yes. > > Okay then that is a programming error not a VM error. It is up to the application to ensure that classes are kept alive if you have jMethodID's for them: > > https://docs.oracle.com/en/java/javase/22/docs/specs/jni/design.html#accessing-fields-and-methods > > any validation the VM attempts with jMethodId's (and field id's) is best-effort and not required. Yeah. On an API level @dholmes-ora is 100% right, and @xmas92, @stefank and I came to the same conclusion.
The API implementation absolutely does not have to deal with these errors if we don't want to. We could just ignore them. But, like @apangin, I think that even though we don't have to deal with the error, we probably should do it in practice as some APIs expose jmethodID without the class. You can ask for the class so you know what to keep alive, but that is done literally with this very API, making it a bit more important that it does in fact report the errors correctly. So I think we should be nice here and make it work more robustly. My 50c! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342062140 From kbarrett at openjdk.org Tue Sep 10 22:01:07 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 10 Sep 2024 22:01:07 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 14:15:53 GMT, Gerard Ziemski wrote: > Is everyone OK with `MT` as the template parameter name? Another obvious choice is `MEM_TAG` > > Side note: why are template parameter names all capitals? To help distinguish them from "regular" parameters? Do we still want that naming scheme, or can we switch here to using `mem_tag`? Type template parameters should follow the style guide rules for type names. I've not noticed many (any?) cases of noncompliance. Non-type template parameters should follow the style guide rules for ordinary local variables, or perhaps local constants. The style guide doesn't currently say which, doesn't discuss possible differences between local and non-local naming, and is also currently loose on constant names. (The GC team recently discussed constant naming, strongly preferring MixedCase, and strongly disliking ALL_CAPS.) I think I've mostly used local variable style. Non-type template parameters are sometimes used in non-trivial expressions, so having a more meaningful name can be helpful.
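To make the naming distinction above concrete, here is a small illustrative sketch; it is an assumption-laden example, not code from the PR (which settled on `MT`). The `MemTag` enum below is a simplified stand-in with only three values, and `TaggedArray` is a hypothetical class invented for this example.

```cpp
#include <cassert>

// Simplified stand-in for HotSpot's MemTag; the real enum has many more values.
enum class MemTag { mtNone, mtGC, mtThread };

// The type template parameter is named like a type (ElementType); the
// non-type template parameter is named like a local variable (mem_tag),
// which is one of the options discussed in this thread.
template <typename ElementType, MemTag mem_tag>
class TaggedArray {
 public:
  // A meaningful parameter name reads well when used in an expression.
  static constexpr MemTag tag() { return mem_tag; }
  ElementType& at(int i) { return _data[i]; }

 private:
  ElementType _data[4] = {};
};

// Alias used to avoid commas inside macro arguments such as assert().
using GCInts = TaggedArray<int, MemTag::mtGC>;
```

With this style, `GCInts::tag()` returns `MemTag::mtGC`, and the parameter name documents its role without relying on capitalization.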
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2342079640 From kbarrett at openjdk.org Tue Sep 10 22:06:06 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 10 Sep 2024 22:06:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 21:58:39 GMT, Kim Barrett wrote: > Type template parameters should follow the style guide rules for type names. I've not noticed many (any?) cases of noncompliance. ... other than ConcurrentHashTable (sigh!) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2342084599 From asmehra at openjdk.org Tue Sep 10 22:17:08 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 10 Sep 2024 22:17:08 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 10 Sep 2024 00:53:32 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. 
>> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplfication, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments: logging indents src/hotspot/share/cds/aotClassLinker.cpp line 149: > 147: add_candidate(ik); > 148: > 149: if (log_is_enabled(Info, cds, aot, load)) { Is `load` the correct log tag to use in this class? Can it be replaced with `link` tag? 
src/hotspot/share/cds/aotClassLinker.hpp line 111: > 109: > 110: static int num_app_initiated_classes(); > 111: static int num_platform_initiated_classes(); I don't see these methods (num_app_initiated_classes and num_platform_initiated_classes) used anywhere. Should they be removed? src/hotspot/share/cds/aotConstantPoolResolver.cpp line 111: > 109: > 110: if (CDSConfig::is_dumping_aot_linked_classes()) { > 111: if (AOTClassLinker::try_add_candidate(ik)) { Are we relying on the call to `try_add_candidate` to add the class to the candidate list? I guess that shouldn't be the case, as the class has already been added through ArchiveBuilder::gather_klasses_and_symbols()->AOTClassLinker::add_candidates(). If so, can we use AOTClassLinker::is_candidate(ik) here? src/hotspot/share/cds/archiveBuilder.cpp line 766: > 764: #define ADD_COUNT(x) \ > 765: x += 1; \ > 766: x ## _a += aotlinked; Can we do this instead: `x ## _a += (aotlinked ? 1 : 0)` and make `aotlinked` a bool? src/hotspot/share/cds/archiveBuilder.cpp line 779: > 777: DECLARE_INSTANCE_KLASS_COUNTER(num_app_klasses); > 778: DECLARE_INSTANCE_KLASS_COUNTER(num_hidden_klasses); > 779: DECLARE_INSTANCE_KLASS_COUNTER(num_unlinked_klasses); Nit-picking here: the "unlinked" category doesn't need the "aot-linked" counter. src/hotspot/share/cds/filemap.cpp line 2455: > 2453: const char* prop = Arguments::get_property("java.system.class.loader"); > 2454: if (prop != nullptr) { > 2455: if (has_aot_linked_classes()) { Should this check be part of `FileMapInfo::validate_aot_class_linking`?
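The `ADD_COUNT` suggestion above can be sketched in isolation as follows. This is a hedged, simplified illustration rather than the code in archiveBuilder.cpp: the `KlassCounts` struct, the three-argument macro form, and `count_klasses` are assumptions made so the example is self-contained.

```cpp
#include <cassert>

// Simplified, hypothetical counters; the real ones live in archiveBuilder.cpp.
struct KlassCounts {
  int num_app_klasses = 0;
  int num_app_klasses_a = 0;  // the "_a" suffix counts the AOT-linked subset
};

// The macro is wrapped in do/while(0) so it stays a single statement even
// inside an unbraced if; the bool is converted to 0 or 1 explicitly.
#define ADD_COUNT(counts, x, aotlinked)      \
  do {                                       \
    (counts).x += 1;                         \
    (counts).x##_a += ((aotlinked) ? 1 : 0); \
  } while (0)

// Counts all classes and, separately, those flagged as AOT-linked.
KlassCounts count_klasses(const bool* aotlinked_flags, int n) {
  KlassCounts counts;
  for (int i = 0; i < n; i++) {
    ADD_COUNT(counts, num_app_klasses, aotlinked_flags[i]);
  }
  return counts;
}
```

Keeping `aotlinked` a bool makes the intent (a yes/no property) explicit, while the conditional keeps the arithmetic on the counter unambiguous.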
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1752750296 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1752692333 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1752689932 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1752770372 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1752766131 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1752780405 From asmehra at openjdk.org Tue Sep 10 22:20:07 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 10 Sep 2024 22:20:07 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <0ebuNyvktpJlfGjrZGgcS5IsNn2nSSx5ImiVcL7HJkw=.05ca65a9-8982-403a-b271-3029d26e7124@github.com> On Tue, 10 Sep 2024 00:53:32 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. 
>> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplfication, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments: logging indents Overall it looks fine - just some minor nit-picking. I haven't finished reviewing yet though. I will continue my review tomorrow. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2342157928 From lmesnik at openjdk.org Tue Sep 10 22:54:09 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 10 Sep 2024 22:54:09 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 10:15:43 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Exclude libagent8339725.cpp compiling for windows Thanks, for adding test. I was not able to add authors. See more comments inline. (Mostly stylistic) test/hotspot/jtreg/runtime/8339725/Test8339725.java line 32: > 30: * @requires os.family == "linux" > 31: * @library /test/lib > 32: * @library / Why this line is needed? test/hotspot/jtreg/runtime/8339725/Test8339725.java line 35: > 33: * @modules java.base/jdk.internal.misc > 34: * java.management > 35: * @run main/othervm/native -agentlib:agent8339725 Test8339725 It should be @run main/driver Test8339725 since you run the test in forked process. test/hotspot/jtreg/runtime/8339725/Test8339725.java line 53: > 51: public static void test(String gcArg) throws Exception { > 52: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( > 53: "-agentpath:" + Utils.TEST_NATIVE_PATH + File.separator + System.mapLibraryName("agent8339725"), "-Xmx100m", gcArg, "Test"); Instead of using createLimitedTestJavaProcessBuilder and gcArg, please use createTestJavaProcessBuilder and allow system to set arguments. 
So we can run it in more configurations even if it should work now. All other test options would be appended to your nativepath and Xmx options. test/hotspot/jtreg/runtime/8339725/Test8339725.java line 63: > 61: public static void main(String[] args) throws Exception { > 62: long last = System.nanoTime(); > 63: for (int i = 0;; i++) { `for (;;) {` looks better if `i` is never used. test/hotspot/jtreg/runtime/8339725/Test8339725.java line 81: > 79: public Class findClass(String name) throws ClassNotFoundException { > 80: byte[] b = Base64.getDecoder() > 81: .decode("yv66vgAAADQADgoAAwALBwAMBwANAQAGPGluaXQ+AQADKClWAQAEQ29kZQEAD0xpbmVOdW1iZXJU" + Please add a comment explaining why this string is used as a template. test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 36: > 34: > 35: #define BUFFER_SIZE 100000 > 36: static size_t ring_buffer[BUFFER_SIZE] = {0}; The ring_buffer is shared between threads without any locks, which is not safe. test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 37: > 35: #define BUFFER_SIZE 100000 > 36: static size_t ring_buffer[BUFFER_SIZE] = {0}; > 37: static volatile int ring_buffer_idx = 0; Why is volatile needed here? test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 41: > 39: > 40: void *get_method_details(void *arg) > 41: { Please move the `{` onto the same line as the declaration, `if`, `for`, etc. in all places. test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 50: > 48: > 49: // For JVM 17, 21, 22 calling GetMethodDeclaringClass is enough. > 50: if ((err = jvmti->GetMethodDeclaringClass(method, &method_class)) == 0) Please use JVMTI_ERROR_NONE when checking the error code. test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 53: > 51: { > 52: // JVM 8 needs this to crash > 53: jvmti->GetClassSignature(method_class, &class_name, NULL); Good practice is to check the jvmti status and fail if it is not JVMTI_ERROR_NONE. See 'check_jvmti_status' in 'jvmti_common.h' for this. Please check all jvmti functions.
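On the two ring-buffer comments above (unsynchronized sharing, and `volatile` on the index), one possible direction is sketched below using `std::atomic`. This is only an illustration under assumptions, not the test's actual fix: the small `BUFFER_SIZE`, and the helper names `publish` and `read_slot`, are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Small hypothetical size for illustration; the test uses a larger buffer.
constexpr std::size_t BUFFER_SIZE = 8;

// Atomic slots and an atomic index replace the plain array and the volatile
// int; volatile alone gives neither atomicity nor inter-thread ordering.
std::atomic<std::size_t> ring_buffer[BUFFER_SIZE];
std::atomic<std::size_t> ring_buffer_idx{0};

// Claims the next slot with an atomic increment, then publishes the value.
void publish(std::size_t value) {
  std::size_t slot =
      ring_buffer_idx.fetch_add(1, std::memory_order_relaxed) % BUFFER_SIZE;
  ring_buffer[slot].store(value, std::memory_order_release);
}

// Reads a slot; pairs with the release store in publish().
std::size_t read_slot(std::size_t slot) {
  return ring_buffer[slot % BUFFER_SIZE].load(std::memory_order_acquire);
}
```

Each writer gets a distinct index from `fetch_add`, so two threads never race on claiming the same slot at the same time, although a slow reader can still observe a slot that has since been overwritten after the buffer wraps around.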
------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2293888922 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752805301 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752811014 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752803305 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752856394 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752860230 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752835741 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752831021 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752814805 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752819319 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1752880408 From duke at openjdk.org Wed Sep 11 00:29:30 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 11 Sep 2024 00:29:30 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v4] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: c1 and template generator fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/39350a37..4aa52bfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=02-03 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git 
pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Wed Sep 11 00:29:30 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 11 Sep 2024 00:29:30 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v3] In-Reply-To: References: Message-ID: <0BMvrr-HLplOb73v8G3cdcG063jjNDkkBlCCnH8MH9c=.d7f172bf-750d-4ba4-840f-d2f492cac2c9@github.com> On Tue, 10 Sep 2024 16:26:38 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> update libm tanh reference test with code review suggestions > > src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 810: > >> 808: x->id() == vmIntrinsics::_dpow || x->id() == vmIntrinsics::_dcos || >> 809: x->id() == vmIntrinsics::_dsin || x->id() == vmIntrinsics::_dtan || >> 810: x->id() == vmIntrinsics::_dlog10 || x->id() == vmIntrinsics::_dtanh) { > > Need to have the tanh under #Ifdef _LP64 as we are generating stub only for 64 bit. Please see the newly added `#ifdef `in the updated code. > src/hotspot/cpu/x86/c1_LIRGenerator_x86.cpp line 1000: > >> 998: if (StubRoutines::dtanh() != nullptr) { >> 999: __ call_runtime_leaf(StubRoutines::dtanh(), getThreadTemp(), result_reg, cc->args()); >> 1000: } // TODO: else clause? > > You could instead have an assert here that StubRoutines::dtanh() is not null. Thereby no need for the else clause. Please see the newly added assert in the updated code. > src/hotspot/cpu/x86/templateInterpreterGenerator_x86_32.cpp line 376: > >> 374: // [ hi(arg) ] >> 375: // >> 376: if (kind == Interpreter::java_lang_math_tanh) { > > Need to update the copyright year to 2024 in this file. Please see the year updated to 2024 in the updated code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752962967 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752962670 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1752963446 From dholmes at openjdk.org Wed Sep 11 00:45:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 00:45:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero doesn't need any changes as far as I can tell. I've started looking at this and to be honest I'm surprised by the extent and complexity of the changes. The problem description sounded quite simple: get rid of the notion of the Responsible thread by putting in the fence that when missing could lead to stranding. 
I find it very hard to map many of the actual code changes to that problem statement. And I'm very unclear about the impact on the deflation protocol that this is causing. I think trying to look at diffs is the wrong way to analyze this change, I need to just look at the new code and try to understand the protocol - but that makes it hard to put comments into the PR. :( src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 476: > 474: // Release lock. > 475: movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD); > 476: membar(StoreLoad); Why a standalone `storeload` here? This does not define a fence, nor release semantics - as per the definitions in orderAccess.hpp src/hotspot/share/runtime/objectMonitor.cpp line 310: > 308: > 309: bool ObjectMonitor::enterI_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { > 310: // Used by ObjectSynchronizer::enter_for() to enter for another thread. This renaming is confusing for me. The `enter_for` methods were made explicit because normally locking is always done by the current thread for the current thread - but deopt breaks that. And now it seems we have an `EnterI` that is really an `EnterI_for` ?? ------------- PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2294522607 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752964988 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752967126 From dholmes at openjdk.org Wed Sep 11 00:45:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 00:45:09 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: <5huGvMY_JIkG_sSqaJw0uosW8XNSYIqG0-2mw7BsCZA=.753ae72e-1f37-4e71-975f-9d12180205bd@github.com> On Tue, 10 Sep 2024 14:23:24 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread. 
>> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 482: > >> 480: // This is faster on Nehalem and AMD Shanghai/Barcelona. >> 481: // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences >> 482: lock(); addl(Address(rsp, 0), 0); > > Since there's a membar above, do you need this lock/addl instructions? Just FTR this is a full fence on x86. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752965437 From dholmes at openjdk.org Wed Sep 11 01:06:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 01:06:21 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 21:58:39 GMT, Kim Barrett wrote: > Side note: why are template parameter names all capitals? I think it is an artifact of them usually being a single letter to represent a Type and thus a capital. But then we sometimes use more than a single-letter and decided to capitalize them all. 
I don't have an issue with things like MT which can be considered an abbreviation and thus okay to capitalize. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2342417399 From darcy at openjdk.org Wed Sep 11 02:02:13 2024 From: darcy at openjdk.org (Joe Darcy) Date: Wed, 11 Sep 2024 02:02:13 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 19:10:08 GMT, Srinivas Vamsi Parasa wrote: >> test/jdk/java/lang/Math/HyperbolicTests.java line 1009: >> >>> 1007: for(int i = 0; i < testCases.length; i++) { >>> 1008: double testCase = testCases[i]; >>> 1009: failures += testTanhWithReferenceUlpDiff(testCase, StrictMath.tanh(testCase), 2.5); >> >> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. >> >> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). >> >> If the test is going to use randomness, then its jtreg tags should include >> >> `@key randomness` >> >> and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. > >> If the test is going to use randomness, then its jtreg tags should include >> >> `@key randomness` >> >> and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. > > Please see the test updated to use `@key randomness` and` jdk.test.lib.RandomFactory` to get and Random object. > >> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. 
>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). >> > So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know. So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). If there was a correctly rounded tanh to compare against, then this style of testing would be valid. Are there any plan to intrinsify sinh or cosh? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1753010811 From dholmes at openjdk.org Wed Sep 11 02:18:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 02:18:13 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 10:15:43 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. 
>> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Exclude libagent8339725.cpp compiling for windows So what exactly is the robustness fix being proposed here? The jMethodID is not a means to keep alive a class, and so we have an invalid one. Are we just trying to avoid a crash (in which case how are we reporting that the jMethodID is in fact invalid)? or are we actually making the jMethodID keep the class alive, contrary to the JNI Specification? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342479187 From dholmes at openjdk.org Wed Sep 11 02:26:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 02:26:10 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 10:15:43 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Exclude libagent8339725.cpp compiling for windows src/hotspot/share/prims/jvmtiEnv.cpp line 3216: > 3214: NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); > 3215: Klass* k = method->method_holder(); > 3216: Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive How do we know `k` is not already unloaded? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753025016 From lmao at openjdk.org Wed Sep 11 02:41:06 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 02:41:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: <63UCF5nY8HTO2hHvbFyYOCQNNjXIxzPI6kah7vaUCEM=.47d3624d-f6e6-4505-a8d8-c91c8e6f8e97@github.com> On Wed, 11 Sep 2024 02:23:10 GMT, David Holmes wrote: >> Liang Mao has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude libagent8339725.cpp compiling for windows > > src/hotspot/share/prims/jvmtiEnv.cpp line 3216: > >> 3214: NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); >> 3215: Klass* k = method->method_holder(); >> 3216: Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive > > How do we know `k` is not already unloaded? We don't know `k` would be unloaded or not in the middle of GC concurrent marking. So I think we need to keep it alive just like accessing weak reference. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753034077 From lmao at openjdk.org Wed Sep 11 03:26:05 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 03:26:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 02:23:10 GMT, David Holmes wrote: >> Liang Mao has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude libagent8339725.cpp compiling for windows > > src/hotspot/share/prims/jvmtiEnv.cpp line 3216: > >> 3214: NULL_CHECK(method, JVMTI_ERROR_INVALID_METHODID); >> 3215: Klass* k = method->method_holder(); >> 3216: Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive > > How do we know `k` is not already unloaded? 
As @fisk mentioned above, `k->is_loader_alive()` is checked in caller `jvmti_GetMethodDeclaringClass`. So we won't enter here if k is already unloaded. The exception this PR handles is to deal with the `k` in the middle of concurrent marking that `is_loader_alive` returns `true` but could be unloaded soon. We'd better keep it alive just like accessing weak reference. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753067662 From lmao at openjdk.org Wed Sep 11 03:45:05 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 03:45:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 02:15:51 GMT, David Holmes wrote: > So what exactly is the robustness fix being proposed here? The jMethodID is not a means to keep alive a class, and so we have an invalid one. Are we just trying to avoid a crash (in which case how are we reporting that the jMethodID is in fact invalid)? or are we actually making the jMethodID keep the class alive, contrary to the JNI Specification? I guess it is a defect of VM. Before the GC is done and class is unloaded, the klass is still "alive" and the jvmti API already did correct thing. GC needs to implicitly keep it alive in the middle of concurrent marking just like java threads access a weak reference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342558027 From dholmes at openjdk.org Wed Sep 11 04:26:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 04:26:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 10:15:43 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. 
I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Exclude libagent8339725.cpp compiling for windows Okay, so we have: jvmti_GetMethodDeclaringClass(jvmtiEnv* env, jmethodID method, jclass* declaring_class_ptr) { which does: Method* checked_method = Method::checked_resolve_jmethod_id(method); if (checked_method == nullptr) { return JVMTI_ERROR_INVALID_METHODID; } if (declaring_class_ptr == nullptr) { return JVMTI_ERROR_NULL_POINTER; } err = jvmti_env->GetMethodDeclaringClass(checked_method, declaring_class_ptr); and in `checked_resolve_jmethod_id` we have: // If the method's class holder object is unreferenced, but not yet marked as // unloaded, we need to return null here too because after a safepoint, its memory // will be reclaimed. return o->method_holder()->is_loader_alive() ? o : nullptr; so the loader must be alive at this point but could be unreferenced. How is the loader allowed to become not-alive after this check, whilst within `GetMethodDeclaringClass`? The current thread is `_thread_in_vm` so not safepoint safe, so no safepoint can occur. Does the GC allow it to become not-alive / get unloaded, concurrently with the execution of code like this? If so then that could have happened before we call `klass_holder()` and create the `Handle` here couldn't it: Klass* k = method->method_holder(); Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive ?? General question: if `k->klass_holder()` keeps the class alive, at what point do we "release" this such that class can become not-alive again? 
BTW this seems a general problem that might impact any JVMTI function that has performed the above safety-check and then gone on to use the Method and its holder class. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342592000 From lmao at openjdk.org Wed Sep 11 05:00:05 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 05:00:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 04:23:16 GMT, David Holmes wrote: > so the loader must be alive at this point but could be unreferenced. How is the loader allowed to become not-alive after this check, whilst within GetMethodDeclaringClass? The current thread is _thread_in_vm so not safepoint safe, so no safepoint can occur. Does the GC allow it to become not-alive / get unloaded, concurrently with the execution of code like this? Yes. `GetMethodDeclaringClass` runs in the `_thread_in_vm` state, but it returns a JNI handle for the class to user native code. The thread is then in `_thread_in_native`, and the GC can still reach the end of marking and do reference processing and class unloading. > If so then that could have happened before we call klass_holder() and create the Handle here couldn't it: Klass* k = method->method_holder(); Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive Inside `JvmtiEnv::GetMethodDeclaringClass` we call `klass_holder` to keep it alive while in the `_thread_in_vm` thread state, which guarantees safety. > General question: if k->klass_holder() keeps the class alive, at what point do we "release" this such that class can become not-alive again? In the next GC, the class could be unloaded if the JNI handle for the class has been released.
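
As a loose analogy only for the "check liveness, then keep it alive before use" pattern discussed above, here is a minimal sketch in portable C++. It is not HotSpot code: HotSpot uses a tracing GC with weak/phantom handles, not reference counting, and `ToyKlass` and `declaring_class_name` are hypothetical names introduced for illustration. The point it tries to show is that a separate "is it alive?" check followed by a raw use leaves a window, whereas acquiring a strong reference does the check and the keep-alive in one step.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>

// Hypothetical stand-in for a class whose holder may be reclaimed concurrently.
struct ToyKlass {
    std::string name;
};

// Acquire-then-use: lock() checks liveness and takes a strong reference in a
// single step, so the object cannot be reclaimed while `strong` is in scope.
// (Assumption: this only mirrors the idea of creating a Handle from the
// holder; it says nothing about how HotSpot implements it.)
std::optional<std::string> declaring_class_name(const std::weak_ptr<ToyKlass>& holder) {
    if (std::shared_ptr<ToyKlass> strong = holder.lock()) {
        return strong->name;  // safe: kept alive for the duration of this scope
    }
    return std::nullopt;      // already reclaimed: report "invalid" rather than crash
}
```

In this analogy nothing has to be explicitly "undone": the strong reference goes away when the scope ends, and the object can be reclaimed afterwards if nothing else refers to it.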
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342623079 From aboldtch at openjdk.org Wed Sep 11 06:21:07 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 11 Sep 2024 06:21:07 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: On Wed, 11 Sep 2024 00:29:14 GMT, David Holmes wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 476: > >> 474: // Release lock. >> 475: movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD); >> 476: membar(StoreLoad); > > Why a standalone `storeload` here? 
This does not define a fence, nor release semantics - as per the definitions in orderAccess.hpp On x86 `membar(LoadStore | StoreStore /* release */)` would be a nop. Not sure if adding it before nulling the pointer would make things clearer. `membar(StoreLoad);` is all that we need between clearing the owner and checking the queues / successor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1753182728 From aboldtch at openjdk.org Wed Sep 11 06:21:08 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 11 Sep 2024 06:21:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: <5huGvMY_JIkG_sSqaJw0uosW8XNSYIqG0-2mw7BsCZA=.753ae72e-1f37-4e71-975f-9d12180205bd@github.com> References: <5huGvMY_JIkG_sSqaJw0uosW8XNSYIqG0-2mw7BsCZA=.753ae72e-1f37-4e71-975f-9d12180205bd@github.com> Message-ID: On Wed, 11 Sep 2024 00:30:17 GMT, David Holmes wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 482: >> >>> 480: // This is faster on Nehalem and AMD Shanghai/Barcelona. >>> 481: // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences >>> 482: lock(); addl(Address(rsp, 0), 0); >> >> Since there's a membar above, do you need this lock/addl instructions? > > Just FTR this is a full fence on x86. It is not needed. `membar(StoreLoad)` does exactly this. 
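
To make the ordering requirement described above concrete, here is a minimal sketch in portable C++ (`std::atomic`), not the HotSpot assembler code. `ToyMonitor`, `waiters`, `exit_needs_wakeup` and `enqueue_then_recheck` are hypothetical names, and `waiters` is an assumed stand-in for "EntryList or cxq is non-empty". It shows the store that clears the owner, then a full fence (which is where the StoreLoad ordering comes from), then the loads that check the queues and the successor.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified monitor; field names are illustrative only.
struct ToyMonitor {
    std::atomic<void*> owner{nullptr};
    std::atomic<void*> successor{nullptr};
    std::atomic<int>   waiters{0};  // assumed stand-in for "EntryList | cxq"
};

// Exit path: returns true if the exiting thread must go on to wake a waiter.
bool exit_needs_wakeup(ToyMonitor& m) {
    // 1. Release the lock.
    m.owner.store(nullptr, std::memory_order_release);
    // 2. Full fence: the store above must be ordered before the loads below
    //    (the StoreLoad case, the only reordering x86 permits here).
    std::atomic_thread_fence(std::memory_order_seq_cst);
    // 3. Check the queues and the successor.
    if (m.waiters.load(std::memory_order_relaxed) == 0) return false;         // nobody waiting
    if (m.successor.load(std::memory_order_relaxed) != nullptr) return false; // successor will retry
    return true;                                                              // must wake someone
}

// Waiter path: publish ourselves, fence, then re-check the owner.
// Returns true if the lock looks free and the waiter should retry acquiring it.
bool enqueue_then_recheck(ToyMonitor& m) {
    m.waiters.fetch_add(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return m.owner.load(std::memory_order_acquire) == nullptr;
}
```

With both fences in place, at least one side sees the other's store: either the exiting thread sees the waiter, or the waiter sees the cleared owner. Without the fence on the exit path both loads could observe stale values, which is the stranding scenario the Responsible thread used to paper over.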
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1753184198 From dholmes at openjdk.org Wed Sep 11 06:57:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 06:57:10 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: <8dI2OYtHYT1bkBhLR6nxiwYtsEf_bDsMk8i3nzNG2hI=.aadb0e69-8096-42f0-a2ea-2323babc494c@github.com> On Wed, 11 Sep 2024 06:16:33 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 476: >> >>> 474: // Release lock. >>> 475: movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD); >>> 476: membar(StoreLoad); >> >> Why a standalone `storeload` here? This does not define a fence, nor release semantics - as per the definitions in orderAccess.hpp > > On x86 `membar(LoadStore | StoreStore /* release */)` would be a nop. Not sure if adding it before nulling the pointer would make things clearer. > > `membar(StoreLoad);` is all that we need between clearing the owner and checking the queues / successor. Okay so again a "fence" reduces to a "StoreLoad" in practice for x86, but it would be nice if something - even a comment - stated "full fence" so the semantics are clear. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1753281373 From dholmes at openjdk.org Wed Sep 11 06:57:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 06:57:10 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <5huGvMY_JIkG_sSqaJw0uosW8XNSYIqG0-2mw7BsCZA=.753ae72e-1f37-4e71-975f-9d12180205bd@github.com> Message-ID: On Wed, 11 Sep 2024 06:18:02 GMT, Axel Boldt-Christmas wrote: >> Just FTR this is a full fence on x86. > > It is not needed. `membar(StoreLoad)` does exactly this. 
Okay, but if this is intended semantically to be a full fence then it should read that way not just "storeload". Not everyone will consider these synonymous. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1753276845 From dholmes at openjdk.org Wed Sep 11 07:04:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 07:04:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 10:15:43 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > Exclude libagent8339725.cpp compiling for windows I think you are missing my point: Klass* k = method->method_holder(); // Why can't k already be not-alive here? Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive > In next GC, the class could be unloaded if the JNI handle with class is released. Again this misses my point. If calling `k->klass_holder()` keeps `k` alive, what do we need to do to undo that? I assume there must be some kind of reference count in relation to this. ?? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342820672 From jeisl at openjdk.org Wed Sep 11 07:05:17 2024 From: jeisl at openjdk.org (Josef Eisl) Date: Wed, 11 Sep 2024 07:05:17 GMT Subject: RFR: 8338768: Introduce runtime lookup to check for static builds [v2] In-Reply-To: References: <56GIZnufresPSrWCWHPkbY9-qCGlm20L-nbXUi5DFv8=.445586cf-37dc-45ce-9b91-9d0a6c85e5ca@github.com> <1xFVXOUr023IxIHucx0v8dket_QOWsIV0G3wVHnF5aE=.23e8334b-a4ed-4d94-8ad9-376ffe043882@github.com> Message-ID: On Tue, 10 Sep 2024 21:47:15 GMT, Magnus Ihse Bursie wrote: > sort out the mess that is libawt_headless/libawt_xawt sounds good. can you point me to a JBS? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20666#discussion_r1753307449 From mli at openjdk.org Wed Sep 11 07:07:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 07:07:38 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4] In-Reply-To: References: Message-ID: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
>
> ## Test
> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java,
> test/jdk/java/util/zip/TestCRC32.java
>
> ## Performance
>
> ### on bananapi
>
> with patch
>
> Benchmark -with patch | (count) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | --
> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.884 | 0.03 | ns/op
> TestCRC32.testCRC32Update | 128 | avgt | 10 | 401.122 | 0.309 | ns/op
> TestCRC32.testCRC32Update | 256 | avgt | 10 | 680.168 | 0.032 | ns/op
> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1062.426 | 0.401 | ns/op
> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3308.361 | 0.176 | ns/op
> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24403.231 | 20.248 | ns/op
> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103463.735 | 4.245 | ns/op
>
> without patch
>
> Benchmark -without patch | (count) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | --
> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.942 | 0.224 | ns/op
> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.159 | 0.019 | ns/op
> TestCRC32.testCRC32Update | 256 | avgt | 10 | 686.106 | 0.1 | ns/op
> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1328.962 | 0.073 | ns/op
> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5191.116 | 0.189 | ns/op
> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41286.858 | 4.53 | ns/op
> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 172340.099 | 11.004 | ns/op
>
> ### on K230
>
> with patch
>
References: Message-ID: <98VDCDWoHHv8YxGMOqMqIGBOMBrR9tDhCMn51LcOVhU=.fae4048e-2dd7-48c7-8a81-393fa6e8a607@github.com> On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago.
> > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero doesn't need any changes as far as I can tell. src/hotspot/share/runtime/objectMonitor.cpp line 313: > 311: // The monitor is private to or already owned by locking_thread which must be suspended. > 312: // So this code may only contend with deflation. > 313: assert(locking_thread == Thread::current() || locking_thread->is_obj_deopt_suspend(), "must be"); I feel like the comments and assert now belong in `ObjectMonitor::enter_for_with_contention_mark`. `enterI_with_contention_mark` should be renamed. This is now a sort of `TryLock_with_contention_mark`. `add_to_contentions(1);` below could be changed to `contention_mark.extend()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1753348561 From lmao at openjdk.org Wed Sep 11 07:19:05 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 07:19:05 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 07:01:27 GMT, David Holmes wrote: > I think you are missing my point: > > ``` > Klass* k = method->method_holder(); > // Why can't k already be not-alive here? 
> Handle holder(Thread::current(), k->klass_holder()); // keep the klass alive > ``` > > > In next GC, the class could be unloaded if the JNI handle with class is released. > > Again this misses my point. If calling `k->klass_holder()` keeps `k` alive, what do we need to do to undo that? I assume there must be some kind of reference count in relation to this. ?? We don't need to undo. The holder is a Java heap oop with WeakHandle connected to CLD. If we keep it alive, the oop will be made tri-color marking `GREY` and then be traced/marked. It would be dead in next GC cycle if no explicit reference or keeping alive again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342847237 From aboldtch at openjdk.org Wed Sep 11 07:24:08 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 11 Sep 2024 07:24:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:02:37 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. 
>> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/share/runtime/objectMonitor.cpp line 574: > >> 572: for (;;) { >> 573: if (own == DEFLATER_MARKER) { >> 574: if (TryLockI(current)) { > > I can't tell the difference between TryLockI and enter_for(). Did I previously object to enter_for() here? Maybe I should take that back, and there should be a comment above enter_for() like > // Enters a lock in behalf of a non-current thread, or a thread that is exiting and has previously given up the lock. > // and it handles deflation. > > You could add a boolean that you expect success for the enter_for() caller from deoptimization (ie. must_succeed). > > This code is getting repetitive - it looks the same in all these places only a little bit different and hard to know why. I'd rather have a `TryLock_with_contention_mark` function which `enter_for` uses. Have stronger asserts in `enter_for`, essentially that it is used correctly from Deopt and that the result of `TryLock_with_contention_mark` is true. Suggestion: // Block out deflation as soon as possible. ObjectMonitorContentionMark contention_mark(this); // Check for deflation. 
if (enter_is_async_deflating()) { // Treat deflation as interference return TryLockResult::Interference; } if (TryLock_with_contention_mark(current, contention_mark)) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1753366689 From fyang at openjdk.org Wed Sep 11 07:28:07 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 11 Sep 2024 07:28:07 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4] In-Reply-To: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> References: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> Message-ID: On Wed, 11 Sep 2024 07:07:38 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. >> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
>>
>> ## Test
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java,
>> test/jdk/java/util/zip/TestCRC32.java
>>
>> ## Performance
>>
>> ### on bananapi
>>
>> with patch
>>
>> Benchmark | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op
>>
>> without patch
>>
>> Benchmark | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op
>>
>>
> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
> fix perf regression

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1584:

> 1582: sub(tmp1, len, tmp_limit);
> 1583: bge(tmp1, zr, L_vector_entry);
> 1584: }

Hi Hamlin, I think maybe we should introduce another assembler routine for the vector code? Let's say `kernel_crc32_using_vector` and delegate the work to it under `UseRVV`. That seems cleaner to me and avoids the "offset is too large" issue.
I will take a look at the vector code later. BTW: Should `single_talbe_size` be `single_table_size`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753373397 From lmao at openjdk.org Wed Sep 11 07:32:07 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 07:32:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 07:16:07 GMT, Liang Mao wrote: > // Why can't k already be not-alive here? No, `k` can't already be dead here: `is_loader_alive` has already returned true. `k` could die later, so we need to resurrect it (keep it alive) if it is going to be accessed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2342871447 From mli at openjdk.org Wed Sep 11 07:46:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 07:46:04 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4] In-Reply-To: References: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> Message-ID: On Wed, 11 Sep 2024 07:23:27 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix perf regression > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1584: > >> 1582: sub(tmp1, len, tmp_limit); >> 1583: bge(tmp1, zr, L_vector_entry); >> 1584: } > > Hi Hamlin, I think maybe we should introduce another assembler routine for the vector code? Let's say `kernel_crc32_using_vector` and delegate the work to it under `UseRVV`. That seems cleaner to me and avoids the "offset is too large" issue. I will take a look at the vector code later. BTW: Should `single_talbe_size` be `single_table_size`? Not sure if I understand your suggestion correctly. Do you mean something like below?
address generate_updateBytesCRC32() { if (UseRVV) { kernel_crc32_using_vector(); } else { kernel_crc32(...); } } But kernel_crc32_using_vector reuses the code in kernel_crc32, and even with UseRVV, under some conditions (when the size is not large enough) we still need to fall back to L_unroll_loop_entry. Or maybe I misunderstood what you mean? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753436559 From fyang at openjdk.org Wed Sep 11 07:51:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 11 Sep 2024 07:51:06 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x pass ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero don't need any changes as far as I can tell. Performed hs-tier1 - hs-tier3 tests on linux-riscv64 platform. Two minor comments for the riscv part. 
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 237: > 235: ld(t0, Address(tmp, ObjectMonitor::EntryList_offset())); > 236: ld(disp_hdr, Address(tmp, ObjectMonitor::cxq_offset())); > 237: orr(t0, t0, disp_hdr); It looks better to me if we use `tmp1Reg` here instead of its alias `disp_hdr` like you do for aarch64. I mean: ld(tmp1Reg, Address(tmp, ObjectMonitor::cxq_offset())); orr(t0, t0, tmp1Reg); src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 568: > 566: ld(tmp3_t, Address(tmp1_monitor, ObjectMonitor::cxq_offset())); > 567: orr(t0, t0, tmp3_t); > 568: beqz(t0, unlocked); // If so we are done. You might want to remove the preceding definition of label `release` as it is not used after this change. ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2292742845 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752121330 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1752123657 From mli at openjdk.org Wed Sep 11 07:55:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 07:55:05 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4] In-Reply-To: References: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> Message-ID: On Wed, 11 Sep 2024 07:43:12 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1584: >> >>> 1582: sub(tmp1, len, tmp_limit); >>> 1583: bge(tmp1, zr, L_vector_entry); >>> 1584: } >> >> Hi Hamlin, I think maybe we should introduce another assembler routine for the vector code? Let's say `kernel_crc32_using_vector` and delegate the work to it under `UseRVV`. That seems cleaner to me and avoids the "offset is too large" issue. I will take a look at the vector code later. BTW: Should `single_talbe_size` be `single_table_size`? > > Not sure if I understand your suggestion correctly. Do you mean something like below? 
> > address generate_updateBytesCRC32() { > if (UseRVV) { kernel_crc32_using_vector(); } > else { kernel_crc32(...); } > } > > But kernel_crc32_using_vector reuses the code in kernel_crc32, and even with UseRVV, under some conditions (when the size is not large enough) we still need to fall back to L_unroll_loop_entry. > Or maybe I misunderstood what you mean? In summary, the code paths are traversed in the following order: vector (optional) -> loop unroll -> other scalar cases, depending on data size and UseRVV. So in the UseRVV case, we need all of the code paths. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753467371 From mli at openjdk.org Wed Sep 11 08:06:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 08:06:23 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v5] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: amend test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/d08bdeb0..4f69639f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=04-05 Stats: 134 lines in 2 files changed: 35 ins; 30 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From lmao at openjdk.org Wed Sep 11 08:19:48 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 08:19:48 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v7] In-Reply-To: References: Message-ID: <3c_-moBPOJMeQwPQ8aW_VpoLw2IN2CgBjxZ9z4Z4aoM=.d123183d-4cd2-441e-9855-229808ff0885@github.com> > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: Remove useless is_loader_alive ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/4f69639f..4efc1b0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From mli at openjdk.org Wed Sep 11 08:20:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 08:20:06 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4] In-Reply-To: References: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> Message-ID: On Wed, 11 Sep 2024 08:03:02 GMT, Fei Yang wrote: >> In summary, the code paths are traversed in the following order: vector (optional) -> loop unroll -> other scalar cases, depending on data size and UseRVV. So in the UseRVV case, we need all of the code paths. > > Ah, I see. Regarding the "offset is too large" issue, could the far versions of these branches help? I mean setting the `is_far` parameters to true [1]. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L670-L675 I was thinking something similar. But it's a bit tedious, as I need to figure out which jumps should be far and which not; or I could simply change anything suspicious, but that's not good either. So I picked the simplest solution, moving it to the end. In other intrinsics I saw a similar code layout; they don't have a comment for it, but I guess they are doing similar things. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753549238 From epeter at openjdk.org Wed Sep 11 08:28:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Sep 2024 08:28:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization @rkennke Can you please explain the changes in these tests: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? 
Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2342983487 From rcastanedalo at openjdk.org Wed Sep 11 08:30:02 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 11 Sep 2024 08:30:02 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. 
> > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Fix a few style issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/0979e41e..141020e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=18-19 Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From fyang at openjdk.org Wed Sep 11 08:30:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 11 Sep 2024 08:30:08 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v4] In-Reply-To: References: <0tOr6DKh8IeO8SUe9n_ozXTJC9AMjIif08FZRfIVUfQ=.5377ded9-363f-40ae-89be-31ed1e894ea3@github.com> Message-ID: On Wed, 11 Sep 2024 08:17:07 GMT, Hamlin Li wrote: >> Ah, I see. Regarding the "offset is too large" issue, could the far versions of these branches help? I mean setting the `is_far` parameters to true [1]. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L670-L675 > > I was thinking something similar. > But it's a bit tedious, as I need to figure out which jumps should be far and which not; or I could simply change anything suspicious, but that's not good either. > So I picked the simplest solution, moving it to the end. In other intrinsics I saw a similar code layout; they don't have a comment for it, but I guess they are doing similar things. Yeah, I understand what you mean. I am just a bit concerned about the two `j(L_exit)` added. It seems the one in the `if (UseRVV)` block is not necessary? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753572543 From fyang at openjdk.org Wed Sep 11 08:30:09 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 11 Sep 2024 08:30:09 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:06:23 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. >> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). >> >> ## Test >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, >> test/jdk/java/util/zip/TestCRC32.java >> >> ## Performance >> >> ### on bananapi >> >> with patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 
16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > typo src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1645: > 1643: andi(tmp2, tmp2, right_8_bits); > 1644: update_byte_crc32(crc, tmp2, table0); > 1645: j(L_exit); And this one will simply jump to next when UseRVV is false. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753575832 From adinn at openjdk.org Wed Sep 11 08:30:39 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 11 Sep 2024 08:30:39 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs,generate enums, names, fields and generator calls Message-ID: Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. 
------------- Commit messages: - fix missing include - fix problems with enum tags - enum fixes - correct renaming of patchign ids to stub ids - 8339849: Enumerate opto and C1 stubs,generate enums, names, fields and generator calls - systematize C1 stubs - systematize opto stubs Changes: https://git.openjdk.org/jdk/pull/20936/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20936&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339849 Stats: 997 lines in 44 files changed: 229 ins; 80 del; 688 mod Patch: https://git.openjdk.org/jdk/pull/20936.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20936/head:pull/20936 PR: https://git.openjdk.org/jdk/pull/20936 From adinn at openjdk.org Wed Sep 11 08:30:39 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 11 Sep 2024 08:30:39 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 15:27:09 GMT, Andrew Dinn wrote: > Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. src/hotspot/share/opto/runtime.cpp line 183: > 181: OPTO_STUBS_DO(GEN_OPTO_BLOB, GEN_OPTO_STUB, GEN_OPTO_JVMTI_STUB) > 182: > 183: /* This old code that has been commented out needs to be removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1753581414 From rcastanedalo at openjdk.org Wed Sep 11 08:32:12 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 11 Sep 2024 08:32:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> References: <5CDsDluK4CaytgLTPJPuKbz8Ug9mcF10mco0In8ljZM=.3fe332b8-36af-4de5-8cee-73f58a564497@github.com> Message-ID: <8-IYniHv9GgBnsv9w3GggGF1mKKf3MfwxIxGIjEUh3c=.446607ac-5624-4c16-a1a5-a29187526023@github.com> On Tue, 10 Sep 2024 16:26:58 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. 
>> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation in generate_post_barrier_fast_path > > Co-authored-by: Thomas Schatzl <59967451+tschatzl at users.noreply.github.com> I just fixed a few more indentation and code style glitches found by clang-format in commit 141020e6 (thanks @dlunde for helping with the setup). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2342993484 From adinn at openjdk.org Wed Sep 11 08:41:09 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 11 Sep 2024 08:41:09 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 15:27:09 GMT, Andrew Dinn wrote: > Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. src/hotspot/share/opto/runtime.cpp line 151: > 149: // defines temporarily rebind the generated names to reference the > 150: // relevant implementations. > 151: I am not 100% happy about using defines to finesse this problem of common C targets. One alternative here is to define methods local to class OptoRuntime which fit the generator naming convention and have them forward the call to the SharedRuntime methods. n.b. I used (local) method forwarding to allow blobs to share common typefunc providers. Another alternative is to declare the C target as a parameter to the opto blob declaration macro. That's more flexible but in almost all cases it repeats information already present and makes understanding and updating the declarations more complex. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1753617986 From lmao at openjdk.org Wed Sep 11 08:55:45 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 08:55:45 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v8] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. 
I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: amend test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/4efc1b0c..989fdb18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=06-07 Stats: 26 lines in 2 files changed: 7 ins; 11 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From ddong at openjdk.org Wed Sep 11 08:55:46 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 11 Sep 2024 08:55:46 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: <0m560eHXgyvlcLejz7VQQQdEF3eh2Ye5-lDjsDxzFtQ=.fe6b2f54-ff16-4101-bd9f-0037d0874886@github.com> On Tue, 10 Sep 2024 22:10:16 GMT, Leonid Mesnik wrote: >> Liang Mao has updated the pull request incrementally with one additional commit since the last revision: >> >> Exclude libagent8339725.cpp compiling for windows > > test/hotspot/jtreg/runtime/8339725/Test8339725.java line 32: > >> 30: * @requires os.family == "linux" >> 31: * @library /test/lib >> 32: * @library / > > Why this line is needed? removed > test/hotspot/jtreg/runtime/8339725/Test8339725.java line 35: > >> 33: * @modules java.base/jdk.internal.misc >> 34: * java.management >> 35: * @run main/othervm/native -agentlib:agent8339725 Test8339725 > > It should be > @run main/driver Test8339725 > since you run the test in forked process. 
updated. > test/hotspot/jtreg/runtime/8339725/Test8339725.java line 53: > >> 51: public static void test(String gcArg) throws Exception { >> 52: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder( >> 53: "-agentpath:" + Utils.TEST_NATIVE_PATH + File.separator + System.mapLibraryName("agent8339725"), "-Xmx100m", gcArg, "Test"); > > Instead of using createLimitedTestJavaProcessBuilder and gcArg, please use > createTestJavaProcessBuilder and allow system to set arguments. So we can run it in more configurations even if it should work now. > All other test options would be appended to your nativepath and Xmx options. updated. > test/hotspot/jtreg/runtime/8339725/Test8339725.java line 63: > >> 61: public static void main(String[] args) throws Exception { >> 62: long last = System.nanoTime(); >> 63: for (int i = 0;; i++) { > > for (;;) { > looks better if i never used. updated. > test/hotspot/jtreg/runtime/8339725/Test8339725.java line 81: > >> 79: public Class findClass(String name) throws ClassNotFoundException { >> 80: byte[] b = Base64.getDecoder() >> 81: .decode("yv66vgAAADQADgoAAwALBwAMBwANAQAGPGluaXQ+AQADKClWAQAEQ29kZQEAD0xpbmVOdW1iZXJU" + > > Please add comment explaining why this string is used as a template. Changed the way to get the byte codes of the target class. > test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 37: > >> 35: #define BUFFER_SIZE 100000 >> 36: static size_t ring_buffer[BUFFER_SIZE] = {0}; >> 37: static volatile int ring_buffer_idx = 0; > > Why volatile is needed here? Moved this static field into `ClassPrepareCallback`, and only one thread can modify it. > test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 41: > >> 39: >> 40: void *get_method_details(void *arg) >> 41: { > > please move { on the line with declartion, or 'if' 'for' etc in all places. updated. > test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 50: > >> 48: >> 49: // For JVM 17, 21, 22 calling GetMethodDeclaringClass is enough. 
>> 50: if ((err = jvmti->GetMethodDeclaringClass(method, &method_class)) == 0) > > please use JVMTI_ERROR_NONE when check error code. updated. > test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 53: > >> 51: { >> 52: // JVM 8 needs this to crash >> 53: jvmti->GetClassSignature(method_class, &class_name, NULL); > > The good practice is to check jvmti status and fails if it is not none. > Check the 'check_jvmti_status' for this from 'jvmti_common.h" > Please check all jvmti functions. Added result check for all jvmti calls except `Deallocate`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753650744 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753652777 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753650356 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753658539 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753661825 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753657463 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753653794 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753654138 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1753665549 From mli at openjdk.org Wed Sep 11 08:59:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 08:59:23 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v6] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > References: <8mpU_pIgnJ_eNCSDLSMRR8zvDErPhSV_G8XePpmUl8U=.026964ac-f75e-472d-9187-f0c65548fa0c@github.com> <2yeb4_jKO9U7D1zHyLgi0GTKUym2iesw8lSMSl9tvIo=.c59ab670-31d2-48f3-aa2c-032ca2890c66@github.com> Message-ID: On Mon, 9 Sep 2024 09:38:50 GMT, Martin Doerr wrote: >> I guess let's keep it. I mean even if there is need to change the layout, then we have to remove `z_mvghi` and switch back to this implementation again. So maybe better we keep it here and hope for the best ? > > `BasicObjectLock` is a very small data structure.
12 bit offsets are more than enough. The actual offsets are 0 :-) I was only looking for extra safety. But I reverted it. Thanks for the suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20740#discussion_r1753784379 From tschatzl at openjdk.org Wed Sep 11 09:36:04 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 11 Sep 2024 09:36:04 GMT Subject: RFR: 8339627: Cleanup Unsafe.setMemory intrinsic code In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 17:46:49 GMT, Johan Sjölen wrote: > Hi, > > The code for the `Unsafe.setMemory` intrinsic has a few issues that this PR cleans up. > > 1. The labels are unused in x86-64 intrinsic > 2. The function stub has an incorrect function prototype as it clearly manipulates the array so the array is not const, and we don't read the array so it probably shouldn't be called `src`. That's probably just an issue of `UnsafeArrayCopyStub` being copied and altered insufficiently. > > Thanks. lgtm. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20873#pullrequestreview-2296208213 From eosterlund at openjdk.org Wed Sep 11 09:59:06 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 11 Sep 2024 09:59:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v5] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 07:29:12 GMT, Liang Mao wrote: > > // Why can't k already be not-alive here? > > No. `k` can't be already dead here. is_loader_alive already returned true. `k` could be dead later so we need to make sure the resurrection if it needs access. Right. An oop dies after marking has terminated and it was not reachable other than potentially from phantom references. In this case, the oop isn't dead *yet*, but it is also only reachable through a phantom reference from the metadata, in the object graph. The oop will die unless something is done about it.
If you pick out an oop that is not strongly reachable in the object graph, you could think of it as a sort of "zombie oop". You usually get to them through AS_NO_KEEPALIVE loads and people know to be careful then, but since CLD oops are a bit weird, this is another vector to zombie oops, because the model is not obvious that CLD oops (such as java_mirror) should only be usable after loading the holder (in a vm weak resolving way that keeps it alive). When acquiring such zombie oops that are "dying", you are not allowed to expose them to the object graph (such as putting them into a handle). Because while it isn't dead yet, it will be soon unless kept alive (until the next safepoint). I realize it's problematic that very few people know how these obscure metadata interactions work, and it would probably be a good idea to try to improve our internal APIs to make it harder to mess up. But the current design that we have, is that if you grab metadata from out of nowhere, you have to load its holder to keep it alive (at least until the next safepoint), before you expose any of its CLD oops (such as java_mirror) in the object graph. We rarely get random metadata from out of nowhere, because you typically get to a klass from a strongly reachable object, which keeps its metadata alive. But every now and then this pops up and it's rather obscure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2343183525 From lmao at openjdk.org Wed Sep 11 10:10:47 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 10:10:47 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v9] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading.
> > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: amend test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/989fdb18..5aa77542 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=07-08 Stats: 9 lines in 1 file changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From lmao at openjdk.org Wed Sep 11 10:23:49 2024 From: lmao at openjdk.org (Liang Mao) Date: Wed, 11 Sep 2024 10:23:49 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: References: Message-ID: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: use -Xmx50m to increace the crash posibility ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/5aa77542..d6c6559b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From fbredberg at openjdk.org Wed Sep 11 11:32:04 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 11:32:04 GMT Subject: RFR: 8339627: Cleanup Unsafe.setMemory intrinsic code In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 17:46:49 GMT, Johan Sjölen wrote: > Hi, > > The code for the `Unsafe.setMemory` intrinsic has a few issues that this PR cleans up. > > 1. The labels are unused in x86-64 intrinsic > 2. The function stub has an incorrect function prototype as it clearly manipulates the array so the array is not const, and we don't read the array so it probably shouldn't be called `src`. That's probably just an issue of `UnsafeArrayCopyStub` being copied and altered insufficiently. > > Thanks. LGTM ------------- Marked as reviewed by fbredberg (Committer). PR Review: https://git.openjdk.org/jdk/pull/20873#pullrequestreview-2296543852 From fbredberg at openjdk.org Wed Sep 11 12:08:09 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 12:08:09 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 14:23:24 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread.
>> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 482: > >> 480: // This is faster on Nehalem and AMD Shanghai/Barcelona. >> 481: // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences >> 482: lock(); addl(Address(rsp, 0), 0); > > Since there's a membar above, do you need this lock/addl instructions? Well spotted @coleenp! No it's not needed. It was meant to be replaced by `membar(StoreLoad)`, which as @xmas92 wrote, does exactly that. Also, since I use `membar(StoreLoad)` in all other platforms, I wanted it to be consistent. 
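
[Illustrative sketch, not part of the original message.] The ordering described in the reply above (clear the owner, then a StoreLoad barrier, then look at the successor and the queues) can be approximated in portable C++. This is not HotSpot code: `MiniMonitor`, `exit_needs_wakeup`, and all field names are hypothetical, and `std::atomic` stands in for HotSpot's own `Atomic`/`OrderAccess` primitives. It only assumes what the thread states, namely that a StoreLoad barrier sits between clearing the owner and checking for waiters.

```cpp
#include <atomic>

// Hypothetical stand-in for a monitor; not the real ObjectMonitor layout.
struct MiniMonitor {
  std::atomic<void*> owner{nullptr};   // thread that currently holds the lock
  std::atomic<void*> succ{nullptr};    // thread already chosen to retry the lock
  std::atomic<int>   waiters{0};       // number of threads queued on the monitor
};

// Returns true when the exiting thread must wake a waiter itself.
bool exit_needs_wakeup(MiniMonitor& m) {
  // Release store: writes made while holding the lock become visible
  // before any other thread can observe the lock as free.
  m.owner.store(nullptr, std::memory_order_release);

  // Full fence: keeps the store above ordered before the loads below.
  // This is the StoreLoad case that a release store alone does not cover.
  std::atomic_thread_fence(std::memory_order_seq_cst);

  if (m.waiters.load(std::memory_order_relaxed) == 0) {
    return false;  // nobody is queued, nothing to wake
  }
  if (m.succ.load(std::memory_order_relaxed) != nullptr) {
    return false;  // a successor is already set and will retry on its own
  }
  return true;     // waiters exist and no successor: the exiter must wake one
}
```

Without the fence, the loads of `waiters` and `succ` could be satisfied before the owner store becomes visible, which is the kind of window in which a waiter could be left stranded.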
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754266567 From coleenp at openjdk.org Wed Sep 11 12:11:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 12:11:06 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: On Wed, 11 Sep 2024 00:33:15 GMT, David Holmes wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/share/runtime/objectMonitor.cpp line 310: > >> 308: >> 309: bool ObjectMonitor::enterI_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { >> 310: // Used by ObjectSynchronizer::enter_for() to enter for another thread. > > This renaming is confusing for me. 
The `enter_for` methods were made explicit because normally locking is always done by the current thread for the current thread - but deopt breaks that. And now it seems we have an `EnterI` that is really an `EnterI_for` ?? Me too. So many functions that are sort of the same. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754276945 From dholmes at openjdk.org Wed Sep 11 12:17:14 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 12:17:14 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> Message-ID: <43juo9iTid-P7eAdi1yfs2t3rwGuxMXlGs_5WgazV4c=.813bc177-60af-4b26-aea8-55414e276b27@github.com> On Wed, 11 Sep 2024 10:23:49 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > use -Xmx50m to increace the crash posibility I guess I am still missing a piece here. We have the initial check for k being alive (which doesn't ensure it stays alive it just allows an early bail out), and we end with creating a JNI reference for the mirror oop. I assume that once we have the JNI reference the mirror oop is again strongly reachable and safe (if not how are we allowed to create the JNI reference for it?). So somewhere inbetween k is no longer alive and the mirror oop is junk. Or is it that things go bad after we create the JNI reference? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2343504474 From fbredberg at openjdk.org Wed Sep 11 12:20:10 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 12:20:10 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 17:44:39 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/share/runtime/objectMonitor.cpp line 1104: > >> 1102: // 1. A release barrier ensures that changes to monitor meta-data >> 1103: // (_succ, _EntryList, _cxq) and data protected by the lock will be >> 1104: // visible before we release the lock. > > Where is this barrier? 
[release_clear_owner(current);](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1183) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754303798 From dholmes at openjdk.org Wed Sep 11 12:21:09 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 12:21:09 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> Message-ID: On Wed, 11 Sep 2024 10:23:49 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > use -Xmx50m to increace the crash posibility FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more that what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. 
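
[Illustrative sketch, not part of the original message.] The outcome debated in this thread, where a lookup either yields a reference that is safely kept alive or reports the target as gone, resembles promoting a weak reference in standard C++. The analogy is loose: reference counting has no "dying but not yet dead" state, and `Mirror`, `Result`, and `declaring_class` are hypothetical names, not HotSpot or JVMTI types.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a class mirror; not a HotSpot type.
struct Mirror {
  std::string name;
};

enum class Result { Ok, InvalidMethod };

// Tries to obtain a strong reference to the mirror.
// On success the caller holds a reference that keeps the mirror alive;
// on failure the target is treated as already dead.
Result declaring_class(const std::weak_ptr<Mirror>& weak,
                       std::shared_ptr<Mirror>& out) {
  out = weak.lock();  // yields a strong reference, or an empty pointer if expired
  return out ? Result::Ok : Result::InvalidMethod;
}
```

The caller never sees a reference to an object that is about to disappear: it gets either a usable strong reference or an error code.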
------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2343512440 From fbredberg at openjdk.org Wed Sep 11 12:26:07 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 12:26:07 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:48:50 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/share/runtime/objectMonitor.cpp line 353: > >> 351: >> 352: void ObjectMonitor::enter_for_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { >> 353: DEBUG_ONLY(bool success = ) ObjectMonitor::enterI_with_contention_mark(locking_thread, contention_mark); > > This is kind of noisy with DEBUG_ONLY. If you remove DEBUG_ONLY, does the windows compiler complain that you're not using the variable success in the product build? I don't know. 
I come from a planet where warnings was errors, and just brought along the old habit to my new planet. I'll check with the Windows compiler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754324872 From coleenp at openjdk.org Wed Sep 11 12:29:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 12:29:06 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: <_1FWsoNnxMXar0bQeJ-7OFkI0FRem5pTdY6Yn92mRI4=.d86f8ec4-08ab-44cf-aeb2-0817b6189a8a@github.com> On Wed, 11 Sep 2024 12:17:01 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 1104: >> >>> 1102: // 1. A release barrier ensures that changes to monitor meta-data >>> 1103: // (_succ, _EntryList, _cxq) and data protected by the lock will be >>> 1104: // visible before we release the lock. >> >> Where is this barrier? > > [release_clear_owner(current);](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1183) Ok, I see. It's release_store_owner which is step 1 and 2, then membar which is step 3. Ok, it does say that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754332975 From fbredberg at openjdk.org Wed Sep 11 12:48:08 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 12:48:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: On Wed, 11 Sep 2024 06:16:33 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 476: >> >>> 474: // Release lock. >>> 475: movptr(Address(tmpReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)), NULL_WORD); >>> 476: membar(StoreLoad); >> >> Why a standalone `storeload` here? 
This does not define a fence, nor release semantics - as per the definitions in orderAccess.hpp > > On x86 `membar(LoadStore | StoreStore /* release */)` would be a nop. Not sure if adding it before nulling the pointer would make things clearer. > > `membar(StoreLoad);` is all that we need between clearing the owner and checking the queues / successor. As @xmas92 wrote, membar(StoreLoad); is all that we need between clearing the owner and checking the queues / successor. And, since I use membar(StoreLoad) in all other platforms, I wanted it to be consistent. Also if you look in [ObjectMonitor::exit](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1132C6-L1132C25)() you'll see that this there is a call to [OrderAccess::storeload](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1184)() just after [release_clear_owner](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1183)(), so I'm just doing the same as has been done in the C++ slow-path for long. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754393843 From fbredberg at openjdk.org Wed Sep 11 13:14:05 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 13:14:05 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 17:38:09 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. 
>> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/share/runtime/objectMonitor.cpp line 588: > >> 586: } else { >> 587: // The lock had been free momentarily, but we lost the race to the lock. >> 588: own = prev_own; > > So this retries now and doesn't break. Is it because it could be the DEFLATER_MARKER ? It could be the deflator (or someone else). Anyhow, we will retry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754481635 From fbredberg at openjdk.org Wed Sep 11 13:25:14 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 13:25:14 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:02:37 GMT, Coleen Phillimore wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. 
More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/share/runtime/objectMonitor.cpp line 574: > >> 572: for (;;) { >> 573: if (own == DEFLATER_MARKER) { >> 574: if (TryLockI(current)) { > > I can't tell the difference between TryLockI and enter_for(). Did I previously object to enter_for() here? Maybe I should take that back, and there should be a comment above enter_for() like > // Enters a lock in behalf of a non-current thread, or a thread that is exiting and has previously given up the lock. > // and it handles deflation. > > You could add a boolean that you expect success for the enter_for() caller from deoptimization (ie. must_succeed). > > This code is getting repetitive - it looks the same in all these places only a little bit different and hard to know why. Yes @coleenp, you did previously object to calling `enter_for()` from `TryLock()`, which is why it is what it is today. I'm not too proud of how it turned out, and as @dholmes-ora also pointed out , the naming is a bit confusing, so that needs to be fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754511550 From fbredberg at openjdk.org Wed Sep 11 13:25:11 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 11 Sep 2024 13:25:11 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: On Wed, 11 Sep 2024 12:08:25 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 310: >> >>> 308: >>> 309: bool ObjectMonitor::enterI_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { >>> 310: // Used by ObjectSynchronizer::enter_for() to enter for another thread. >> >> This renaming is confusing for me. The `enter_for` methods were made explicit because normally locking is always done by the current thread for the current thread - but deopt breaks that. And now it seems we have an `EnterI` that is really an `EnterI_for` ?? > > Me too. So many functions that are sort of the same. As I wrote above, the confusing renaming has to be fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754518155 From rkennke at openjdk.org Wed Sep 11 13:37:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 13:37:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:24:16 GMT, Emanuel Peter wrote: > @rkennke Can you please explain the changes in these tests: > > ``` > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > ``` > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). 
That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343693629 From aph at openjdk.org Wed Sep 11 13:38:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Sep 2024 13:38:13 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. 
> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero doesn't need any changes as far as I can tell. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 217: > 215: stlr(zr, owner_addr); > 216: membar(StoreLoad); > 217: Is there some reason we need a `memory_order_conservative` store here? You may not really need a `StoreLoad` here, as long as `ObjectMonitor::owner` is always read with `ldar` or `casal`. `Atomic::cmpxchg()` uses sequentially-consistent ops by default. Reason: on AArch64, `stlr;ldar` is sequentially consistent, which is stronger than release|acquire. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754564375 From coleenp at openjdk.org Wed Sep 11 13:41:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 13:41:05 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:20:23 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 574: >> >>> 572: for (;;) { >>> 573: if (own == DEFLATER_MARKER) { >>> 574: if (TryLockI(current)) { >> >> I can't tell the difference between TryLockI and enter_for(). Did I previously object to enter_for() here? Maybe I should take that back, and there should be a comment above enter_for() like >> // Enters a lock in behalf of a non-current thread, or a thread that is exiting and has previously given up the lock. >> // and it handles deflation. >> >> You could add a boolean that you expect success for the enter_for() caller from deoptimization (ie. must_succeed). >> >> This code is getting repetitive - it looks the same in all these places only a little bit different and hard to know why. > > Yes @coleenp, you did previously object to calling `enter_for()` from `TryLock()`, which is why it is what it is today. 
> I'm not too proud of how it turned out, and as @dholmes-ora also pointed out , the naming is a bit confusing, so that needs to be fixed. I was actually confused because there's an enter_for() in all of the synchronizer files and didn't realize you were calling the one in ObjectMonitor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754573519 From aph at openjdk.org Wed Sep 11 13:45:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Sep 2024 13:45:05 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:35:37 GMT, Andrew Haley wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 217: > >> 215: stlr(zr, owner_addr); >> 216: membar(StoreLoad); >> 217: > > Is there some reason we need a `memory_order_conservative` store here? 
> You may not really need a `StoreLoad` here, as long as `ObjectMonitor::owner` is always read with `ldar` or `casal`. `Atomic::cmpxchg()` uses sequentially-consistent ops by default. > Reason: on AArch64, `stlr;ldar` is sequentially consistent, which is stronger than release|acquire. Oh, not just sequentially consistent, but also barrier-ordered-before. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1754586693 From szaldana at openjdk.org Wed Sep 11 13:48:12 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Wed, 11 Sep 2024 13:48:12 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 15:42:57 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681) enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. >> >> As mentioned in this comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492) where we enabled support for `%p` for filenames specified in jcmd. >> >> With this patch, I propose: >> - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. >> - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. >> - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion to support `%t` for jcmd as well. >> >> Testing: >> - [x] Added test cases pass with all platforms (verified with a GHA job). >> - [x] Tier 1 passes with GHA. >> >> Looking forward to hearing your thoughts! >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision:
>
> - Merge branch 'openjdk:master' into JDK-8204681
> - 8204681: Option to include timestamp in hprof filename

Hi folks,

Sorry about the late reply. Just got back in from PTO.

I'm glad to see that there is some consensus that diagnostic commands should ideally support timestamp expansion for output filenames (with the possibility of adding more replacement options in the future).

Thomas raised the issue about backwards compatibility issues with `Arguments::copy_expand_arguments`. We've been discussing some ways to address this:

1. Introduce a new parameter for jcmd that supports filename expansion (`-filepattern` or something along those lines.) We would then need to sort out ignoring `-filename` if a filename pattern is provided.
2. Replacing `%t` with a pattern that is very unlikely to have been used and expanded by accident (`%TIMESTAMP`, etc).
3. A HotSpot flag (`-XX:+AllowFileNamePattern`) to accept these patterns in multiple places in the JVM.

Option 2 is definitely the most straightforward to implement but it's also not the cleanest. There is always a slight risk that someone's configuration might break.

I'd really like to find an actionable consensus on what would be best. Open to explore other options if anyone has any better ideas.

Aside from that, I have noted the feedback about using current time vs start time and the `YYYY-MM-DD_HH-MM-SS` format. I will be making the appropriate changes once we've settled the other stuff.

Thanks!
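
To make the `%p`/`%t` discussion a bit more concrete, here is a minimal standalone sketch of the kind of expansion being proposed. It is not the HotSpot implementation and does not mirror `Arguments::copy_expand_arguments`: the function name `expand_pattern`, its parameters, the `%%` escape, and the choice of UTC are all assumptions made for illustration. Only the `%p` (pid) and `%t` (timestamp) placeholders and the `YYYY-MM-DD_HH-MM-SS` format come from the thread.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper, for illustration only.
// Expands %p to the given pid and %t to the given time formatted as
// YYYY-MM-DD_HH-MM-SS. "%%" yields a literal '%'; any other sequence
// starting with '%' is copied through unchanged.
std::string expand_pattern(const std::string& pattern, long pid, std::time_t when) {
  char stamp[32] = "";
  // std::gmtime is not thread-safe; UTC is used here only so that the
  // sketch gives the same result regardless of the local time zone.
  if (const std::tm* utc = std::gmtime(&when)) {
    std::strftime(stamp, sizeof(stamp), "%Y-%m-%d_%H-%M-%S", utc);
  }

  std::string out;
  for (std::size_t i = 0; i < pattern.size(); i++) {
    if (pattern[i] == '%' && i + 1 < pattern.size()) {
      const char c = pattern[i + 1];
      if (c == 'p') { out += std::to_string(pid); i++; continue; }
      if (c == 't') { out += stamp;               i++; continue; }
      if (c == '%') { out += '%';                 i++; continue; }
    }
    out += pattern[i];  // ordinary character, or an unrecognized '%' sequence
  }
  return out;
}
```

With pid 1234 and the Unix epoch as the timestamp, `dump_%p_%t.hprof` would expand to `dump_1234_1970-01-01_00-00-00.hprof`. The sketch also shows where the compatibility concern comes from: a filename that already contained a literal `%t` would silently change meaning once the placeholder is recognized.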
------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2343721070 From mdoerr at openjdk.org Wed Sep 11 13:49:05 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Sep 2024 13:49:05 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 07:56:25 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > I've done basic testing on ppc64le, riscv64 and s390x using QEMU, but would appreciate if @TheRealMDoerr, @RealFYang and @offamitkumar could take it for a real test drive. @fbredber, @dholmes-ora: I got a substantial performance drop on our 96 Thread Xeon server: `LockUnlock.testContendedLock` seems to be less than half as fast as without this patch. Also, some of the `LockUnlock.testInflated*` seem to be affected. (Large PPC64 servers as well.) Can you reproduce this on your side? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2343724544 From duke at openjdk.org Wed Sep 11 13:53:13 2024 From: duke at openjdk.org (duke) Date: Wed, 11 Sep 2024 13:53:13 GMT Subject: Withdrawn: 8300732: Whitebox functions for Metaspace test should use byte size In-Reply-To: References: Message-ID: On Thu, 4 Jul 2024 15:18:29 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8300732](https://bugs.openjdk.org/browse/JDK-8300732) switching Whitebox Metaspace test functions to use bytes as opposed to words. > > Testing: > - [x] `test/hotspot/jtreg/runtime/Metaspace` tests pass. > > Thanks, > Sonia This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20039 From jsjolen at openjdk.org Wed Sep 11 14:00:17 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 11 Sep 2024 14:00:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization Hi, Me and @caspernorrbin are reviewing the Metaspace changes (so anything in the `memory` and `metaspace` folders). We have found minor improvements that can be made and some nits, but the code overall looks OK. We are finishing up a first round of review now, and will have a second one. Thank you for your hard work and your patience with the review process. src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: > 85: klass_alignment_words, > 86: "class arena"); > 87: } As per my comment in the header file, change the code to this:
```c++
if (class_context != nullptr) {
  // ... Same as in PR
} else {
  _class_space_arena = _non_class_space_arena;
}
```
src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: > 113: if (wastage.is_nonempty()) { > 114: non_class_space_arena()->deallocate(wastage); > 115: } This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example:
```c++
// Any wasted memory is presumably too small for any class.
// Therefore, give it back to the non-class space arena's free list.
```
src/hotspot/share/memory/classLoaderMetaspace.cpp line 118: > 116: #ifdef ASSERT > 117: if (result.is_nonempty()) { > 118: const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false; Unnecessary nullptr check if you take my suggestion, or you should switch to `have_class_space_arena`. 
src/hotspot/share/memory/classLoaderMetaspace.cpp line 165: > 163: MetaBlock bl(ptr, word_size); > 164: // If the block would be reusable for a Klass, add to class arena, otherwise to > 165: // then non-class arena. Nit: spelling, "the" src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: > 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } > 80: > 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` src/hotspot/share/memory/metaspace.cpp line 656: > 654: // Adjust size of the compressed class space. > 655: > 656: const size_t res_align = reserve_alignment(); Can you change the name to `root_chunk_size`? src/hotspot/share/memory/metaspace.hpp line 112: > 110: static size_t max_allocation_word_size(); > 111: > 112: // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty Nit: Spelling, "correctly" src/hotspot/share/memory/metaspace/metablock.hpp line 48: > 46: > 47: MetaWord* base() const { return _base; } > 48: const MetaWord* end() const { return _base + _word_size; } `assert(is_nonempty())` src/hotspot/share/memory/metaspace/metablock.hpp line 51: > 49: size_t word_size() const { return _word_size; } > 50: bool is_empty() const { return _base == nullptr; } > 51: bool is_nonempty() const { return _base != nullptr; } Can `_base == nullptr` but `_word_size != 0`? src/hotspot/share/memory/metaspace/metablock.hpp line 52: > 50: bool is_empty() const { return _base == nullptr; } > 51: bool is_nonempty() const { return _base != nullptr; } > 52: void reset() { _base = nullptr; _word_size = 0; } Is this function really necessary? 
According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect). src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44: > 42: class FreeBlocks; > 43: > 44: struct ArenaStats; Nit: Sort? src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84: > 82: // between threads and needs to be synchronized in CLMS. > 83: > 84: const size_t _allocation_alignment_words; Nit: Document this? All other members are documented. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2296528491 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754335269 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754398993 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754343513 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754459464 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754330432 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754619023 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754508321 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142822 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754142098 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754153662 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754192464 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754197251 From gziemski at openjdk.org Wed Sep 11 14:06:06 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 11 Sep 2024 14:06:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. 
>> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback Are we sure we want `mt` for non-type parameter name in templates? We have these existing patterns already in our code: src/hotspot/share/utilities/growableArray.hpp:803:template src/hotspot/share/utilities/stack.hpp:54:template class StackIterator; src/hotspot/share/utilities/concurrentHashTable.inline.hpp:78:template src/hotspot/share/utilities/chunkedList.hpp:31:template class ChunkedList : public CHeapObj src/hotspot/share/gc/g1/g1BatchedTask.hpp:32:template src/hotspot/share/gc/shared/taskqueue.hpp:119:template src/hotspot/share/gc/shared/taskqueue.hpp:327:template src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.hpp:40:template src/hotspot/share/nmt/arrayWithFreeList.hpp:34:template With mt they would look like: src/hotspot/share/utilities/growableArray.hpp:803:template src/hotspot/share/utilities/stack.hpp:54:template class StackIterator; src/hotspot/share/utilities/concurrentHashTable.inline.hpp:78:template src/hotspot/share/utilities/chunkedList.hpp:31:template class ChunkedList : public CHeapObj src/hotspot/share/gc/g1/g1BatchedTask.hpp:32:template src/hotspot/share/gc/shared/taskqueue.hpp:119:template src/hotspot/share/gc/shared/taskqueue.hpp:327:template src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.hpp:40:template src/hotspot/share/nmt/arrayWithFreeList.hpp:34:template So `MT` or `mt` for non-type 
parameter name in templates, or should I punt on this particular change and leave it for a followup? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2343769766 From sgehwolf at openjdk.org Wed Sep 11 14:13:50 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 11 Sep 2024 14:13:50 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v11] In-Reply-To: References: Message-ID: > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and cgroups v2 (since [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) is fixed now). > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 and cg v2 (passes). > - [x] GHA Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Remove test from problem list as the bug is fixed - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Improve reliability of cpu quota test - Adapt JDK-8339148 - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Merge branch 'master' into jdk-8333446-systemd-slice-tests - Fix comment of WB::host_cpus() - Handle non-root + CGv2 - Add nested hierarchy to test framework - Revert "Add root check for SystemdMemoryAwarenessTest.java" This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. - ... 
and 10 more: https://git.openjdk.org/jdk/compare/9fc62b11...88298b99 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19530/files - new: https://git.openjdk.org/jdk/pull/19530/files/0e52e004..88298b99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19530&range=09-10 Stats: 11492 lines in 376 files changed: 6584 ins; 3192 del; 1716 mod Patch: https://git.openjdk.org/jdk/pull/19530.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19530/head:pull/19530 PR: https://git.openjdk.org/jdk/pull/19530 From epeter at openjdk.org Wed Sep 11 14:17:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Sep 2024 14:17:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke wrote: > > @rkennke Can you please explain the changes in these tests: > > ``` > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > > ``` > > > > > > > > > > > > > > > > > > > > > > > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. 
> > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. > > I will re-evaluate those tests, and add comments or remove the restrictions. If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. 
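
For readers following along, the offset mismatch Roman describes can be illustrated with a small sketch. It assumes, per the PR description, that with compact headers the array length is stored at offset 8, and that the length field is 4 bytes wide. The rule "8-byte elements are aligned to 8, everything else to 4" is a simplification of mine, and `array_base_offset` and `elements_before_aligned` are hypothetical names, not HotSpot functions. The second helper also assumes the array object itself starts at an 8-byte-aligned address.

```cpp
#include <cstddef>

// Round value up to the next multiple of alignment (alignment > 0).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical model of the array layout: a 4-byte length field sits at
// length_offset, and the first element follows it, aligned to 8 bytes for
// 8-byte elements and to 4 bytes otherwise.
constexpr std::size_t array_base_offset(std::size_t length_offset, std::size_t elem_size) {
  return align_up(length_offset + 4, elem_size >= 8 ? 8 : 4);
}

// Number of leading elements located before the first element offset that is
// a multiple of `alignment`, assuming the object start is itself aligned.
constexpr std::size_t elements_before_aligned(std::size_t base_offset,
                                              std::size_t elem_size,
                                              std::size_t alignment) {
  return (align_up(base_offset, alignment) - base_offset) / elem_size;
}

// Length offset with compact headers, as stated in the PR description.
constexpr std::size_t kCompactLengthOffset = 8;

static_assert(array_base_offset(kCompactLengthOffset, 1) == 12, "byte[] elements start at 12");
static_assert(array_base_offset(kCompactLengthOffset, 8) == 16, "long[] elements start at 16");
```

Under these assumptions a `byte[]` has four elements in front of the first 8-byte-aligned address while a `long[]` has none, which is one plausible reason why loops mixing the two array types might need a pre-loop or fail to vectorize, as suspected above. Whether that is what actually happens in each of the listed tests still needs to be checked case by case.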
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2343797957 From rcastanedalo at openjdk.org Wed Sep 11 14:17:17 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 14:17:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization src/hotspot/share/memory/metaspace/binList.hpp line 202: > 200: b_last = b; > 201: } > 202: if (UseNewCode)printf("\n"); I guess this line is a leftover to be removed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754702742 From sgehwolf at openjdk.org Wed Sep 11 14:28:12 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 11 Sep 2024 14:28:12 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 14:13:50 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. 
The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and cgroups v2 (since [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) is fixed now). >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 and cg v2 (passes). >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: > > - Remove test from problem list as the bug is fixed > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Improve reliability of cpu quota test > - Adapt JDK-8339148 > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. > - ... and 10 more: https://git.openjdk.org/jdk/compare/4356d152...88298b99 I've merged master and removed the test from the problem list since the relevant bug got fixed. @MBaesken Said changes require a re-review before this patch would be ready for integration. Could you please take another look? Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2343834760 From fyang at openjdk.org Wed Sep 11 14:33:08 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 11 Sep 2024 14:33:08 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:59:23 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. 
>> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). >> >> ## Test >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, >> test/jdk/java/util/zip/TestCRC32.java >> >> ## Performance >> >> ### on bananapi >> >> with patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > remove redundant jump src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1583: > 1581: const int64_t tmp_limit = MaxVectorSize >= 32 ?
unroll_words*2 : unroll_words*4; > 1582: sub(tmp1, len, tmp_limit); > 1583: bge(tmp1, zr, L_vector_entry); I don't quite understand this compare of `len` with `tmp_limit` here as I see `len` has already been updated on entry with `subw(len, len, unroll_words)`. Should we compare with the original `len` before the update? (And remove the `addi(len, len, unroll_words)` in `vector_update_crc32` at the same time). src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1654: > 1652: > 1653: addiw(len, len, -4); > 1654: bge(len, zr, L_by4_loop); Suggestion: `bgez(len, L_by4_loop);` src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1656: > 1654: bge(len, zr, L_by4_loop); > 1655: addiw(len, len, 4); > 1656: bgt(len, zr, L_by1_loop); Suggestion: `bgtz(len, L_by1_loop);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1754754549 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753728829 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1753730125 From stooke at openjdk.org Wed Sep 11 14:34:26 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 11 Sep 2024 14:34:26 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v3] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. 
> > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: - move os::realpath() to previous location - move os::realpath() to previous location ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/b7f495b2..f9202c0b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=01-02 Stats: 81 lines in 1 file changed: 41 ins; 40 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From jwaters at openjdk.org Wed Sep 11 14:34:26 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 11 Sep 2024 14:34:26 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v3] In-Reply-To: References: Message-ID: <6MDLVDlpJ7Qv84bxrlrpTcf6bHqAFoJEnLW3hhtvyis=.4b8fb19f-ae35-4723-9fbe-e8b0b9444150@github.com> On Wed, 11 Sep 2024 14:31:43 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. 
>> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: > > - move os::realpath() to previous location > - move os::realpath() to previous location src/hotspot/os/posix/os_posix.cpp line 1031: > 1029: > 1030: // Sleep forever; naked call to OS-specific sleep; use with CAUTION > 1031: void os::infinite_sleep() { Ouch, looks like something broke in one of the commits, the new diff it's showing isn't pretty (infinite_sleep and friends have been displaced in the file, at least on my end) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1754761196 From stooke at openjdk.org Wed Sep 11 14:46:12 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 11 Sep 2024 14:46:12 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: <0m0zdvRVNY3ZjLycIST_UNQjTFChOPKKS1KvV1m1stc=.f7ae20ae-4143-4d0b-ba77-dc330d859de6@github.com> References: <0m0zdvRVNY3ZjLycIST_UNQjTFChOPKKS1KvV1m1stc=.f7ae20ae-4143-4d0b-ba77-dc330d859de6@github.com> Message-ID: <65dOgOW6wn4Ceq14ieVJ_2A4xDLtIdeRB-cbuxc-zBA=.4f4e753d-7b57-48e9-b6a6-6ea0839a7a6a@github.com> On Thu, 5 Sep 2024 21:03:56 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: >> >> - simplify windwos realpath() implementation >> - get rid of os::posix::realpath() and os::win32::realpath() > > src/hotspot/os/windows/os_windows.cpp line 5330: > >> 5328: if (result == nullptr) { >> 5329: errno = ENAMETOOLONG; >> 5330: } > > This is a bit of an assumption. What if the name "includes a drive letter that isn't valid or can't be found"? Unfortunately Windows doesn't specify any further details beyond returning null. I probably cleaned up this code too much, and should've left it more like the Posix implementation. 
What I used to have would do one call (with buffer NULL) to get the real full path, then copy it if it fit or ENAMETOOLONG. In attempting to speed up the code I changed the semantics. I will change this code back to my previous implementation ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1754801308 From mbaesken at openjdk.org Wed Sep 11 14:47:10 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 11 Sep 2024 14:47:10 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v11] In-Reply-To: References: Message-ID: <8ZpNAV_DAu8VYhV-IJ6fb7fkpmCLruZIQfes8zCu3pk=.23c0bb58-b156-4bcd-8f49-3413daa6780a@github.com> On Wed, 11 Sep 2024 14:13:50 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and cgroups v2 (since [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) is fixed now). >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 and cg v2 (passes). >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 20 additional commits since the last revision: > > - Remove test from problem list as the bug is fixed > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Improve reliability of cpu quota test > - Adapt JDK-8339148 > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. > - ... and 10 more: https://git.openjdk.org/jdk/compare/66848ed0...88298b99 Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19530#pullrequestreview-2297384611 From bulasevich at openjdk.org Wed Sep 11 14:48:13 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 11 Sep 2024 14:48:13 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Tue, 10 Sep 2024 19:33:07 GMT, Peter B. Kessler wrote: > Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. Here are rough neoverse-v2 numbers: JmhDotty (-XX:CodeEntryAlignment=16) 701.93 ± 5.00 ms/op JmhDotty (-XX:CodeEntryAlignment=32) 703.56 ± 5.15 ms/op JmhDotty (-XX:CodeEntryAlignment=64) 704.46 ± 5.18 ms/op JmhDotty (-XX:CodeEntryAlignment=128) 703.71 ±
5.17 ms/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2343886412 From rcastanedalo at openjdk.org Wed Sep 11 14:50:17 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 11 Sep 2024 14:50:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 19:11:30 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix FullGCForwarding initialization src/hotspot/share/opto/machnode.cpp line 390: > 388: t = t->make_ptr(); > 389: } > 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1754813751 From zzambers at openjdk.org Wed Sep 11 14:53:09 2024 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Wed, 11 Sep 2024 14:53:09 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v11] In-Reply-To: References: Message-ID: <6_3Dyh7FDNEuXHcPa0Zr9YjMu463CLIXSVC1e5Wj5vE=.6f32dbad-2d51-426a-9d4b-2e00e6a0a9bb@github.com> On Wed, 11 Sep 2024 14:13:50 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and cgroups v2 (since [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) is fixed now). >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 and cg v2 (passes). >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: > > - Remove test from problem list as the bug is fixed > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Improve reliability of cpu quota test > - Adapt JDK-8339148 > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. > - ... and 10 more: https://git.openjdk.org/jdk/compare/5447f2da...88298b99 Marked as reviewed by zzambers (Author). 
------------- PR Review: https://git.openjdk.org/jdk/pull/19530#pullrequestreview-2297414719 From mdoerr at openjdk.org Wed Sep 11 15:27:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Sep 2024 15:27:37 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v2] In-Reply-To: References: Message-ID: > PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix: Monitor address broken in recursive case. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20922/files - new: https://git.openjdk.org/jdk/pull/20922/files/0771fdf9..6dc7d495 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20922/head:pull/20922 PR: https://git.openjdk.org/jdk/pull/20922 From stefank at openjdk.org Wed Sep 11 15:43:06 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 11 Sep 2024 15:43:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 01:03:55 GMT, David Holmes wrote: > EDIT: Oh dear. I see I have been under a misapprehension about these template parameters, I tend to always thing such things are type parameters but they are not. MT would make sense for a type parameter, but mt would be more sensible for a non-type parameter. The fact the original was F threw me. I mean, it is still effectively a constant so using CamelCase (or an upper-case abbreviation) isn't really that weird for non-type template parameters, IMHO. 
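For readers following the MT vs. mt naming discussion, here is a minimal sketch of the distinction between a type template parameter and a non-type template parameter. The `MemTag` enum below is a cut-down, hypothetical stand-in (the real HotSpot enum has many more values), and `TaggedCounter` and `size_of` are names invented purely for illustration; none of this is taken from the patch.

```cpp
#include <cstddef>

// Hypothetical, cut-down stand-in for the NMT tag enum (assumption, not HotSpot code).
enum class MemTag { mtNone, mtGC, mtClass };

// MT is a non-type template parameter: a compile-time constant *value*,
// which is why an upper-case name still reads naturally here.
template <MemTag MT>
struct TaggedCounter {
  static constexpr MemTag tag = MT;   // the constant is baked into the type
  std::size_t bytes = 0;
  void record(std::size_t n) { bytes += n; }
};

// For contrast, T is a type parameter.
template <typename T>
constexpr std::size_t size_of() { return sizeof(T); }
```

A `TaggedCounter<MemTag::mtGC>` and a `TaggedCounter<MemTag::mtClass>` are distinct types, so the tag never has to be passed or stored at run time.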
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2344023396 From stuefe at openjdk.org Wed Sep 11 15:53:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Sep 2024 15:53:11 GMT Subject: RFR: 8204681: Option to include timestamp in hprof filename In-Reply-To: <3Z33v_k5LPdJndHtRPY6JnKHWsJWilQRyYxa7DFUftM=.2954c811-04db-49a8-8316-b21cdab28558@github.com> References: <3Z33v_k5LPdJndHtRPY6JnKHWsJWilQRyYxa7DFUftM=.2954c811-04db-49a8-8316-b21cdab28558@github.com> Message-ID: On Wed, 21 Aug 2024 09:54:08 GMT, Thomas Stuefe wrote: >> Hi all, >> >> This PR addresses [8204681](https://bugs.openjdk.org/browse/JDK-8204681) enabling support for timestamp expansion in filenames specified in `-XX:HeapDumpPath` using `%t`. >> >> As mentioned in this comments for this issue, this is somewhat related to [8334492](https://bugs.openjdk.org/browse/JDK-8334492) where we enabled support for `%p` for filenames specified in jcmd. >> >> With this patch, I propose: >> - Expanding the utility function `Arguments::copy_expand_pid` to `Arguments::copy_expand_arguments` to deal with `%p` expansions for pid and `%t` expansions for timestamps. >> - Leveraging the above utility function to enable argument expansion for both heap dump filenames and jcmd output commands. >> - Though the linked JBS issue only relates to heap dumps generated in case of OOM, I think we can edit it to more broadly support filename expansion to support `%t` for jcmd as well. >> >> Testing: >> - [x] Added test cases pass with all platforms (verified with a GHA job). >> - [x] Tier 1 passes with GHA. >> >> Looking forward to hearing your thoughts! >> >> Thanks, >> Sonia > > I think this could be very useful, but it needs more preparation and decisions. Possibly a CSR. > > - copy_expand_xxx is used in many places. 
While I think all of these places would benefit from more expansions than just %p, there is a potential backward compatibility issue if clients use %t for whatever reason today > - Do we want the time of the dump or the JVM start? If the JVM runs for a week, then produces a JFR file, should the file be named by the JVM start date? I think in most cases the *current* time makes more sense > - Do we want the printout as a human-readable date or as a numeric timestamp? Both makes sense depending on the post-processing clients want to do. > - Do we want to improve this function further, potentially adding more replacement options? > > One possible way to solve this: > - use different characters for timestamp (number) and datetime (human readable date) > - use always the current time > - If we want to add further replacements: > - come up with a new replacement character that does not clash with libc sprintf (IMHO using percent was not a good idea in the first place). E.g. `$` > - Add a new switch to guard this new replacement logic. By default off. If on, the contract is that any character following a `$` may be either now or in the future replaced with something different. Client must not use `$` as a normal character. > - We probably should remove all non-matching `$` from the input. > - The first replacements could be: `$p` for pid, `$t` for timestamp (numeric), `$d` for datetime > - later replacements can be added later. Since we guard the new feature with a switch and forbid the use of `$`, we are then free to do so without breaking backward compatibility. > > I would like to hear @dholmes-ora take on this. > > We had a similar system at SAP in our proprietary JVM, which was really useful, so I like this idea in general. > I don't object (don't really have strong views) on adding this functionality, but as @tstuefe notes there are a few things to consider. 
I'm not really averse to using the `%` character precisely because it is commonly identified as a format specifier - and I think `$` would be very problematic due to shell issues. At the risk of seeking the perfect instead of just doing what is immediately "good enough" we might also look at the unified logging decorators as potential formats: > > ``` > Available log decorators: > time (t), utctime (utc), uptime (u), timemillis (tm), uptimemillis (um), timenanos (tn), uptimenanos (un), hostname (hn), pid (p), tid (ti), level (l), tags (tg) > ``` > > and it may also allow for some code sharing. This is a very good thought. There were people that already thought about useful things to have in a log. This is very similar. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20568#issuecomment-2344045803 From coleenp at openjdk.org Wed Sep 11 15:57:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 15:57:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> Message-ID: On Wed, 11 Sep 2024 10:23:49 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > use -Xmx50m to increace the crash posibility Please move the test to the serviceability directory. 
test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 1: > 1: /* This test should go in the test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedMethod.java or something like that. The bug number is in the test so the name of the test doesn't need to be the bug number. ------------- PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2297707121 PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1755045296 From coleenp at openjdk.org Wed Sep 11 16:00:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 16:00:08 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback No, please use capital MT so it's easy to see it's a template parameter! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2344060946 From rehn at openjdk.org Wed Sep 11 16:11:13 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 11 Sep 2024 16:11:13 GMT Subject: Integrated: 8339741: RISC-V: C ABI breakage for integer on stack In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 11:17:54 GMT, Robbin Ehn wrote: > Hi please review, > > When calling a native function using integers smaller than 64 bits, > they must be loaded from a Java stack slot and widened to 64-bit, sign-extended. > In the interpreter case we only store 32-bit, which means the top 32 bits are 'random'. > In the compiler case we do an ld and grab random top 32 bits. > These should be loaded with a lw from Java stack, thus properly sign-extended, and then stored with sd into the native stack. > > I found the interpreter bug first, wrote a test case for it, which found the compiler bug. > > Here you can see the difference, both are legal to do from a compiler: > https://godbolt.org/z/85aMhja5f > Relevant specs: > https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc >> integer scalars narrower than XLEN bits are widened according to the sign of their type up to 32 bits, then sign-extended to XLEN bits. > > I checked floats also, they seem fine, but please go ahead and do a check regarding floats. > > Passes ./test/hotspot/jtreg/compiler/calls/, running t1. This pull request has now been integrated.
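The ld vs. lw difference described above can be modelled in plain C++. This is only an illustrative sketch, not code from the patch: `make_slot`, `load_like_ld` and `load_like_lw` are hypothetical helpers, and the stale upper half is simulated with an arbitrary constant. The casts assume a two's-complement target, which holds for RISC-V.

```cpp
#include <cstdint>

// Models a 64-bit Java stack slot where only the low 32 bits were written
// and the upper 32 bits still hold stale data (hypothetical helper).
uint64_t make_slot(int32_t value, uint32_t stale_upper) {
  return (static_cast<uint64_t>(stale_upper) << 32) | static_cast<uint32_t>(value);
}

// Like `ld`: reads all 64 bits, so the stale upper half leaks into the result.
int64_t load_like_ld(uint64_t slot) {
  return static_cast<int64_t>(slot);
}

// Like `lw`: reads only the low 32 bits and sign-extends them to 64 bits,
// which is what the psABI text quoted above asks for.
int64_t load_like_lw(uint64_t slot) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(slot)));
}
```

With a slot built from -5 and a non-zero stale upper half, only the `lw`-style load yields -5 again; the `ld`-style load returns a value polluted by the upper bits.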
Changeset: bfe7f920 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/bfe7f9205b56483b4364130a3a87c58c3fc82998 Stats: 156 lines in 4 files changed: 150 ins; 0 del; 6 mod 8339741: RISC-V: C ABI breakage for integer on stack Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/20912 From eosterlund at openjdk.org Wed Sep 11 16:12:07 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 11 Sep 2024 16:12:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> Message-ID: <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> On Wed, 11 Sep 2024 12:18:21 GMT, David Holmes wrote: > FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more than what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. We can not detect the oop is dying. That is precisely what the GC is trying to figure out by going through the hassle of traversing the object graph. If what you are proposing was possible (detect unreachable oops by just looking at some cheap local property), then we would rewrite our GCs to exploit that magic. ;-) We would also rewrite Reference.get() to not keep the referent alive because we could just magically tell if it will get cleared in the future, or not. If you are imagining, for example, looking at not yet finalized marking bitmaps from the GC and reporting errors when encountering a not yet marked object, then we would randomly report errors for perfectly valid uses of the API. The GC just didn't get to that object yet.
In other words, we have no way of telling by just looking at an object if the object *will* be found to be not reachable, or not, once it terminates. But by keeping it alive, we can control the answer: the oop will be found to be live. This is not a new problem. We have encountered it many times before. The standard way of dealing with this situation (wanting to publish edges to "peeked" oops in the object graph), is to keep the oop alive. Not sure why we would treat it differently here. Unless of course we say this is not supported and crash, but that seems a bit unfortunate IMO. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2344088263 From stuefe at openjdk.org Wed Sep 11 16:17:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Sep 2024 16:17:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:47:30 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: > >> 113: if (wastage.is_nonempty()) { >> 114: non_class_space_arena()->deallocate(wastage); >> 115: } > > This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example: > > ```c++ > // Any wasted memory is presumably too small for any class. > // Therefore, give it back to the non-class space arena's free list. Yes.
Some background: - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, it's probably also too small. Yes, I will write a better comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755111131 From stuefe at openjdk.org Wed Sep 11 16:17:16 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Sep 2024 16:17:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 14:15:12 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/binList.hpp line 202: > >> 200: b_last = b; >> 201: } >> 202: if (UseNewCode)printf("\n"); > > I guess this line is a leftover to be removed? Yep thanks for spotting ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755115905 From stuefe at openjdk.org Wed Sep 11 16:17:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 11 Sep 2024 16:17:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 16:14:39 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/metaspace/binList.hpp line 202: >> >>> 200: b_last = b; >>> 201: } >>> 202: if (UseNewCode)printf("\n"); >> >> I guess this line is a leftover to be removed?
> > Yep thanks for spotting So that was causing the empty lines in my logs (facepalm) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1755116656 From mli at openjdk.org Wed Sep 11 16:42:09 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 11 Sep 2024 16:42:09 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 14:29:47 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> remove redundant jump > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1583: > >> 1581: const int64_t tmp_limit = MaxVectorSize >= 32 ? unroll_words*2 : unroll_words*4; >> 1582: sub(tmp1, len, tmp_limit); >> 1583: bge(tmp1, zr, L_vector_entry); > > I don't quite understand this compare of `len` with `tmp_limit` here as I see `len` has already been updated on entry with `subw(len, len, unroll_words)`. Should we compare with the original `len` before the update? (And maybe remove the `addi(len, len, unroll_words)` in `vector_update_crc32` at the same time?). I'm not sure how you feel, but when I read the original scalar version of crc32, I found `len` a bit confusing too: it is added to and subtracted from here and there, which makes it hard to follow. When I implemented the vector version, I first wanted to refactor the code and then add the vector version, but I eventually gave up on that. I think it's better to separate the refactoring from the vector version, and in this PR I try hard not to touch the original code. So if you feel similarly, I would suggest we do the refactoring in a separate PR after this one, in which I will try to make the code straightforward and clear. I'm also going to implement crc using carry-less multiplication, so I would prefer to do the refactoring before the carry-less version anyway.
To your specific question above: we could do it as you suggested, but it seems that would not make the code clearer unless we refactor first. The first `if (UseRVV)` block can also jump to `L_unroll_loop_entry` as a fallback, so the change would make the code more complicated rather than clearer. In summary, my proposal is to do a separate refactoring PR which will also address your comments (including the `s/bgt/bgtz` suggestion above, as `bgt` is the original style of this kernel_crc32 method). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1755150447 From sgehwolf at openjdk.org Wed Sep 11 16:42:11 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 11 Sep 2024 16:42:11 GMT Subject: RFR: 8333446: Add tests for hierarchical container support [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 14:13:50 GMT, Severin Gehwolf wrote: >> Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and cgroups v2 (since [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) is fixed now). >> >> I'm adding those tests in order to not regress another time. >> >> Testing: >> - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. >> - [x] New systemd test on cg v1 and cg v2 (passes). >> - [x] GHA > > Severin Gehwolf has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 20 additional commits since the last revision: > > - Remove test from problem list as the bug is fixed > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Improve reliability of cpu quota test > - Adapt JDK-8339148 > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Merge branch 'master' into jdk-8333446-systemd-slice-tests > - Fix comment of WB::host_cpus() > - Handle non-root + CGv2 > - Add nested hierarchy to test framework > - Revert "Add root check for SystemdMemoryAwarenessTest.java" > > This reverts commit 7e8d9ed46815096ae8c4502f3320ebf5208438d5. > - ... and 10 more: https://git.openjdk.org/jdk/compare/18576f5a...88298b99 Thanks all for the re-reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19530#issuecomment-2344148032 From sgehwolf at openjdk.org Wed Sep 11 17:00:17 2024 From: sgehwolf at openjdk.org (Severin Gehwolf) Date: Wed, 11 Sep 2024 17:00:17 GMT Subject: Integrated: 8333446: Add tests for hierarchical container support In-Reply-To: References: Message-ID: On Mon, 3 Jun 2024 17:28:09 GMT, Severin Gehwolf wrote: > Please review this PR which adds test support for systemd slices so that bugs like [JDK-8217338](https://bugs.openjdk.org/browse/JDK-8217338) can be verified. The added test, `SystemdMemoryAwarenessTest` currently passes on cgroups v1 and cgroups v2 (since [JDK-8322420](https://bugs.openjdk.org/browse/JDK-8322420) is fixed now). > > I'm adding those tests in order to not regress another time. > > Testing: > - [x] Container tests on Linux x86_64 cgroups v2 and Linux x86_64 cgroups v1. > - [x] New systemd test on cg v1 and cg v2 (passes). > - [x] GHA This pull request has now been integrated. 
Changeset: d9fdf69c Author: Severin Gehwolf URL: https://git.openjdk.org/jdk/commit/d9fdf69c34c20e0f2d526c2f04450acb904c3e80 Stats: 655 lines in 9 files changed: 648 ins; 0 del; 7 mod 8333446: Add tests for hierarchical container support Reviewed-by: mbaesken, zzambers ------------- PR: https://git.openjdk.org/jdk/pull/19530 From sviswanathan at openjdk.org Wed Sep 11 17:24:11 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 11 Sep 2024 17:24:11 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 01:59:54 GMT, Joe Darcy wrote: >>> If the test is going to use randomness, then its jtreg tags should include >>> >>> `@key randomness` >>> >>> and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. >> >> Please see the test updated to use `@key randomness` and` jdk.test.lib.RandomFactory` to get and Random object. >> >>> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. >>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). >>> >> So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know. > > So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. 
However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). > > If there was a correctly rounded tanh to compare against, then this style of testing would be valid. > > Are there any plan to intrinsify sinh or cosh? I think instead of random we should generate offline additional correctly rounded fixed test points to cater to new algorithm using high precision arithmetic library and then simply extend the HyperbolicTests.java with these new fixed test points using existing ulp testing mechanism in the test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1755203926 From rkennke at openjdk.org Wed Sep 11 17:31:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 17:31:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v12] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Make is_oop() MT-safe - Re-enable some vectorization tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/6abda7bc..b6c11f74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=10-11 Stats: 32 lines in 6 files changed: 7 ins; 8 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Wed Sep 11 17:38:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 11 Sep 2024 17:38:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Revert accidental change of UCOH default ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b6c11f74..9e008ac1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From duke at openjdk.org Wed Sep 11 18:02:06 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 11 Sep 2024 18:02:06 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 17:21:36 GMT, Sandhya Viswanathan wrote: >> So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). >> >> If there was a correctly rounded tanh to compare against, then this style of testing would be valid. >> >> Are there any plan to intrinsify sinh or cosh? > > I think instead of random we should generate offline additional correctly rounded fixed test points to cater to new algorithm using high precision arithmetic library and then simply extend the HyperbolicTests.java with these new fixed test points using existing ulp testing mechanism in the test. 
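For readers unfamiliar with the ulp-based checks discussed in this thread, here is a minimal C++ sketch of comparing a result against a reference value within an allowed number of ulps. It is only an illustration: the actual regression test (HyperbolicTests.java) is written in Java, and the helper names `ulp_of`, `error_in_ulps`, and `within_ulps` are hypothetical, not part of the JDK. The sketch assumes finite, non-extreme inputs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Size of one unit in the last place (ulp) at the magnitude of x:
// the gap between |x| and the next larger representable double.
// Assumption: x is finite and not the largest finite double.
double ulp_of(double x) {
  double ax = std::fabs(x);
  return std::nextafter(ax, std::numeric_limits<double>::infinity()) - ax;
}

// Distance between `actual` and `expected`, measured in ulps of `expected`.
double error_in_ulps(double actual, double expected) {
  return std::fabs(actual - expected) / ulp_of(expected);
}

// True if `actual` lies within `max_ulps` of the reference value.
bool within_ulps(double actual, double expected, double max_ulps) {
  return error_in_ulps(actual, expected) <= max_ulps;
}
```

With a correctly rounded reference value as `expected`, a bound such as 2.5 ulps can be checked directly. When the reference is itself only accurate to a few ulps (as with FDLIBM), the bound has to be widened, which is the concern raised above.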
Thank you Sandhya(@sviswa7) for the suggestion! Will update the existing HyperbolicTests.java with new fixed point tests with quad precision reference values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1755258108 From stooke at openjdk.org Wed Sep 11 19:17:19 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 11 Sep 2024 19:17:19 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v4] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. 
> > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: properly test for buffer too small for path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/f9202c0b..9d9418d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=02-03 Stats: 10 lines in 1 file changed: 6 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From pchilanomate at openjdk.org Wed Sep 11 19:44:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 11 Sep 2024 19:44:10 GMT Subject: Integrated: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 In-Reply-To: References: Message-ID: On Wed, 4 Sep 2024 21:05:10 GMT, Patricio Chilano Mateo wrote: > Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages(~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. 
The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. > > The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). > > I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. > > Thanks, > Patricio This pull request has now been integrated.
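The "bang the pages one by one" idea described above can be sketched in portable C++ as follows. This is a simplified illustration, not the code from the changeset: the real fix is emitted as assembly in the `cont_thaw` stub, and the function name `bang_pages` and its parameters are hypothetical. The sketch walks from the high address (the current stack pointer) down to the low address (the stack pointer after the bump) and writes one byte per page, so that each guard page is hit in order instead of being skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Touch one byte in every page of [low, high), walking downwards the way a
// stack grows. Touching pages in order lets the OS convert guard pages one
// at a time instead of faulting on an untouched reserved page.
// Returns the number of bytes written (one per page touched).
// Assumption: low <= high and page_size > 0.
std::size_t bang_pages(volatile std::uint8_t* low,
                       volatile std::uint8_t* high,
                       std::size_t page_size) {
  std::size_t touched = 0;
  volatile std::uint8_t* p = high;
  // Step down one full page at a time while a full page remains.
  while (static_cast<std::size_t>(p - low) >= page_size) {
    p -= page_size;
    *p = 0;
    ++touched;
  }
  // Touch the final partial page, if any, at its lowest address.
  if (p > low) {
    *low = 0;
    ++touched;
  }
  return touched;
}
```

In this sketch an ordinary buffer stands in for the stack, so it only demonstrates the access pattern. On a real Windows thread stack, each write to a guard page is what triggers the commit of the next page.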
Changeset: 591aa7c5 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/591aa7c5c7ebe2a289ed25f0b26126e30fba23f3 Stats: 130 lines in 4 files changed: 130 ins; 0 del; 0 mod 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 Reviewed-by: dholmes, fparain ------------- PR: https://git.openjdk.org/jdk/pull/20862 From pchilanomate at openjdk.org Wed Sep 11 19:44:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 11 Sep 2024 19:44:09 GMT Subject: RFR: 8335362: [Windows] Stack pointer increment in _cont_thaw stub can cause program to terminate with exit code 0xc0000005 [v4] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:53:47 GMT, Patricio Chilano Mateo wrote: >> Please review the following fix. In stub routine cont_thaw() we bump the stack pointer by the maximum size required to copy the frames currently stored in the top stackChunk. On Windows this increment of the stack pointer doesn't play nice with the way Windows sets up and manages stack pages. When a thread is created the stack is divided in 3 memory regions: regular committed pages, guard pages, reserved pages. The first pages are committed and the thread can read/write to them with no issues. The next pages(~2/3) are guard pages, which are committed but have the PAGE_GUARD attribute. When the thread tries to access a guard page the first time, the PAGE_GUARD attribute is removed and a new guard page from the reserved region is added. The rest of the stack are reserved pages and if we try to access it directly we get an EXCEPTION_ACCESS_VIOLATION (see bug for more details). So the problem is that we can bump the stack pointer too much and set it to point somewhere in the reserved region. 
When we then execute the call instruction for method thaw(), we get an EXCEPTION_ACCESS_VIOLATION exception, but because we cannot access the memory at the current stack pointer, we cannot call any method anymore, including the exception handler and the program terminates abruptly with exit code 0xc0000005. >> >> The fix implemented is to bang the stack pages one by one to let the Windows page protection take over. This is what we already do in os::map_stack_shadow_pages() in JavaCalls::call_helper(), and also in interpreter (bang_stack_shadow_pages()) and compiler (generate_stack_overflow_check()) code. It's actually also the same mechanism that Windows routine _chkstk used by the compiler employs (see bug comments with assembly code). >> >> I added new test BigStackChunk.java that reproduces the issue. The test fails without this fix and passes with it. I also tested the patch by running in mach5 tiers1-7. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > print values in assert Thanks for the reviews Fred and David! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20862#issuecomment-2344560719 From iklam at openjdk.org Wed Sep 11 20:26:51 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 11 Sep 2024 20:26:51 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v4] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. 
See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. 
> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @adinn comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/0441aef0..5bba4ad4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=02-03 Stats: 16 lines in 7 files changed: 5 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Wed Sep 11 20:26:52 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 11 Sep 2024 20:26:52 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 10 Sep 2024 10:08:35 GMT, Andrew Dinn wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @dholmes-ora comments: logging indents > > src/hotspot/share/cds/aotClassLinker.hpp line 71: > >> 69: // >> 70: class AOTClassLinker : AllStatic { >> 71: using ClassesTable = ResourceHashtable; > > Can we have a symbolic name for this (prime) magic number here and in other places in this patch? I realise there is existing code which uses the raw number but it is also consumed symbolically (e.g.
in archiveBuilder.hpp, metaspaceClosure.hpp) I added a `static const int TABLE_SIZE = 15889; // prime number` > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 191: > >> 189: } >> 190: >> 191: ClassLoaderData* loader_data = ClassLoaderData::class_loader_data(loader()); > > Can we assert here that loader() != nullptr? I added a few asserts in this function about the expected initiating and defining loaders. > src/hotspot/share/cds/archiveBuilder.cpp line 433: > >> 431: } >> 432: >> 433: remember_embedded_pointer_in_enclosing_obj(ref); > > I'm not clear why this was moved up. Was this just an omission (bug) in the earlier version or do we now need to remember a reference location that we could previously safely ignore? If I remember correctly, this is a latent bug where we didn't remember the embedded pointer when the function returns early. This led to a crash that happened only when -XX:+AOTClassLinking is enabled. > src/hotspot/share/cds/cdsConfig.cpp line 551: > >> 549: } >> 550: >> 551: void CDSConfig::set_has_aot_linked_classes(bool is_static_archive, bool has_aot_linked_classes) { > > Why does this need to take `is_static_archive` as an argument? This argument is unused and I removed it. > src/hotspot/share/cds/dynamicArchive.cpp line 138: > >> 136: verify_estimate_size(_estimated_metaspaceobj_bytes, "MetaspaceObjs"); >> 137: >> 138: sort_methods(); > > Could we have a comment to note that sorting and making shareable need to be done before calling `AOTClassLinker::write_to_archive();` The reason I moved this code is that I thought it just looked odd -- we write a bunch of meta information about the classes, and then proceed to change the contents of the classes. The new order -- fix up the classes and then write the meta info -- should be quite natural, so I think a comment isn't needed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1755536629 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1755536786 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1755536983 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1755537234 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1755537319 From dholmes at openjdk.org Wed Sep 11 20:32:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Sep 2024 20:32:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> Message-ID: <81SHRwHgMt1g1gzDvpl03mDEVruFjU2XGh3bnytsRxQ=.665cdc01-d792-4bc7-bee9-f0d7bfbf9817@github.com> On Wed, 11 Sep 2024 16:09:09 GMT, Erik Österlund wrote: >> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more than what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. > >> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more than what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. > > We can not detect the oop is dying.
That is precisely what the GC is trying to figure out by going through the hassle of traversing the object graph. If what you are proposing was possible (detect unreachable oops by just looking at some cheap local property), then we would rewrite our GCs to exploit that magic. ;-) We would also rewrite Reference.get() to not keep the referent alive because we could just magically tell if it will get cleared in the future, or not. > > If you are imagining, for example, looking at not yet finalized marking bitmaps from the GC and report errors when encountering a not yet marked object, then we would randomly report errors for perfectly valid uses of the API. The GC just didn't get to that object yet. In other words, we have no way of telling by just looking at an object if the object *will* be found to be not reachable, or not, once it terminates. But by keeping it alive, we can control the answer: the oop will be found to be live. > > This is not a new problem. We have encountered it many times before. The standard way of dealing with this situation (wanting to publish edges to "peeked" oops in the object graph), is to keep the oop alive. Not sure why we would treat it differently here. Unless of course we say this is not supported and crash, but that seems a bit unfortunate IMO. Thanks for the detailed explanations @fisk - much appreciated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2344642789 From matsaave at openjdk.org Wed Sep 11 21:02:41 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 11 Sep 2024 21:02:41 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. 
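The general shape of this kind of refactoring, where a lookup reports the error itself so that callers no longer need a null check, can be sketched in plain C++ as below. This is a toy model only: the types `Method`, `MethodTable`, and `NoSuchMethodError` and the function `lookup_or_throw` are hypothetical stand-ins, and the sketch uses C++ exceptions purely for illustration, which is not how HotSpot raises Java exceptions.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a method entry.
struct Method {
  std::string name;
};

// Hypothetical error type, named after the Java exception for readability.
class NoSuchMethodError : public std::runtime_error {
 public:
  explicit NoSuchMethodError(const std::string& name)
      : std::runtime_error("no such method: " + name) {}
};

class MethodTable {
 public:
  void add(const Method& m) { _methods.emplace(m.name, m); }

  // Old style: every caller must remember to check for nullptr.
  const Method* find_or_null(const std::string& name) const {
    auto it = _methods.find(name);
    return it == _methods.end() ? nullptr : &it->second;
  }

  // New style: the lookup itself reports the error and never returns null.
  const Method& lookup_or_throw(const std::string& name) const {
    const Method* m = find_or_null(name);
    if (m == nullptr) {
      throw NoSuchMethodError(name);
    }
    return *m;
  }

 private:
  std::unordered_map<std::string, Method> _methods;
};
```

The design benefit is that the error handling lives in one place, so a caller that forgets the check can no longer dereference a null result.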
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Coleen suggestion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20874/files - new: https://git.openjdk.org/jdk/pull/20874/files/bd1cc1e8..e95d2bd1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20874&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20874/head:pull/20874 PR: https://git.openjdk.org/jdk/pull/20874 From coleenp at openjdk.org Wed Sep 11 21:18:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 11 Sep 2024 21:18:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor. diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp index fd198f54fc9..7aa4bd24948 100644 --- a/src/hotspot/share/oops/instanceKlass.cpp +++ b/src/hotspot/share/oops/instanceKlass.cpp @@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() { } InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) : - Klass(kind), + Klass(kind, (!parser.is_interface() && !parser.is_abstract())), _nest_members(nullptr), _nest_host(nullptr), _permitted_subclasses(nullptr), ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2344715540 From dholmes at openjdk.org Thu Sep 12 00:03:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Sep 2024 00:03:05 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 07:56:25 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. 
More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > I've done basic testing on ppc64le, riscv64 and s390x using QEMU, but would appreciate if @TheRealMDoerr, @RealFYang and @offamitkumar could take it for a real test drive. > @fbredber, @dholmes-ora: I got a substantial performance drop on our 96 Thread Xeon server: What OS for the Xeon? We have only seen issues with Windows. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2344995807 From dholmes at openjdk.org Thu Sep 12 00:07:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Sep 2024 00:07:04 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: <6UxGZMfjEJfj7vA_1LFIDGkr65EufZc8nfoEpFeuyjk=.aedee205-884b-4288-bc6f-62503fe67eae@github.com> On Wed, 11 Sep 2024 12:45:56 GMT, Fredrik Bredberg wrote: >> On x86 `membar(LoadStore | StoreStore /* release */)` would be a nop. Not sure if adding it before nulling the pointer would make things clearer. >> >> `membar(StoreLoad);` is all that we need between clearing the owner and checking the queues / successor. > > As @xmas92 wrote, membar(StoreLoad); is all that we need between clearing the owner and checking the queues / successor. And, since I use membar(StoreLoad) in all other platforms, I wanted it to be consistent. 
> Also if you look in [ObjectMonitor::exit](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1132C6-L1132C25)() you'll see that there is a call to [OrderAccess::storeload](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1184)() just after [release_clear_owner](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1183)(), so I'm just doing the same as has been done in the C++ slow-path for a long time. When the key change here is "add in the missing fence that otherwise allowed stranding" then I would really like something to include the word "fence". Very few people will understand/recall the equivalence with storeload. A comment will suffice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1755882189 From dholmes at openjdk.org Thu Sep 12 00:19:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Sep 2024 00:19:06 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending...
>> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback If anything else crops up we can do a follow up. Thanks. Please leave as-is. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2298985306 PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2345010450 From kvn at openjdk.org Thu Sep 12 00:30:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 00:30:08 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 15:27:09 GMT, Andrew Dinn wrote: > Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. Few comments src/hotspot/share/c1/c1_CodeStubs.hpp line 267: > 265: LIR_Opr _result; > 266: CodeEmitInfo* _info; > 267: C1StubId _stub_id; Preserve spacing src/hotspot/share/c1/c1_CodeStubs.hpp line 518: > 516: private: > 517: LIR_Opr _obj; > 518: C1StubId _stub; Spacing. src/hotspot/share/opto/escape.cpp line 2046: > 2044: if (meth == nullptr) { > 2045: const char* name = call->as_CallStaticJava()->_name; > 2046: assert(strncmp(name, "Opto Runtime multianewarray", 27) == 0, "TODO: add failed case check"); Please use "C2" instead of "Opto" in your changes in C2 code and in tests. 
src/hotspot/share/opto/runtime.cpp line 92: > 90: > 91: #define OPTO_BLOB_FIELD_DEFINE(name, type) \ > 92: type OptoRuntime:: BLOB_FIELD_NAME(name) = nullptr; Please use C2 instead of OPTO in macro names. src/hotspot/share/opto/runtime.cpp line 147: > 145: // from the stub name by appending suffix '_C'. However, in two cases > 146: // a common target method also needs to be called from shared runtime > 147: // stubs. In these two cases the opto stubs rely on method "opto" -> "C2" ------------- PR Review: https://git.openjdk.org/jdk/pull/20936#pullrequestreview-2298980632 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755886284 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755888197 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755890608 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755895998 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755901868 From kvn at openjdk.org Thu Sep 12 00:30:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 00:30:09 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 08:38:19 GMT, Andrew Dinn wrote: >> Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. > > src/hotspot/share/opto/runtime.cpp line 151: > >> 149: // defines temporarily rebind the generated names to reference the >> 150: // relevant implementations. >> 151: > > I am not 100% happy about using defines to finesse this problem of common C targets. > > One alternative here is to define methods local to class OptoRuntime which fit the generator naming convention and have them forward the call to the SharedRuntime methods. n.b. 
I used (local) method forwarding to allow blobs to share common typefunc providers. > > Another alternative is to declare the C target as a parameter to the opto blob declaration macro. That's more flexible but in almost all cases it repeats information already present and makes understanding and updating the declarations more complex. This is indeed confusing even with comment. I prefer your first suggestion: "define methods local to class OptoRuntime which fit the generator naming convention and have them forward the call to the SharedRuntime methods" > src/hotspot/share/opto/runtime.cpp line 183: > >> 181: OPTO_STUBS_DO(GEN_OPTO_BLOB, GEN_OPTO_STUB, GEN_OPTO_JVMTI_STUB) >> 182: >> 183: /* > > This old code that has been commented out needs to be removed. Then do it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755904699 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1755900111 From dholmes at openjdk.org Thu Sep 12 00:32:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Sep 2024 00:32:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion LGTM Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20874#pullrequestreview-2299001634 From dholmes at openjdk.org Thu Sep 12 01:02:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Sep 2024 01:02:08 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 19:17:19 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: > > properly test for buffer too small for path Changes requested by dholmes (Reviewer). src/hotspot/os/posix/os_posix.cpp line 990: > 988: } > 989: > 990: Spurious extra line ? src/hotspot/os/windows/os_windows.cpp line 5337: > 5335: ALLOW_C_FUNCTION(::free, ::free(p);) // *not* os::free > 5336: } > 5337: } You have forgotten to return `result`. ------------- PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2299014792 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1755920995 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1755930028 From lmao at openjdk.org Thu Sep 12 02:36:59 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 12 Sep 2024 02:36:59 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v11] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. 
I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: Move test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/d6c6559b..eba5b130 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=09-10 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From lmao at openjdk.org Thu Sep 12 02:39:06 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 12 Sep 2024 02:39:06 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> Message-ID: On Wed, 11 Sep 2024 15:53:43 GMT, Coleen Phillimore wrote: >> Liang Mao has updated the pull request incrementally with one additional commit since the last revision: >> >> use -Xmx50m to increace the crash posibility > > test/hotspot/jtreg/runtime/8339725/libagent8339725.cpp line 1: > >> 1: /* > > This test should go in the test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedMethod.java or something like that. > > The bug number is in the test so the name of the test doesn't need to be the bug number. Test moved to the new location. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1756008200 From iklam at openjdk.org Thu Sep 12 02:58:44 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 12 Sep 2024 02:58:44 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache Message-ID: This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). **Problem:** This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. **Solution:** In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. In the production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. **Review Notes:** - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors.
With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. - During the early stage of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: - all aot-initialized classes are moved into the `initialized` state (without executing their `<clinit>` methods). This is done in `InstanceKlass::initialize_from_cds()` - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()` **Caveats:** Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: enum Foo { [....] static final long TIME = System.currentTimeMillis(); } Therefore, this PR does not expose this capability for general usage. Currently, only a small set of Enums are stored in the archived heap, whose contents are carefully limited by tables in heapShared.cpp. We assume that none of the archived Enums have environment dependencies. Obviously a better mechanism (AOT object creation API, static analysis, etc) will be needed before this capability can be opened up. --- See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483.
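As a rough illustration of the caveat above, here is a minimal, self-contained sketch. It is not code from the PR: the class `EnumInitSketch`, the enum `Width` and the helper `timeIsStable()` are hypothetical names, and `Foo` only mirrors the shape of the example in the message. It shows that a static field such as `TIME` is computed once, when the class initializer runs, so a mirror archived in its initialized state would presumably carry the timestamp of the assembly phase into every production run, whereas an enum whose state comes only from literals has nothing environment-dependent to preserve.

```java
public class EnumInitSketch {
    // Hypothetical enum modeled on the `Foo` example in the message above.
    // TIME is assigned by the class initializer (<clinit>), so its value
    // depends on the moment at which initialization happens.
    enum Foo {
        A, B;
        static final long TIME = System.currentTimeMillis();
    }

    // Hypothetical enum whose state comes only from literals; it has no
    // dependency on the environment in which it is initialized.
    enum Width {
        BYTE(8), INT(32);
        final int bits;
        Width(int bits) { this.bits = bits; }
    }

    // The value is fixed at initialization time: later reads never
    // re-evaluate System.currentTimeMillis().
    static boolean timeIsStable() {
        long first = Foo.TIME;
        long second = Foo.TIME;
        return first == second;
    }

    public static void main(String[] args) {
        System.out.println("Foo.TIME captured at class init: " + Foo.TIME);
        System.out.println("stable across reads: " + timeIsStable());
        System.out.println("Width.INT.bits = " + Width.INT.bits);
    }
}
```

Under these assumptions, `Width` is the kind of enum that could be stored in the initialized state without surprises, while `Foo` is the kind that the tables in heapShared.cpp are meant to keep out.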
------------- Depends on: https://git.openjdk.org/jdk/pull/20843 Commit messages: - Clean up; removed unrelated changes in classPrinter.cpp - more cleanup - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - More clean up for JDK-8293187 - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Simplified implemented by AOTClassInitializer. - 8293187: Support sun.invoke.util.Wrapper in CDS archive heap Changes: https://git.openjdk.org/jdk/pull/20958/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293187 Stats: 814 lines in 20 files changed: 742 ins; 16 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From fyang at openjdk.org Thu Sep 12 03:04:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 12 Sep 2024 03:04:05 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v6] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 16:39:02 GMT, Hamlin Li wrote: > To your specific question above: we could do it as you suggested, but seems it will not make code clear if we don't do a refactoring first, as in the first `if (UseRVV)` block, it could also jump to `L_unroll_loop_entry` finally as a fallback. So in this way the code will be more complicated rather than clear. 
I agree with you that it would be better to separate the refactoring from the vector version. Here is what I am thinking. I could be wrong as I haven't checked the vector code (`vector_update_crc32`) implementation yet. I suppose it's a performance consideration for the first `if (UseRVV)` block. The intention is to execute the vector code only when the input `len` >= `tmp_limit`, right? If I am right, then I think we should use the original `len` before this `subw(len, len, unroll_words)` update on entry. Based on that, I am suggesting the following sequence:

    const int64_t single_table_size = 256;
    const int64_t unroll = 16;
    const int64_t unroll_words = unroll*wordSize;
    mv(tmp5, right_32_bits);
    const ExternalAddress table_addr = StubRoutines::crc_table_addr();
    la(table0, table_addr);
    add(table1, table0, 1*single_table_size*sizeof(juint), tmp1);
    add(table2, table0, 2*single_table_size*sizeof(juint), tmp1);
    add(table3, table2, 1*single_table_size*sizeof(juint), tmp1);
    if (UseRVV) {
      const int64_t tmp_limit = MaxVectorSize >= 32 ? unroll_words*2 : unroll_words*4;
      sub(tmp1, len, tmp_limit);
      bge(tmp1, zr, L_vector_entry);
    }
    subw(len, len, unroll_words); <=========== moved here
    andn(crc, tmp5, crc); <=========== moved here
    bge(len, zr, L_unroll_loop_entry);

I don't see how falling back to `L_unroll_loop_entry` in the end affects us. The code doesn't look more complicated to me and the two versions are still separated.
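The ordering suggested above can be sketched in plain Java, purely to make the control flow easier to follow. This is an illustration under stated assumptions, not the assembler code: `Crc32PathSketch`, `choosePath` and the `"tail"` label are hypothetical, `WORD_SIZE = 8` assumes a 64-bit target, and the 32-bit `subw` is modeled with ordinary `long` arithmetic.

```java
public class Crc32PathSketch {
    static final long WORD_SIZE = 8;                      // assumption: 64-bit target
    static final long UNROLL = 16;                        // from the sequence above
    static final long UNROLL_WORDS = UNROLL * WORD_SIZE;  // 128 bytes under that assumption

    // Returns which code path the proposed sequence would branch to.
    static String choosePath(long len, boolean useRVV, int maxVectorSize) {
        if (useRVV) {
            // The comparison uses the ORIGINAL len, before unroll_words is subtracted.
            long tmpLimit = maxVectorSize >= 32 ? UNROLL_WORDS * 2 : UNROLL_WORDS * 4;
            if (len - tmpLimit >= 0) {
                return "vector";                          // bge(tmp1, zr, L_vector_entry)
            }
        }
        len -= UNROLL_WORDS;                              // subw(len, len, unroll_words)
        if (len >= 0) {
            return "unrolled";                            // bge(len, zr, L_unroll_loop_entry)
        }
        return "tail";                                    // hypothetical label for the remaining code
    }

    public static void main(String[] args) {
        System.out.println(choosePath(512, true, 16));
        System.out.println(choosePath(511, true, 16));
        System.out.println(choosePath(100, true, 32));
    }
}
```

With these assumed constants, an input shorter than `tmp_limit` never reaches the vector entry and is handled exactly as in the non-RVV case.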
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1756024897 From lmao at openjdk.org Thu Sep 12 03:23:07 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 12 Sep 2024 03:23:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <43juo9iTid-P7eAdi1yfs2t3rwGuxMXlGs_5WgazV4c=.813bc177-60af-4b26-aea8-55414e276b27@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> <43juo9iTid-P7eAdi1yfs2t3rwGuxMXlGs_5WgazV4c=.813bc177-60af-4b26-aea8-55414e276b27@github.com> Message-ID: On Wed, 11 Sep 2024 12:14:45 GMT, David Holmes wrote: > I guess I am still missing a piece here. We have the initial check for k being alive (which doesn't ensure it stays alive it just allows an early bail out), and we end with creating a JNI reference for the mirror oop. I assume that once we have the JNI reference the mirror oop is again strongly reachable and safe (if not how are we allowed to create the JNI reference for it?). So somewhere in between k is no longer alive and the mirror oop is junk. Or is it that things go bad after we create the JNI reference? That's a good point. It's related to the fundamental GC algorithm and implementation in HotSpot. Within a GC cycle we check that the klass is alive and create a JNI strong reference in the meantime, but later the klass is dead, which is counterintuitive. That is because GC concurrent marking/tracing relies on a basic tri-color algorithm, which guarantees that the object graph can be traversed safely while it is being modified. The problem is that we created a connection between a `WHITE` dying klass oop and a `BLACK` JNI Handle (which is part of the GC roots scanned before the graph traversal), and therefore `BLACK` pointing to `WHITE` violates the tri-color invariant. In the earlier concurrent GC implementation CMS, we didn't need resurrection because CMS uses the graph insertion protection in tri-color AKA `incremental-update`.
The JNI Handle would be rescanned after the graph traversal, which marks the dying class oop alive. However, G1 and ZGC use deletion protection in tri-color (which SATB belongs to), which has the advantage of getting rid of the rescan but cannot mark the oop alive in the scenario you described. The solution is to keep the weak referent alive when it is accessed during a GC cycle, which is the CLD holder oop in this case. Technically we could definitely do something like CMS incremental-update to revive the oop during JNI Handle creation (we tried it and it fixed this crash), but I guess it is not the generally consistent way in HotSpot and would make things more confusing and difficult to understand. I hope this helps. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2345187437 From dlong at openjdk.org Thu Sep 12 03:38:07 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 03:38:07 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: <6KiIhfuMzw4--X2kuJqjQs6s8OhA-dGtMIvDDULrOkw=.71492623-4bd2-44b2-8c4e-21b35980ef81@github.com> References: <6KiIhfuMzw4--X2kuJqjQs6s8OhA-dGtMIvDDULrOkw=.71492623-4bd2-44b2-8c4e-21b35980ef81@github.com> Message-ID: <5ZrOvjjvkNxZA9ycmv7yrE0nysL_vV0hqTA98xxuK78=.3751080e-ca4c-4132-b9bc-2685e8c65193@github.com> On Fri, 6 Sep 2024 13:12:29 GMT, Coleen Phillimore wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > What should this return for a deleted method? > > // ------------------------------------------------------------------ > // ciMethod::equals > // > // Returns true if the methods are the same, taking redefined methods > // into account.
> bool ciMethod::equals(const ciMethod* m) const { > if (this == m) return true; > VM_ENTRY_MARK; > Method* m1 = this->get_Method(); > Method* m2 = m->get_Method(); > if (m1->is_old()) m1 = m1->get_new_method(); > if (m2->is_old()) m2 = m2->get_new_method(); > return m1 != Universe::no_such_method_error() && m1 == m2; // ??? > } @coleenp , I think it is enough for ciMethod::equals() to simply compare the values of orig_method_idnum() and not deal with old/deleted methods directly, but the last time I checked what orig_method_idnum() really meant I got confused by the idnum renumbering, so I wasn't able to convince myself that using orig_method_idnum() for comparison was correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345200106 From dlong at openjdk.org Thu Sep 12 03:42:05 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 03:42:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: On Fri, 6 Sep 2024 12:51:38 GMT, Coleen Phillimore wrote: > We do the same thing with illegal_access_error() where the arguments may not match and there's a special case for this and no_such_method_error() in dependencies. Are the compilers confused by this too? If the compiler used the value from the itable, I think it could get confused. There are places in deoptimization for example, that use the callee signature instead of the call-site signature, to deal with signature mismatch caused by the invokedynamic appendix. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345203264 From dlong at openjdk.org Thu Sep 12 03:47:04 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 03:47:04 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> Message-ID: On Tue, 10 Sep 2024 06:50:13 GMT, David Holmes wrote: >> So that implies that you trust my reading of this code. It is complicated enough that testing both seems like a safe thing to do and somewhat clarifying, or else adding an assert like: >> >> assert(new_method != nullptr || old_method->is_deleted(), "this is the only way this happens"); >> return new_method == nullptr ? nsme : new_method; > > The assert works for me. Can we assert the stronger statement: `(new_method == nullptr) == (old_method->is_deleted())` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1756060487 From dlong at openjdk.org Thu Sep 12 03:58:04 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 03:58:04 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion I think using a callee with the wrong signature could cause problems in other places, not just the compiler. 
Doesn't GC oopmap scanning depend on the signature of the callee method? And while it might seem harmless if those "extra" arguments are not scanned, I believe there is a JVMTI API that can query those values.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345216935

From dlong at openjdk.org Thu Sep 12 04:01:09 2024
From: dlong at openjdk.org (Dean Long)
Date: Thu, 12 Sep 2024 04:01:09 GMT
Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote:

>> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
> Coleen suggestion

I think the JVMTI API I was thinking of was PopFrame, which says:

" the operand stack is restored--the argument values are added back and if the invoke was not invokestatic, objectref is added back as well "

" Note however, that any changes to the arguments, which occurred in the called method, remain; when execution continues, the first instruction to execute will be the invoke."
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345219547 From lmao at openjdk.org Thu Sep 12 05:31:08 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 12 Sep 2024 05:31:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> Message-ID: <8fgEm6AttYLP6RHna_29W3-nqK9J_pnDkeLarR9eO74=.cb9418f4-6f3e-4f2c-970a-25d138bbf53e@github.com> On Wed, 11 Sep 2024 16:09:09 GMT, Erik ?sterlund wrote: >> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more that what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. > >> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more that what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. > > We can not detect the oop is dying. That is precisely what the GC is trying to figure out by going through the hassle of traversing the object graph. If what you are proposing was possible (detect unreachable oops by just looking at some cheap local property), then we would rewrite our GCs to exploit that magic. 
;-) We would also rewrite Reference.get() to not keep the referent alive because we could just magically tell if it will get cleared in the future, or not. > > If you are imagining, for example, looking at not yet finalized marking bitmaps from the GC and report errors when encountering a not yet marked object, then we would randomly report errors for perfectly valid uses of the API. The GC just didn't get to that object yet. In other words, we have no way of telling by just looking at an object if the object *will* be found to be not reachable, or not, once it terminates. But by keeping it alive, we can control the answer: the oop will be found to be live. > > This is not a new problem. We have encountered it many times before. The standard way of dealing with this situation (wanting to publish edges to "peeked" oops in the object graph), is to keep the oop alive. Not sure why we would treat it differently here. Unless of course we say this is not supported and crash, but that seems a bit unfortunate IMO. @fisk Do you think hotspot abuses the weak's `peek`? IMHO, `peek` should be restricted inside GC scope because only very few places need to use peek. In other component of VM, we could always keep alive if some alive API return true or try to access weak referent just like the Java code did. Does it make sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2345305346 From dholmes at openjdk.org Thu Sep 12 05:51:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Sep 2024 05:51:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2] In-Reply-To: References: <2Jl1gRQ49EyqEEGyhk020hzxfDiydWk1u4V5-mLttyA=.c245a5b4-30f6-4a38-8e41-5d6acca57ecd@github.com> <_geeT1LEXJLEVX6kW4zv8z2YldHczqXWJRqrWtb8RzM=.41f5209e-39dc-49c8-aa44-01192c917578@github.com> Message-ID: On Thu, 12 Sep 2024 03:44:32 GMT, Dean Long wrote: >> The assert works for me. 
> > Can we assert the stronger statement: `(new_method == nullptr) == (old_method->is_deleted())` ? Yeah I like that better too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1756165561 From lmao at openjdk.org Thu Sep 12 06:43:50 2024 From: lmao at openjdk.org (Liang Mao) Date: Thu, 12 Sep 2024 06:43:50 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v12] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: remove unused imports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/eba5b130..acf91c94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=10-11 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From mli at openjdk.org Thu Sep 12 07:43:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 12 Sep 2024 07:43:39 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v7] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
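For readers following the description above, the scalar starting point can be sketched as a byte-at-a-time, table-driven CRC-32. This is only an illustration of the standard reflected CRC-32 (polynomial 0xEDB88320) that zlib computes; the names `make_crc_table` and `crc32_scalar` are made up for this note, and the code is neither taken from zcrc32.c nor from the RISC-V intrinsic under review.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table for the reflected CRC-32 polynomial.
// (Hypothetical helper name, for illustration only.)
std::array<uint32_t, 256> make_crc_table() {
  std::array<uint32_t, 256> table{};
  for (uint32_t i = 0; i < 256; i++) {
    uint32_t c = i;
    for (int k = 0; k < 8; k++) {
      c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    }
    table[i] = c;
  }
  return table;
}

// Scalar update: one table lookup per input byte.
// `crc` is the running value (0 for a fresh checksum), following the
// calling convention of zlib's crc32().
uint32_t crc32_scalar(uint32_t crc, const unsigned char* buf, std::size_t len) {
  static const std::array<uint32_t, 256> table = make_crc_table();
  crc = ~crc;
  for (std::size_t i = 0; i < len; i++) {
    crc = table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
  }
  return ~crc;
}
```

The loop carries a dependency from each byte to the next, which is why a vectorized version has to restructure the computation (for example by processing several independent lanes with regenerated tables, as the description says) rather than simply widening this loop.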
>
> ## Test
> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java,
> test/jdk/java/util/zip/TestCRC32.java
>
> ## Performance
>
> ### on bananapi
>
> with patch
>
> Benchmark | (count) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | --
> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op
> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op
> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op
> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op
> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op
> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op
> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op
>
> without patch
>
> Benchmark | (count) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | --
> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op
> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op
> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op
> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op
> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op
> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op
> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op
>
> ### on K230
>
> with patch
> (type_flag[1]);` Source root ?
test/hotspot/gtest/nmt/test_vmatree.cpp: `435: const MemTag candidate_flags[candidates_len_flags] = {` `459: const MemTag mem_tag = candidate_flags[os::random() % candidates_len_flags];` ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2345718458 From dlong at openjdk.org Thu Sep 12 09:25:05 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 12 Sep 2024 09:25:05 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion I'm trying to determine if this PR changes anything in regards to compilers, or leaves everything the same, so I am looking at callers of get_new_method() that don't already map is_deleted() to throw_no_such_method_error(). Is it true that CallInfo::resolved_method() and CallInfo::selected_method() can never return an is_old or is_deleted method? They check for is_old but not is_deleted. If get_new_method() actually returned nullptr for a deleted method then some callers might crash, so I am assuming this is impossible. There is probably a rule that if you have a resolved CallInfo then you aren't allowed to safepoint, so there is no way the resolved/selected methods in the CallInfo can change to old or deleted, and it's probably impossible for them start out as old/deleted before a safepoint. So why are these methods checking for is_old at all? 
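
To make the contract under discussion concrete, here is a toy model of "never return nullptr, map deleted methods to a sentinel", together with the stronger assert suggested earlier in the thread. It is a sketch under stated assumptions: `ToyMethod`, its `replacement` field and `toy_get_new_method` are hypothetical stand-ins, not HotSpot's `Method` class or the code in the PR.

```cpp
#include <cassert>

// Hypothetical stand-in for a method that may have been redefined.
struct ToyMethod {
  bool old = false;                  // a redefinition has superseded this method
  bool deleted = false;              // the redefinition removed it entirely
  ToyMethod* replacement = nullptr;  // the new version, if one exists
  bool is_old() const { return old; }
  bool is_deleted() const { return deleted; }
};

// Sentinel playing the role of the "no such method" error method.
ToyMethod* toy_no_such_method_error() {
  static ToyMethod nsme;
  return &nsme;
}

// Never returns nullptr: a deleted method maps to the sentinel.
ToyMethod* toy_get_new_method(ToyMethod* m) {
  if (!m->is_old()) {
    return m;  // nothing was redefined
  }
  ToyMethod* new_method = m->replacement;
  // The stronger statement from the review: a missing replacement and a
  // deleted method are the same condition, in both directions.
  assert((new_method == nullptr) == m->is_deleted());
  return new_method == nullptr ? toy_no_such_method_error() : new_method;
}
```

In this model a caller that forgets to test `is_deleted()` gets the sentinel instead of a null pointer, which is the property the question above assumes the callers rely on.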
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345732364 From fbredberg at openjdk.org Thu Sep 12 09:36:05 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 12 Sep 2024 09:36:05 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: <9pOvaBWAygddcKVCP0l4WOr6TFGQ5ZYivyoQ2aYSVnQ=.48264854-eba3-4690-8ef9-6b7eea65b702@github.com> On Thu, 12 Sep 2024 08:33:12 GMT, Martin Doerr wrote: >> I've done basic testing on ppc64le, riscv64 and s390x using QEMU, but would appreciate if @TheRealMDoerr, @RealFYang and @offamitkumar could take it for a real test drive. > >> > @fbredber, @dholmes-ora: I got a substantial performance drop on our 96 Thread Xeon server: >> >> What OS for the Xeon? We have only seen issues with Windows. > > Sorry, I forgot to mention that it's linux (SUSE Linux Enterprise Server 15 SP4). @TheRealMDoerr, @dholmes-ora > Works with `micro:LockUnlock` on real PPC64 hardware, too. However, we need to run more tests and also check performance. Please note that this PR has conflicts with other changes (#20922 and recent developments in the loom repo). Good that it works on real PPC64 hardware, but please run more tests. I'll sync with loom, and make sure to resolve any conflicts before integrating. > The JBS issue refers to "memory barriers (not a fence)", but you're using `StoreLoad` barriers which are nothing else than a "fence". I don't agree with the general statement that they have become significantly cheaper. That may be true for single chip designs, but not for large server systems (multi-socket). Did you run benchmarks which stress monitors on any large multi-socket system? I've run a substantial amount of performance tests available on our performance site. This PR has shown great performance increase on several tests and platforms (Windows being the exception, but that is handled as a separate [issue](https://bugs.openjdk.org/browse/JDK-8339730)). 
As an example: The DaCapo-xalan-large test showed an increased performance of 36% on Linux-aarch64. I asked our performance team about what kind of system they run, and got the answer that they do run multi-socket systems. But probably not what you would call large. > I got a substantial performance drop on our 96 Thread Xeon server: `LockUnlock.testContendedLock` seems to be less than half as fast as without this patch. Also, some of the `LockUnlock.testInflated*` seem to be affected. (Large PPC64 servers as well.) Can you reproduce this on your side? Please note that I've changed the `LockUnlock.testContendedLock` from `@Threads(2)` to `@Threads(3)` which might be the reason for your substantial performance drop. The reason I did this change was because it enabled me to increase the code coverage, and thereby execute all(?) the corner cases when doing ObjectMonitor locking. I can see how an added `StoreLoad` barrier will decrease performance if you run certain micro benchmarks, Then again it's only there if you have an inflated monitor (i.e. you are experiencing contended locking). In a real world application where you inflate, park and unpark, one added `StoreLoad` might not change the overall performance that much. Which is probably why we don't see any real regression when we run our performance tests (like DaCapo, Renaissance, SPECjvm etc.). 
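
The role of the `StoreLoad` barrier in the exit path can be sketched with standard C++ atomics. This is a toy model only: `ToyMonitor`, `toy_exit` and `toy_should_park` are hypothetical names, and the real `ObjectMonitor::exit()` does considerably more (for instance, picking and waking a successor).

```cpp
#include <atomic>

// Hypothetical, heavily simplified monitor state.
struct ToyMonitor {
  std::atomic<int> owner{0};      // 0 means unowned, otherwise a thread id
  std::atomic<int> successor{0};  // thread already chosen to retry, if any
  std::atomic<int> waiters{0};    // number of queued threads
};

// Exiting side. Returns true if the exiting thread still has to wake a
// waiter itself.
bool toy_exit(ToyMonitor& m) {
  // Release the lock: earlier writes become visible before owner is cleared.
  m.owner.store(0, std::memory_order_release);
  // StoreLoad: order the store above before the loads below.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  if (m.waiters.load(std::memory_order_relaxed) == 0) return false;
  if (m.successor.load(std::memory_order_relaxed) != 0) return false;
  return true;
}

// Contending side. Announce ourselves, then re-check the owner before
// parking. Returns true if the lock still looks held.
bool toy_should_park(ToyMonitor& m) {
  m.waiters.fetch_add(1, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return m.owner.load(std::memory_order_relaxed) != 0;
}
```

With a sequentially consistent fence on both sides, at least one thread observes the other's store: either the exiting thread sees the waiter, or the waiter sees the cleared owner. Without the fence in `toy_exit`, both could read stale values and the waiter could park with nobody left to wake it, which is the kind of stranding the Responsible thread used to paper over with timed parks.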
------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2345755769 From rcastanedalo at openjdk.org Thu Sep 12 10:20:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 12 Sep 2024 10:20:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default src/hotspot/share/opto/lcm.cpp line 272: > 270: const TypePtr* tptr; > 271: if ((UseCompressedOops || UseCompressedClassPointers) && > 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. 
In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756570168 From fbredberg at openjdk.org Thu Sep 12 11:25:06 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 12 Sep 2024 11:25:06 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:42:06 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 217: >> >>> 215: stlr(zr, owner_addr); >>> 216: membar(StoreLoad); >>> 217: >> >> Is there some reason we need a `memory_order_conservative` store here? >> You may not really need a `StoreLoad` here, as long as `ObjectMonitor::owner` is always read with `ldar` or `casal`. `Atomic::cmpxchg()` uses sequentially-consistent ops by default. >> Reason: on AArch64, `stlr;ldar` is sequentially consistent, which is stronger than release|acquire. > > Oh, not just sequentially consistent, but also barrier-ordered-before. The reason is described [here](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1106). 
If you look in [ObjectMonitor::exit](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1132C6-L1132C25)() you'll see that this there is a call to [OrderAccess::storeload](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1184)() just after [release_clear_owner](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1183)(), so I'm just doing the same as has been done in the C++ slow-path for long. I used `membar(StoreLoad);` in all other platforms, and I wanted it to be consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1756663884 From rcastanedalo at openjdk.org Thu Sep 12 11:49:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 12 Sep 2024 11:49:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 17:38:57 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Revert accidental change of UCOH default src/hotspot/share/cds/filemap.cpp line 2457: > 2455: compressed_oops(), compressed_class_pointers()); > 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { > 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1756699774 From mdoerr at openjdk.org Thu Sep 12 11:57:08 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Sep 2024 11:57:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. 
More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318).
>
> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this.
>
> Passes tier1-tier7 on supported platforms.
> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test.
> Arm32 and Zero doesn't need any changes as far as I can tell.

I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. Performance looks also good for the SPEC benchmarks. I don't think they stress Java monitors very strongly.

I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm.
I think this should at least be documented and the description of the JBS issue improved.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2346083750

From dholmes at openjdk.org Thu Sep 12 11:57:09 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 12 Sep 2024 11:57:09 GMT
Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7]
In-Reply-To:
References:
Message-ID: <7JNLDZV1F6_QuMJSq-pu08lFSEULMe41bpQDucXEoEw=.8c051b63-24eb-451d-a2b2-04e8926638c1@github.com>

On Thu, 12 Sep 2024 08:35:16 GMT, Johan Sjölen wrote:

>> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Coleen's feedback
>
> src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.inline.hpp line 2:
>
>> 1: /*
>> 2: * Copyright (c) 2016, 2019, Red Hat, Inc.
All rights reserved. > > I don't think we're meant to update other companies' copyrights? That is correct - unless requested by someone representing that copyright owner. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1756711680 From azafari at openjdk.org Thu Sep 12 12:05:11 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 12 Sep 2024 12:05:11 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback Thank you @gerard-ziemski, for this huge change. After this change, the code looks much more nicer and consistent. If we are insisting on replacing `flag` with `tag`, I could find these missed ones by regexp search for `mem.*flag`: --- 7 results - 5 files Source root ? src/hotspot/share/nmt/memMapPrinter.cpp: `83: // A Cache that correlates range with MEMFLAG, optimized to be iterated quickly` Source root ? src/hotspot/share/nmt/memTracker.hpp: `208: // memory flags of the original region.` Source root ? src/hotspot/share/nmt/vmatree.hpp: `97: assert(!(type == StateType::Released) || data.mem_tag == mtNone, "Released type must have flag mtNone");` `108: return static_cast(type_flag[1]);` Source root ? 
test/hotspot/gtest/nmt/test_nmt_reserved_region.cpp: `50: ASSERT_EQ(region2.mem_tag(), mtThreadStack); // Should be correct flag` Source root ? test/hotspot/gtest/nmt/test_vmatree.cpp: `435: const MemTag candidate_flags[candidates_len_flags] = {` `459: const MemTag mem_tag = candidate_flags[os::random() % candidates_len_flags];` ------------- PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2300097782 From fbredberg at openjdk.org Thu Sep 12 12:30:08 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 12 Sep 2024 12:30:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: Message-ID: <1MC83jRy9o6GrZouJaYjgHyIoyfNvrakHuirZMxIdhk=.769c2ce1-795f-4981-a10b-cee04cad5a0a@github.com> On Thu, 12 Sep 2024 11:54:09 GMT, Martin Doerr wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. 
>
> I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. Performance looks also good for the SPEC benchmarks. I don't think they stress Java monitors very strongly.
>
> I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm.
> I think this should at least be documented and the description of the JBS issue improved.

@TheRealMDoerr

> I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. Performance looks also good for the SPEC benchmarks. I don't think they stress Java monitors very strongly.

That really is good news! Thanks for testing!

> I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm. I think this should at least be documented and the description of the JBS issue improved.

Thanks for confirming that my suspicion was right. As I stated earlier, due to the added StoreLoad barrier a slight decrease in performance is probably to be expected if you just run `LockUnlock.testContendedLock`, but it shouldn't really matter when running real-life applications. Anyhow, I'll update the description of the JBS issue.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2346149800 From adinn at openjdk.org Thu Sep 12 12:40:22 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 12 Sep 2024 12:40:22 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls [v2] In-Reply-To: References: Message-ID: > Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. Andrew Dinn has updated the pull request incrementally with two additional commits since the last revision: - Answer review feedback - remove commented out old code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20936/files - new: https://git.openjdk.org/jdk/pull/20936/files/f1f7fc28..b2e81b04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20936&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20936&range=00-01 Stats: 131 lines in 7 files changed: 20 ins; 37 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/20936.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20936/head:pull/20936 PR: https://git.openjdk.org/jdk/pull/20936 From adinn at openjdk.org Thu Sep 12 12:40:22 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 12 Sep 2024 12:40:22 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls [v2] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 00:27:42 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/runtime.cpp line 151: >> >>> 149: // defines temporarily rebind the generated names to reference the >>> 150: // relevant implementations. >>> 151: >> >> I am not 100% happy about using defines to finesse this problem of common C targets. 
>> >> One alternative here is to define methods local to class OptoRuntime which fit the generator naming convention and have them forward the call to the SharedRuntime methods. n.b. I used (local) method forwarding to allow blobs to share common typefunc providers. >> >> Another alternative is to declare the C target as a parameter to the opto blob declaration macro. That's more flexible but in almost all cases it repeats information already present and makes understanding and updating the declarations more complex. > > This is indeed confusing even with comment. > I prefer your first suggestion: "define methods local to class OptoRuntime which fit the generator naming convention and have them forward the call to the SharedRuntime methods" Done. >> src/hotspot/share/opto/runtime.cpp line 183: >> >>> 181: OPTO_STUBS_DO(GEN_OPTO_BLOB, GEN_OPTO_STUB, GEN_OPTO_JVMTI_STUB) >>> 182: >>> 183: /* >> >> This old code that has been commented out needs to be removed. > > Then do it. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1756771514 PR Review Comment: https://git.openjdk.org/jdk/pull/20936#discussion_r1756774703 From stefank at openjdk.org Thu Sep 12 13:01:15 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 12 Sep 2024 13:01:15 GMT Subject: RFR: 8340009: Improve the output from assert_different_registers Message-ID: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed. 
For example, if I introduce an intentional violation:

diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
index fde868a64b3..551878ac09d 100644
--- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
@@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass,
                                              Register scan_temp,
                                              Label& L_no_such_interface,
                                              bool return_method) {
-  assert_different_registers(recv_klass, intf_klass, scan_temp);
+  Register joker = intf_klass;
+  assert_different_registers(recv_klass, intf_klass, scan_temp, joker);
   assert_different_registers(method_result, intf_klass, scan_temp);
   assert(recv_klass != method_result || !return_method,
          "recv_klass can be destroyed when method isn't needed");

I get this error message:

# Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731
# assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0

The indicated file and line number refer to the `assert_different_registers` implementation and not the offending call site. Moreover, it's unclear from the assert which of the four variables contain the same register.

I'd like to propose a few changes:
1) That we report the indices of the conflicting registers
2) That we report the correct file and line number
3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer.

After these suggestions we'll get error messages that look like this:

# Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963
# assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0

Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables.
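To illustrate proposals 1) and 2) outside of HotSpot, here is a minimal standalone sketch. It is not the patch itself: `Reg`, `find_register_conflict` and `CHECK_DIFFERENT_REGISTERS` are hypothetical stand-ins, the register names other than `c_rarg0` are made up, and the `is_valid()` handling is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a register: an encoding plus a printable name.
struct Reg {
  int encoding;
  const char* name;
};

// Returns an empty string if all registers differ; otherwise a message that
// names the first conflicting pair of indices and the caller's file and line.
template <std::size_t N>
std::string find_register_conflict(const Reg (&regs)[N], const char* file, int line) {
  for (std::size_t i = 0; i < N; i++) {
    for (std::size_t j = i + 1; j < N; j++) {
      if (regs[i].encoding == regs[j].encoding) {
        char buf[512];
        std::snprintf(buf, sizeof(buf), "%s:%d: regs[%zu] and regs[%zu] are both: %s",
                      file, line, i, j, regs[i].name);
        return std::string(buf);
      }
    }
  }
  return std::string();
}

// The macro expands at the call site, so __FILE__ and __LINE__ point at the
// caller rather than at the checking function.
#define CHECK_DIFFERENT_REGISTERS(...)                          \
  ([&]() {                                                      \
    const Reg regs_[] = { __VA_ARGS__ };                        \
    return find_register_conflict(regs_, __FILE__, __LINE__);   \
  }())
```

With `Reg joker = intf_klass;`, calling `CHECK_DIFFERENT_REGISTERS(recv_klass, intf_klass, scan_temp, joker)` yields a message containing `regs[1] and regs[3]`, which maps directly back to the second and fourth arguments at the reported line.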
There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement. ------------- Commit messages: - 8340009: Improve the output from assert_different_registers Changes: https://git.openjdk.org/jdk/pull/20965/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20965&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340009 Stats: 16 lines in 2 files changed: 7 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20965.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20965/head:pull/20965 PR: https://git.openjdk.org/jdk/pull/20965 From coleenp at openjdk.org Thu Sep 12 13:03:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 13:03:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: <_9D_UJt75MdlZfwBGky2b_eKlQZBZMVLUDJFE8IIfCQ=.9454e491-a233-44d5-9917-3866ba56e96e@github.com> On Thu, 12 Sep 2024 09:22:04 GMT, Dean Long wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Coleen suggestion > > I'm trying to determine if this PR changes anything in regards to compilers, or leaves everything the same, so I am looking at callers of get_new_method() that don't already map is_deleted() to throw_no_such_method_error(). > Is it true that CallInfo::resolved_method() and CallInfo::selected_method() can never return an is_old or is_deleted method? They check for is_old but not is_deleted. If get_new_method() actually returned nullptr for a deleted method then some callers might crash, so I am assuming this is impossible. There is probably a rule that if you have a resolved CallInfo then you aren't allowed to safepoint, so there is no way the resolved/selected methods in the CallInfo can change to old or deleted, and it's probably impossible for them start out as old/deleted before a safepoint. 
So why are these methods checking for is_old at all?

@dean-long tbh I don't know what method_orig_idnum means without doing more research on it. It was added for a special case that I'd have to dig up. The idnum is what matches its entry in the jmethodID array, and the matching in method_with_idnum() ensures that we find the method with the equivalent signature. If you don't add or delete methods, redefinition can still reorder the methods because they are only sorted by name, and not name and signature. idnum is the right thing to use.

It is true and _required_ that CallInfo never return an old method. We used to have more is_old() checks throughout the compiler code, and CallInfo is the surface where we now check for this. In at least one place, we have a NoSafepointVerifier. See the bug description of JDK-8327250 for more details. In the interpreter, we don't have an NSV, but the redefinition will replace the old methods in the cpCache, so running in the interpreter will get the new version of the method.

So back to the worry that the no-arguments throw_NSME() replacement could break things. The vtables and itables will never have this, as we cannot delete virtual methods. Before Matias's patch, we could return nullptr to some callers from CallInfo, and that will definitely crash. It's probably hard to construct a test case to show this, but maybe possible. The compiler has to store a deleted method in the CI or nmethod somewhere. If we have NSME on the stack and do a GC, and the arguments aren't collected but not used (but consumed because we clear the expression stack (?)), could this cause a problem? For PopFrame, I don't know how to PopFrame from the Unsafe::NSME method. If this is a problem, we would have to artificially add methods with the same signature to call NSME, like we do for default_methods processing.

This is a large undertaking, and we should fix this bug first. If we can prove that this fix isn't sufficient, we can add this code; we should file a bug to study this, because I don't know the answer to that. This change was to fix the null pointer problem at the source, which I think is the first step.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2346222646

From aboldtch at openjdk.org Thu Sep 12 13:06:10 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Thu, 12 Sep 2024 13:06:10 GMT
Subject: RFR: 8340009: Improve the output from assert_different_registers
In-Reply-To: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com>
References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com>
Message-ID:

On Thu, 12 Sep 2024 12:56:13 GMT, Stefan Karlsson wrote:

> `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed.
> > For example, if I introduce an intentional violation: > > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index fde868a64b3..551878ac09d 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, > Register scan_temp, > Label& L_no_such_interface, > bool return_method) { > - assert_different_registers(recv_klass, intf_klass, scan_temp); > + Register joker = intf_klass; > + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); > assert_different_registers(method_result, intf_klass, scan_temp); > assert(recv_klass != method_result || !return_method, > "recv_klass can be destroyed when method isn't needed"); > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. More over, it's unclear from the assert which of the four variables contain the same register. > > I'd like to propose a few changes: > 1) That we report the indices of the conflicting registers > 2) That we report the correct file and line number > 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer. 
> > After these suggestions we'll get error messages that look like this: > > # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963 > # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0 > > Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables. > > There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement. A nice improvement. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20965#pullrequestreview-2300244543 From coleenp at openjdk.org Thu Sep 12 13:09:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 13:09:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion Lastly, the option -XX:+AllowRedefinitionToAddDeleteMethods is the deprecated option where this is possible. I just did some research to see if it's time to obsolete this option but I did find some places where people use this for products. The worst place for this is if your redefined functions have lambda expressions, javac creates static private methods to implement them and tools that instrument native methods use this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2346235396 From rkennke at openjdk.org Thu Sep 12 13:16:14 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 12 Sep 2024 13:16:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:34:28 GMT, Roman Kennke wrote: >> @rkennke Can you please explain the changes in these tests: >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> >> >> You added these IR rule restriction: >> `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> >> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> >> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> >> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> >> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). 
> >> @rkennke Can you please explain the changes in these tests: >> >> ``` >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> ``` >> >> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> >> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> >> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >> >> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> >> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. 
It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible.
>
> I will re-evaluate those tests, and add comments or remove the restrictions.

> > > @rkennke Can you please explain the changes in these tests:
> > > ```
> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > > ```
> > >
> > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> >
> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively).
That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. > > I will re-evaluate those tests, and add comments or remove the restrictions. > > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) > > My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. Indeed, I could re-enable all tests in: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java but unfortunately not those others: > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. 
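To make the offset arithmetic concrete, here is a small sketch. The base offsets 12 (byte[]) and 16 (long[]) are taken from the description above; the helper names are hypothetical, and this only illustrates the 8-byte alignment condition, not how C2 actually decides whether to vectorize.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of element `index`, measured from the start of the array object.
std::size_t element_offset(std::size_t base_offset, std::size_t element_size, std::size_t index) {
  return base_offset + index * element_size;
}

bool is_8_byte_aligned(std::size_t offset) {
  return offset % 8 == 0;
}

// Smallest index whose element lies at an 8-byte-aligned offset, or -1 if no
// index does. Offsets modulo 8 repeat after at most 8 steps, so 8 tries suffice.
int first_aligned_index(std::size_t base_offset, std::size_t element_size) {
  for (int i = 0; i < 8; i++) {
    if (is_8_byte_aligned(element_offset(base_offset, element_size, i))) {
      return i;
    }
  }
  return -1;
}
```

With a base offset of 12, a byte[] loop that starts at index 0 begins at offset 12, which is not 8-byte aligned; the first aligned element is index 4 (offset 16). With a base offset of 16, index 0 is already aligned.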
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346250313

From epeter at openjdk.org Thu Sep 12 13:23:14 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 12 Sep 2024 13:23:14 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke wrote:

> > > > @rkennke Can you please explain the changes in these tests:
> > > > ```
> > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> > > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> > > > ```
> > > >
> > > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},`
> > > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact.
> > > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway?
> > > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction.
> > > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well).
> > > > > > > > > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. > > > I will re-evaluate those tests, and add comments or remove the restrictions. > > > > > > If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ;) > > My suggestion is this: go over the examples, check which ones are now ok. Those that are not ok, add a comment, and file a bug: I can then analyze those cases later, and see how to write other tests or improve auto-vectorization. 
> > Indeed, I could re-enable all tests in: > > ``` > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java > ``` > > but unfortunately not those others: > > ``` > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java > > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java > ``` > > I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. > > I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. Excellent, that is what I hoped for! Thanks for filing the bug, I'll look into it once this is integrated. You should probably mark it as "blocked by", not "related to" ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2346266568 From azafari at openjdk.org Thu Sep 12 13:36:10 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Thu, 12 Sep 2024 13:36:10 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 20:53:46 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... 
>> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Coleen's feedback Again, if we want to replace mem type with mem tag. The regexp search for mem.*type gives this: --- 47 results - 19 files Source root ? src/hotspot/share/gc/shared/oopStorage.hpp: 92: // The memory type for allocations. 276: // The memory type for allocations. Source root ? src/hotspot/share/nmt/mallocSiteTable.hpp: 68: assert(mem_tag != mtNone, "Expect a real memory type"); Source root ? src/hotspot/share/nmt/memBaseline.cpp: 64: // Sort into allocation site addresses and memory type order for baseline comparison Source root ? src/hotspot/share/nmt/memBaseline.hpp: 56: by_site_and_type // by call site and memory type 206: // Sort allocation sites in call site address and memory type order Source root ? src/hotspot/share/nmt/memReporter.cpp: 138: void MemReporterBase::print_virtual_memory_region(const char* type, address base, size_t size) const { 179: // Summary by memory type 191: void MemSummaryReporter::report_summary_of_type(MemTag mem_tag, 280: void MemSummaryReporter::report_metadata(Metaspace::MetadataType type) const { 343: "Must have a valid memory type"); 522: // Summary diff by memory type 597: void MemSummaryDiffReporter::diff_summary_of_type(MemTag mem_tag, 798: MallocSiteIterator early_itr = _early_baseline.malloc_sites(MemBaseline::by_site_and_type); 799: MallocSiteIterator current_itr = _current_baseline.malloc_sites(MemBaseline::by_site_and_type); 851: // This site was originally allocated with one memory type, then released, 852: // then re-allocated at the same site (as far as we can tell) with a different memory type. Source root ? 
src/hotspot/share/nmt/memReporter.hpp: 141: // Report summary for each memory type Source root ? src/hotspot/share/nmt/memTracker.cpp: 64: // Memory type is encoded into tracking header as a byte field, Source root ? src/hotspot/share/nmt/nmtCommon.cpp: 32: #define MEMORY_TAG_DECLARE_NAME(type, human_readable) \ Source root ? src/hotspot/share/nmt/nmtCommon.hpp: 91: assert(tag_is_valid(mem_tag), "Invalid type (%u)", (unsigned)mem_tag); Source root ? src/hotspot/share/nmt/virtualMemoryTracker.cpp: 297: "Overwrite memory type for region [" INTPTR_FORMAT "-" INTPTR_FORMAT "), %u->%u.", 425: assert(reserved_rgn->mem_tag() == mtNone, "Overwrite memory type (should be mtNone, is: "%s")", Source root ? src/hotspot/share/nmt/virtualMemoryTracker.hpp: 153: // Move virtual memory from one memory type to another. 154: // Virtual memory can be reserved before it is associated with a memory type, and tagged 156: // type to specified memory type. 394: static bool split_reserved_region(address addr, size_t size, size_t split, MemTag mem_tag, MemTag split_type); Source root ? src/hotspot/share/nmt/vmatree.hpp: 97: assert(!(type == StateType::Released) || data.mem_tag == mtNone, "Released type must have flag mtNone"); Source root ? src/hotspot/share/prims/whitebox.cpp: 679: // Alloc memory using the test memory type so that we can use that to see if 695: // Alloc memory with pseudo call stack and specific memory type. Source root ? src/hotspot/share/runtime/os.cpp: 725: assert(mem_tag == header->mem_tag(), "weird NMT type mismatch (new:"%s" != old:"%s")\n", Source root ? src/hotspot/share/utilities/bitMap.hpp: 644: // NMT memory type Source root ? src/hotspot/share/utilities/debug.cpp: 714: MemTracker::record_virtual_memory_type(page, mtInternal); Source root ? src/hotspot/share/utilities/growableArray.cpp: 49: // memory type has to be specified for C heap allocation 50: assert(mem_tag != mtNone, "memory type not specified for C heap object"); Source root ? 
src/hotspot/share/utilities/resizeableResourceHash.hpp: 33: MemTag MEM_TYPE> 55: table = NEW_C_HEAP_ARRAY(Node*, table_size, MEM_TYPE); 75: MemTag MEM_TYPE = mtInternal, 80: ResizeableResourceHashtableStorage, 81: K, V, ALLOC_TYPE, MEM_TYPE, HASH, EQUALS> { 84: using BASE = ResourceHashtableBase, 85: K, V, ALLOC_TYPE, MEM_TYPE, HASH, EQUALS>; Source root ? src/hotspot/share/utilities/resourceHash.hpp: 57: MemTag MEM_TYPE, 156: *ptr = new (MEM_TYPE) Node(hv, key, value, *ptr); 177: *ptr = new (MEM_TYPE) Node(hv, key, value); 196: *ptr = new (MEM_TYPE) Node(hv, key); 218: *ptr = new (MEM_TYPE) Node(hv, key, value); The `MemTracker::record_virtual_memory_type` can be changed to `MemTracker::record_virtual_memory_tag` ------------- PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2300326979 PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2300332293 From stooke at openjdk.org Thu Sep 12 13:56:42 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 12 Sep 2024 13:56:42 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v5] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated. 
> > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with three additional commits since the last revision: - remove empty line - fix indentation - fix missing return statement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/9d9418d0..33c4b402 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=03-04 Stats: 3 lines in 3 files changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From stooke at openjdk.org Thu Sep 12 13:56:43 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 12 Sep 2024 13:56:43 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v4] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 00:47:21 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: >> >> properly test for buffer too small for path > > src/hotspot/os/posix/os_posix.cpp line 990: > >> 988: } >> 989: >> 990: > > Spurious extra line ? done. > src/hotspot/os/windows/os_windows.cpp line 5337: > >> 5335: ALLOW_C_FUNCTION(::free, ::free(p);) // *not* os::free >> 5336: } >> 5337: } > > You have forgotten to return `result`. done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1756920370 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1756921115 From stooke at openjdk.org Thu Sep 12 13:56:43 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 12 Sep 2024 13:56:43 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v5] In-Reply-To: <6MDLVDlpJ7Qv84bxrlrpTcf6bHqAFoJEnLW3hhtvyis=.4b8fb19f-ae35-4723-9fbe-e8b0b9444150@github.com> References: <6MDLVDlpJ7Qv84bxrlrpTcf6bHqAFoJEnLW3hhtvyis=.4b8fb19f-ae35-4723-9fbe-e8b0b9444150@github.com> Message-ID: On Wed, 11 Sep 2024 14:31:43 GMT, Julian Waters wrote: >> Simon Tooke has updated the pull request incrementally with three additional commits since the last revision: >> >> - remove empty line >> - fix indentation >> - fix missing return statement > > src/hotspot/os/posix/os_posix.cpp line 1031: > >> 1029: >> 1030: // Sleep forever; naked call to OS-specific sleep; use with CAUTION >> 1031: void os::infinite_sleep() { > > Ouch, looks like something broke in one of the commits, the new diff it's showing isn't pretty (infinite_sleep and friends have been displaced in the file, at least on my end) caught and fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1756919153 From stooke at openjdk.org Thu Sep 12 13:56:43 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 12 Sep 2024 13:56:43 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: <65dOgOW6wn4Ceq14ieVJ_2A4xDLtIdeRB-cbuxc-zBA=.4f4e753d-7b57-48e9-b6a6-6ea0839a7a6a@github.com> References: <0m0zdvRVNY3ZjLycIST_UNQjTFChOPKKS1KvV1m1stc=.f7ae20ae-4143-4d0b-ba77-dc330d859de6@github.com> <65dOgOW6wn4Ceq14ieVJ_2A4xDLtIdeRB-cbuxc-zBA=.4f4e753d-7b57-48e9-b6a6-6ea0839a7a6a@github.com> Message-ID: On Wed, 11 Sep 2024 14:43:24 GMT, Simon Tooke wrote: >> src/hotspot/os/windows/os_windows.cpp line 5330: >> >>> 5328: if (result == nullptr) { >>> 5329: errno = ENAMETOOLONG; >>> 5330: } >> >> This is a bit of an assumption. What if the name "includes a drive letter that isn't valid or can't be found"? Unfortunately Windows doesn't specify any further details beyond returning null. > > I probably cleaned up this code too much, and should've left it more like the Posix implementation. > What I used to have would do one call (with buffer NULL) to get the real full path, then copy it if it fit or ENAMETOOLONG. In attempting to speed up the code I changed the semantics. I will change this code back to my previous implementation changed code to test for ENAMETOOLONG. 
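For readers following the ENAMETOOLONG discussion, here is a minimal sketch of the "resolve first, then copy only if it fits" shape described above. It is not the code from the PR: `realpath_checked` is a hypothetical name, and the sketch uses POSIX `realpath()` with a null buffer so it can be tried on a POSIX system. A Windows variant would presumably call `_fullpath()` in the same position, as the PR description mentions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>  // POSIX realpath()

// Hypothetical helper (not from the PR): resolve 'filename' and copy the
// result into 'outbuf' only if it fits, otherwise fail with ENAMETOOLONG.
char* realpath_checked(const char* filename, char* outbuf, size_t outbuflen) {
  assert(filename != nullptr && outbuf != nullptr && outbuflen > 0);

  // With a null second argument the C library allocates a buffer that is
  // large enough for the resolved path (POSIX.1-2008 behaviour).
  char* resolved = ::realpath(filename, nullptr);
  if (resolved == nullptr) {
    return nullptr;  // errno was already set by realpath()
  }

  char* result = nullptr;
  int err = 0;
  const size_t len = std::strlen(resolved);
  if (len < outbuflen) {
    std::memcpy(outbuf, resolved, len + 1);  // include the terminating NUL
    result = outbuf;
  } else {
    err = ENAMETOOLONG;  // the path is valid but the caller's buffer is too small
  }

  std::free(resolved);  // plain C free: the buffer came from the C library
  if (err != 0) {
    errno = err;        // set after free() so it cannot be overwritten
  }
  return result;
}
```

With this shape, ENAMETOOLONG is reported only when the path resolved but did not fit; any other failure keeps the errno chosen by the underlying call.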
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1756922180 From stooke at openjdk.org Thu Sep 12 13:56:44 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 12 Sep 2024 13:56:44 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v2] In-Reply-To: <0m0zdvRVNY3ZjLycIST_UNQjTFChOPKKS1KvV1m1stc=.f7ae20ae-4143-4d0b-ba77-dc330d859de6@github.com> References: <0m0zdvRVNY3ZjLycIST_UNQjTFChOPKKS1KvV1m1stc=.f7ae20ae-4143-4d0b-ba77-dc330d859de6@github.com> Message-ID: <1fiQ2yP8Det5SV2qDMpi_V4wAsHOHm8ou5-ALW4Fo60=.7e6e16d1-e65d-4d16-bb18-430f0357beba@github.com> On Thu, 5 Sep 2024 21:04:34 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with two additional commits since the last revision: >> >> - simplify windwos realpath() implementation >> - get rid of os::posix::realpath() and os::win32::realpath() > > src/hotspot/share/runtime/os.hpp line 672: > >> 670: >> 671: // A safe implementation of realpath which will not cause a buffer overflow if the resolved path >> 672: // is longer than PATH_MAX. > > Nit: remove leading space to align with text on previous line. done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1756918131 From duke at openjdk.org Thu Sep 12 14:00:41 2024 From: duke at openjdk.org (Joel Sikström) Date: Thu, 12 Sep 2024 14:00:41 GMT Subject: RFR: 8337674: ZGC: Consistent style for naming private static constants Message-ID: There are various styles for naming private static constants in ZGC. Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants.
There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first. The lines changed have been identified by running: `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). Tested with tiers 1-3. ------------- Commit messages: - 8337674: ZGC: Consistent style for naming private static constants Changes: https://git.openjdk.org/jdk/pull/20968/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20968&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8337674 Stats: 62 lines in 24 files changed: 0 ins; 0 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/20968.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20968/head:pull/20968 PR: https://git.openjdk.org/jdk/pull/20968 From stefank at openjdk.org Thu Sep 12 14:16:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 12 Sep 2024 14:16:05 GMT Subject: RFR: 8337674: ZGC: Consistent style for naming private static constants In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:55:34 GMT, Joel Sikström wrote: > There are various styles for naming private static constants in ZGC. Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants.
There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first. > > The lines changed have been identified by running: > `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > > The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). > > Tested with tiers 1-3. Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20968#pullrequestreview-2300456671 From asmehra at openjdk.org Thu Sep 12 15:06:09 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 12 Sep 2024 15:06:09 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: <0ebuNyvktpJlfGjrZGgcS5IsNn2nSSx5ImiVcL7HJkw=.05ca65a9-8982-403a-b271-3029d26e7124@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <0ebuNyvktpJlfGjrZGgcS5IsNn2nSSx5ImiVcL7HJkw=.05ca65a9-8982-403a-b271-3029d26e7124@github.com> Message-ID: On Tue, 10 Sep 2024 22:17:00 GMT, Ashutosh Mehra wrote: > I will continue my review tomorrow. My review is done now. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2346553992 From asmehra at openjdk.org Thu Sep 12 15:06:10 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 12 Sep 2024 15:06:10 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v4] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <4_SO0UUJcjV1a1CA4_YwQRrddZUfhEhqVp-TCTon6k0=.ab2382c2-126e-4151-b77e-3acc50bd1d3d@github.com> On Wed, 11 Sep 2024 20:26:51 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
>> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplfication, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @adinn comments test/hotspot/jtreg/runtime/cds/appcds/jvmti/ClassFileLoadHookTest.java line 110: > 108: "" + ClassFileLoadHook.TestCaseId.SHARING_ON_CFLH_ON); > 109: if (out.contains("Using AOT-linked classes: false")) { > 110: // We are running with VM options that do not support -XX:+AOTClassLinking When will we run into this case? Is there a VM option that would silently disable AOTClassLinking in prod run? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1756943024 From gziemski at openjdk.org Thu Sep 12 15:21:37 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:21:37 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v8] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related function/template parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Johan's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/f1faba35..4c7f181d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Thu Sep 12 15:21:37 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:21:37 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v8] In-Reply-To: <7JNLDZV1F6_QuMJSq-pu08lFSEULMe41bpQDucXEoEw=.8c051b63-24eb-451d-a2b2-04e8926638c1@github.com> References: <7JNLDZV1F6_QuMJSq-pu08lFSEULMe41bpQDucXEoEw=.8c051b63-24eb-451d-a2b2-04e8926638c1@github.com> Message-ID: On Thu, 12 Sep 2024 11:54:49 
GMT, David Holmes wrote: >> src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.inline.hpp line 2: >> >>> 1: /* >>> 2: * Copyright (c) 2016, 2019, Red Hat, Inc. All rights reserved. >> >> I don't think we're meant to update other companies' copyrights? > > That is correct - unless requested by someone representing that copyright owner. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20872#discussion_r1757091561 From gziemski at openjdk.org Thu Sep 12 15:27:11 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:27:11 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 14:03:17 GMT, Gerard Ziemski wrote: >> Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: >> >> Coleen's feedback > > Are we sure we want `mt` for non-type parameter name in templates? We have these existing patterns already in our code: > > > src/hotspot/share/utilities/growableArray.hpp:803:template > src/hotspot/share/utilities/stack.hpp:54:template class StackIterator; > src/hotspot/share/utilities/concurrentHashTable.inline.hpp:78:template > src/hotspot/share/utilities/chunkedList.hpp:31:template class ChunkedList : public CHeapObj > src/hotspot/share/gc/g1/g1BatchedTask.hpp:32:template > src/hotspot/share/gc/shared/taskqueue.hpp:119:template > src/hotspot/share/gc/shared/taskqueue.hpp:327:template > src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.hpp:40:template > src/hotspot/share/nmt/arrayWithFreeList.hpp:34:template > > > With mt they would look like: > > > src/hotspot/share/utilities/growableArray.hpp:803:template > src/hotspot/share/utilities/stack.hpp:54:template class StackIterator; > src/hotspot/share/utilities/concurrentHashTable.inline.hpp:78:template > src/hotspot/share/utilities/chunkedList.hpp:31:template class ChunkedList : public CHeapObj > src/hotspot/share/gc/g1/g1BatchedTask.hpp:32:template > 
src/hotspot/share/gc/shared/taskqueue.hpp:119:template > src/hotspot/share/gc/shared/taskqueue.hpp:327:template > src/hotspot/share/gc/shenandoah/shenandoahTaskqueue.hpp:40:template > src/hotspot/share/nmt/arrayWithFreeList.hpp:34:template > > > So `MT` or `mt` for non-type parameter name in templates, or should I punt on this particular change and leave it for a followup? > Thank you @gerard-ziemski, for this huge change. After this change, the code looks much nicer and more consistent. > > If we are insisting on replacing `flag` with `tag`, I could find these missed ones by regexp search for `mem.*flag`: > > 7 results - 5 files > > Source root ? src/hotspot/share/nmt/memMapPrinter.cpp: `83: // A Cache that correlates range with MEMFLAG, optimized to be iterated quickly` > > Source root ? src/hotspot/share/nmt/memTracker.hpp: `208: // memory flags of the original region.` > > Source root ? src/hotspot/share/nmt/vmatree.hpp: > > `97: assert(!(type == StateType::Released) || data.mem_tag == mtNone, "Released type must have flag mtNone");` > > `108: return static_cast(type_flag[1]);` > > Source root ? test/hotspot/gtest/nmt/test_nmt_reserved_region.cpp: `50: ASSERT_EQ(region2.mem_tag(), mtThreadStack); // Should be correct flag` > > Source root ? test/hotspot/gtest/nmt/test_vmatree.cpp: > > `435: const MemTag candidate_flags[candidates_len_flags] = {` > > `459: const MemTag mem_tag = candidate_flags[os::random() % candidates_len_flags];` Thank you Afshin for taking such a detailed look and providing the feedback. I fixed the issues, except: `108: return static_cast(type_flag[1]);` This actually has more than just MemTag, it also has StateType, so I left it as is.
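To illustrate why a field such as `type_flag` might reasonably keep its name, here is a small sketch of a two-byte array that packs both a state type and a memory tag. This is only a guess at the general shape, not the contents of vmatree.hpp: the struct name `IntervalState`, the enumerator lists and the index assignment (`[0]` for the state, `[1]` for the tag) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the HotSpot types; the enumerators are chosen for illustration.
enum class MemTag : uint8_t { mtNone, mtThreadStack, mtInternal };
enum class StateType : uint8_t { Reserved, Committed, Released };

// Hypothetical struct: one small array carries two different kinds of value,
// so naming it after only one of them (for example "tag") would be misleading.
struct IntervalState {
  uint8_t type_flag[2];  // assumed layout: [0] = StateType, [1] = MemTag

  IntervalState(StateType type, MemTag tag)
    : type_flag{static_cast<uint8_t>(type), static_cast<uint8_t>(tag)} {}

  StateType type() const { return static_cast<StateType>(type_flag[0]); }
  MemTag mem_tag() const { return static_cast<MemTag>(type_flag[1]); }
};
```

The accessors give each slot a precise name, which is one way to keep the rename consistent at the API level without touching the combined storage.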
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2346602184 From gziemski at openjdk.org Thu Sep 12 15:32:57 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:32:57 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v9] In-Reply-To: References: Message-ID: <95R2iRDLQWj7eiaXmYoqXFx_I0mDr3ORuQOsb111O6o=.1a5580ec-b6d8-4b3f-b324-141eb843a408@github.com> > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related function/template parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: Afshin's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/4c7f181d..8e576cf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=07-08 Stats: 26 lines in 14 files changed: 0 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Thu Sep 12 15:40:35 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:40:35 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v10] In-Reply-To: References: Message-ID: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. 
> > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related function/template parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - Afshin's feedback, record_virtual_memory_type -> record_virtual_memory_tag - Afshin's feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/8e576cf1..1765579b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=08-09 Stats: 42 lines in 24 files changed: 0 ins; 0 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From gziemski at openjdk.org Thu Sep 12 15:40:35 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:40:35 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v7] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:31:28 GMT, Afshin Zafari wrote: > ## Again, if we want to replace mem type with mem tag. The regexp search for mem.*type gives this: > 47 results - 19 files Done. > The `MemTracker::record_virtual_memory_type` can be changed to `MemTracker::record_virtual_memory_tag` Done. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2346629838 PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2346630233 From gziemski at openjdk.org Thu Sep 12 15:40:35 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:40:35 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v9] In-Reply-To: <95R2iRDLQWj7eiaXmYoqXFx_I0mDr3ORuQOsb111O6o=.1a5580ec-b6d8-4b3f-b324-141eb843a408@github.com> References: <95R2iRDLQWj7eiaXmYoqXFx_I0mDr3ORuQOsb111O6o=.1a5580ec-b6d8-4b3f-b324-141eb843a408@github.com> Message-ID: <2CL9dG3e4T5eoAgDFZxD04BewQlYQm8kE5auIOIoewM=.05bef75b-de33-458c-9700-20a2171c5e20@github.com> On Thu, 12 Sep 2024 15:32:57 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Afshin's feedback Thank you Afshin again for taking detailed look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2346633208 From stuefe at openjdk.org Thu Sep 12 15:41:22 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 15:41:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 10:17:47 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/opto/lcm.cpp line 272: > >> 270: const TypePtr* tptr; >> 271: if ((UseCompressedOops || UseCompressedClassPointers) && >> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { > > Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: > > (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) > ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) Hi @robcasloz The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense than to rely on the static initialization values of `CompressedOops::_shift`.
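The two configurations listed in the review can be checked with a small stand-alone model. This is a sketch under assumptions, not HotSpot code: `Config` simply models the flags and shift values, a disabled feature is assumed to report its static initial shift of 0, and `guarded_condition` is my guess at the kind of per-feature guard that would evaluate to false for these configurations. The actual changed condition is not quoted in this thread.

```cpp
#include <cassert>

// Plain-data model of the relevant VM state (hypothetical, for illustration).
struct Config {
  bool use_compressed_oops;
  bool use_compressed_class_pointers;
  int  oop_shift;    // models CompressedOops::shift()
  int  klass_shift;  // models CompressedKlassPointers::shift()
};

// The two-line condition exactly as quoted in the review comment.
bool quoted_condition(const Config& c) {
  return (c.use_compressed_oops || c.use_compressed_class_pointers) &&
         (c.oop_shift == 0 || c.klass_shift == 0);
}

// Assumed alternative: each shift is consulted only if its feature is enabled.
bool guarded_condition(const Config& c) {
  return (c.use_compressed_oops && c.oop_shift == 0) ||
         (c.use_compressed_class_pointers && c.klass_shift == 0);
}
```

In this model, a configuration with compressed oops off and a non-zero klass shift satisfies the quoted condition only because the unused oop shift reads as 0, which is the kind of dependence on static initial values that the reply above argues against.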
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757126946 From gziemski at openjdk.org Thu Sep 12 15:45:11 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:45:11 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v9] In-Reply-To: <95R2iRDLQWj7eiaXmYoqXFx_I0mDr3ORuQOsb111O6o=.1a5580ec-b6d8-4b3f-b324-141eb843a408@github.com> References: <95R2iRDLQWj7eiaXmYoqXFx_I0mDr3ORuQOsb111O6o=.1a5580ec-b6d8-4b3f-b324-141eb843a408@github.com> Message-ID: On Thu, 12 Sep 2024 15:32:57 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with one additional commit since the last revision: > > Afshin's feedback I have incorporated Afshin feedback, but can revert the changes and punt them to a followup if there is a pushback against doing it right now. I think that we are now in pretty good shape. I could use final approvals, with anything that's not critical to be done in followup(s) later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2346637453 From gziemski at openjdk.org Thu Sep 12 15:45:12 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:45:12 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v10] In-Reply-To: References: Message-ID: <4uTuyYrAIeQPJf1I0_5GPU4-3kAAcKOQdSk3dPrIS6s=.62fd59ea-c1a7-44bd-a4ad-482a9d252cf3@github.com> On Thu, 12 Sep 2024 15:40:35 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - Afshin's feedback, record_virtual_memory_type -> record_virtual_memory_tag > - Afshin's feedback Johan's incremental feedback: https://openjdk.github.io/cr/?repo=jdk&pr=20872&range=06-07 Afshin's incremental feedback: https://openjdk.github.io/cr/?repo=jdk&pr=20872&range=07-08 https://openjdk.github.io/cr/?repo=jdk&pr=20872&range=08-09 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2346643064 From stuefe at openjdk.org Thu Sep 12 15:46:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 15:46:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> On Wed, 11 Sep 2024 14:47:07 GMT, 
Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/opto/machnode.cpp line 390: > >> 388: t = t->make_ptr(); >> 389: } >> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { > > Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757135035 From gziemski at openjdk.org Thu Sep 12 15:52:41 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Thu, 12 Sep 2024 15:52:41 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v11] In-Reply-To: References: Message-ID: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related function/template parameter names and local variable names. > > Testing is pending...
> > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: - copyrights - Afshin's feedback, tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20872/files - new: https://git.openjdk.org/jdk/pull/20872/files/1765579b..7db989f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20872&range=09-10 Stats: 9 lines in 7 files changed: 1 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20872/head:pull/20872 PR: https://git.openjdk.org/jdk/pull/20872 From coleenp at openjdk.org Thu Sep 12 16:01:15 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 16:01:15 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v11] In-Reply-To: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> References: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> Message-ID: <4rSXUED92zoCWNuvkwpyNsA50elNxX2BquvOmwZf8HA=.5811acf9-38eb-458a-b4bf-02a7176c36bc@github.com> On Thu, 12 Sep 2024 15:52:41 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... 
>> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - copyrights > - Afshin's feedback, tests This looks good to me. Thanks for finding some additional cases, Afshin. Thanks also for updating the copyrights. Thanks for this effort for consensus and for the change, Gerard. Edit: please let GHA complete. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2300775940 From stuefe at openjdk.org Thu Sep 12 16:08:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Sep 2024 16:08:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 15:58:29 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: > >> 145: #endif >> 146: >> 147: return true; > > This should only be in the compressedKlass.cpp file. Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch) and one case (+UseCCP -UseCOH), so maybe its not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757169570 From kvn at openjdk.org Thu Sep 12 16:22:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 16:22:07 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 00:42:01 GMT, Ioi Lam wrote: > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized.
This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method). This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... src/hotspot/share/oops/instanceKlass.cpp line 828: > 826: link_class(CHECK); > 827: > 828: #ifdef AZZERT ? "ASSERT" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1757191705 From kvn at openjdk.org Thu Sep 12 16:27:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 16:27:06 GMT Subject: RFR: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls [v2] In-Reply-To: References: Message-ID: <1vBtvFxhbgRFLjlz9vEpAlZKmWC8fvsj1b7qgjE8300=.b2092892-11fd-46d6-ab8a-7ffaa9d7b7af@github.com> On Thu, 12 Sep 2024 12:40:22 GMT, Andrew Dinn wrote: >> Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids.
> > Andrew Dinn has updated the pull request incrementally with two additional commits since the last revision: > > - Answer review feedback > - remove commented out old code Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20936#pullrequestreview-2300845145 From mdoerr at openjdk.org Thu Sep 12 17:21:32 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Sep 2024 17:21:32 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 Message-ID: After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. ------------- Commit messages: - 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 Changes: https://git.openjdk.org/jdk/pull/20971/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20971&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340012 Stats: 6 lines in 3 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20971.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20971/head:pull/20971 PR: https://git.openjdk.org/jdk/pull/20971 From coleenp at openjdk.org Thu Sep 12 17:36:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 17:36:06 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:16:08 GMT, Martin Doerr wrote: > After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. Thank you for fixing this. I hope there are no others. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20971#pullrequestreview-2300981506 From kvn at openjdk.org Thu Sep 12 17:36:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 17:36:06 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:16:08 GMT, Martin Doerr wrote: > After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. Good. Can you add regression test? ------------- PR Review: https://git.openjdk.org/jdk/pull/20971#pullrequestreview-2300984218 From coleenp at openjdk.org Thu Sep 12 17:37:15 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 17:37:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 16:04:45 GMT, Thomas Stuefe wrote: >> src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 147: >> >>> 145: #endif >>> 146: >>> 147: return true; >> >> This should only be in the compressedKlass.cpp file. > > Okay. I will remove the whole `CompressedKlassPointers::pd_initialize` logic. We only need it for one architecture (aarch) and one case (+UseCCP -UseCOH), so maybe its not worth fanning out across all platforms, including Zero. Instead, I will add a short `ifdef` section to `CompressedKlassPointers::initialize`. Yes, looking at this further, it does seem like a small amount of conditional compilation that sets all the same values that are set in the architecture independent version. It seems best to move it there. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1757300544 From kvn at openjdk.org Thu Sep 12 17:39:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 17:39:04 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:32:00 GMT, Coleen Phillimore wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > Thank you for fixing this. I hope there are no others. @coleenp, do you have an assert in your changes which check that only abstract and interface classes are outside encoding space? I concern that due to some bug we would get regular class outside. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2346875046 From coleenp at openjdk.org Thu Sep 12 17:52:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 17:52:04 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:32:00 GMT, Coleen Phillimore wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > Thank you for fixing this. I hope there are no others. > @coleenp, do you have an assert in your changes which check that only abstract and interface classes are outside encoding space? I concern that due to some bug we would get regular class outside. I only check these conditions when allocating the Klass. Maybe it could be added to CompressedKlassPointers::is_in_encoding_range(). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2346898130 From coleenp at openjdk.org Thu Sep 12 18:04:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 12 Sep 2024 18:04:10 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v12] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 06:43:50 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > remove unused imports This looks good, pending @fisk review. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2301045551 From stefank at openjdk.org Thu Sep 12 18:30:07 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 12 Sep 2024 18:30:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v12] In-Reply-To: References: Message-ID: <-pL7lvYkSIz7y-74j2DZnr35731hG5Cm3VfFcrSY8Nk=.47e05a83-1b2b-4974-934a-f2e493507329@github.com> On Thu, 12 Sep 2024 06:43:50 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > remove unused imports The HotSpot code change looks good. 
I haven't reviewed the test code. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2301097681 From stefank at openjdk.org Thu Sep 12 18:35:15 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 12 Sep 2024 18:35:15 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v10] In-Reply-To: <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> References: <5AYNAGdkQfu1RPz9WE-kM_BKcDGz345puShzyBAifDY=.b60a0b42-a0e9-422d-9db4-18d2c2db3898@github.com> <0BD0oNcSvpWZx2rDtRZhEgum_WYlLoGkofNFZ4JYKyI=.d888869c-1755-4ca0-b920-8209b34ff796@github.com> Message-ID: On Wed, 11 Sep 2024 16:09:09 GMT, Erik Österlund wrote: >> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more than what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. > >> FWIW I don't think resurrecting the dying oop is the right way to fix this given that the underlying problem is that the application failed to keep the class of the jMethodID alive. Can't we detect it is dying (obviously more than what `is_alive` does) and just act as-if it were already dead? There is an inherent race here so the application can't rely on this act of resurrection anyway. > > We can not detect the oop is dying. That is precisely what the GC is trying to figure out by going through the hassle of traversing the object graph. If what you are proposing was possible (detect unreachable oops by just looking at some cheap local property), then we would rewrite our GCs to exploit that magic.
;-) We would also rewrite Reference.get() to not keep the referent alive because we could just magically tell if it will get cleared in the future, or not. > > If you are imagining, for example, looking at not yet finalized marking bitmaps from the GC and report errors when encountering a not yet marked object, then we would randomly report errors for perfectly valid uses of the API. The GC just didn't get to that object yet. In other words, we have no way of telling by just looking at an object if the object *will* be found to be not reachable, or not, once it terminates. But by keeping it alive, we can control the answer: the oop will be found to be live. > > This is not a new problem. We have encountered it many times before. The standard way of dealing with this situation (wanting to publish edges to "peeked" oops in the object graph), is to keep the oop alive. Not sure why we would treat it differently here. Unless of course we say this is not supported and crash, but that seems a bit unfortunate IMO. > @fisk Do you think hotspot abuses the weak's `peek`? IMHO, `peek` should be restricted inside GC scope because only very few places need to use peek. In other component of VM, we could always keep alive if some alive API return true or try to access weak referent just like the Java code did. Does it make sense? We need to use `peek` for various serviceability features, or else they artificially keep classes alive. 
Take a look at this PR for a recent discussion and change in this area: https://github.com/openjdk/jdk/pull/19769 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2346979146 From luhenry at openjdk.org Thu Sep 12 18:41:05 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Thu, 12 Sep 2024 18:41:05 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) do not need global icache flushes. >> As we can instead in slow path locally (thread and hart) emit fence.i. >> But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. >> To handle this case new now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, moved init after feature enabling Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20913#pullrequestreview-2301118506 From lmesnik at openjdk.org Thu Sep 12 18:42:07 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 12 Sep 2024 18:42:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v12] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 06:43:50 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > remove unused imports test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedClass.java line 32: > 30: * @requires (os.family == "linux") & (vm.debug != true) > 31: * @library /test/lib > 32: * @run main/othervm/timeout=300 TestUnloadedClass please change to othervm to driver so only forked vm is run with all vm flags. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1757386625 From rehn at openjdk.org Thu Sep 12 18:52:05 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 12 Sep 2024 18:52:05 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 18:38:55 GMT, Ludovic Henry wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment, moved init after feature enabling > > Marked as reviewed by luhenry (Committer). Thank you @luhenry ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2347009363 From kvn at openjdk.org Thu Sep 12 19:10:09 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 19:10:09 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:49:37 GMT, Coleen Phillimore wrote: > > @coleenp, do you have an assert in your changes which check that only abstract and interface classes are outside encoding space? I concern that due to some bug we would get regular class outside. > > I only check these conditions when allocating the Klass. Maybe it could be added to CompressedKlassPointers::is_in_encoding_range(). @TheRealMDoerr can you add such assert? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2347041229 From kvn at openjdk.org Thu Sep 12 19:29:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 19:29:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Wed, 11 Sep 2024 14:45:15 GMT, Boris Ulasevich wrote: >> Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. > >> Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. > > Here are rough neoverse-v2 numbers: > JmhDotty (-XX:CodeEntryAlignment=16) 701.93 ± 5.00 ms/op > JmhDotty (-XX:CodeEntryAlignment=32) 703.56 ± 5.15 ms/op > JmhDotty (-XX:CodeEntryAlignment=64) 704.46 ± 5.18 ms/op > JmhDotty (-XX:CodeEntryAlignment=128) 703.71 ±
5.17 ms/op @bulasevich I just want to let you know that we continue our performance testing... ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2347075871 From iklam at openjdk.org Thu Sep 12 21:43:24 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 12 Sep 2024 21:43:24 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v5] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
> > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/5bba4ad4..66a4ff41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=03-04 Stats: 28 lines in 6 files changed: 11 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Thu Sep 12 21:43:25 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 12 Sep 2024 21:43:25 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v4] In-Reply-To: <4_SO0UUJcjV1a1CA4_YwQRrddZUfhEhqVp-TCTon6k0=.ab2382c2-126e-4151-b77e-3acc50bd1d3d@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <4_SO0UUJcjV1a1CA4_YwQRrddZUfhEhqVp-TCTon6k0=.ab2382c2-126e-4151-b77e-3acc50bd1d3d@github.com> Message-ID: On Thu, 12 Sep 2024 14:04:40 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @adinn comments > > test/hotspot/jtreg/runtime/cds/appcds/jvmti/ClassFileLoadHookTest.java line 110: > >> 108: "" + ClassFileLoadHook.TestCaseId.SHARING_ON_CFLH_ON); >> 109: if (out.contains("Using AOT-linked classes: false")) { >> 110: // We are running with VM options that do not support -XX:+AOTClassLinking > > When will we run into this case? Is there a VM option that would silently disable AOTClassLinking in prod run? 
AOTClassLinking requires the ability to write the full archived module graph (FMG), so it will be disabled if jtreg is executed with - a GC that doesn't support object heap writing, such as ZGC - a module related option that's not compatible with FMG, such as --add-opens I edited the comment in the test to make it clear that the JVM options affects the writing of the archive. I also modified the log to something like this: Using AOT-linked classes: false (static archive: no aot-linked classes) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622228 From iklam at openjdk.org Thu Sep 12 21:43:25 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 12 Sep 2024 21:43:25 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v3] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Tue, 10 Sep 2024 21:31:31 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @dholmes-ora comments: logging indents > > src/hotspot/share/cds/aotClassLinker.cpp line 149: > >> 147: add_candidate(ik); >> 148: >> 149: if (log_is_enabled(Info, cds, aot, load)) { > > Is `load` the correct log tag to use in this class? Can it be replaced with `link` tag? I changed to the `link` tag. > src/hotspot/share/cds/aotClassLinker.hpp line 111: > >> 109: >> 110: static int num_app_initiated_classes(); >> 111: static int num_platform_initiated_classes(); > > I don't see these methods (num_app_initiated_classes and num_platform_initiated_classes) used anywhere. Should they be removed? I added logging in DumpAllocStats::print_stats() to use these methods. 
> src/hotspot/share/cds/aotConstantPoolResolver.cpp line 111: > >> 109: >> 110: if (CDSConfig::is_dumping_aot_linked_classes()) { >> 111: if (AOTClassLinker::try_add_candidate(ik)) { > > Are we relying on the call to `try_add_candidate` to add the class to the candidate list? I guess that shouldn't be the case as the class have already been added through ArchiveBuilder::gather_klasses_and_symbols()->AOTClassLinker::add_candidates(). If so can we use AOTClassLinker::is_candidate(ik) here? You're correct. I changed it to `AOTClassLinker::is_candidate(ik)` > src/hotspot/share/cds/archiveBuilder.cpp line 766: > >> 764: #define ADD_COUNT(x) \ >> 765: x += 1; \ >> 766: x ## _a += aotlinked; > > Can we do this instead: > > ```x ## _a += (aotlinked ? 1 : 0)``` > > and make `aotlinked` a bool. Fixed. > src/hotspot/share/cds/archiveBuilder.cpp line 779: > >> 777: DECLARE_INSTANCE_KLASS_COUNTER(num_app_klasses); >> 778: DECLARE_INSTANCE_KLASS_COUNTER(num_hidden_klasses); >> 779: DECLARE_INSTANCE_KLASS_COUNTER(num_unlinked_klasses); > > Nit-picking here - "unlinked" category doesn't need the "aot-linked" counter. Fixed. > src/hotspot/share/cds/filemap.cpp line 2455: > >> 2453: const char* prop = Arguments::get_property("java.system.class.loader"); >> 2454: if (prop != nullptr) { >> 2455: if (has_aot_linked_classes()) { > > Should this check be part of `FileMapInfo::validate_aot_class_linking`? I put the check here so that the "Archived non-system classes are disabled because ...." message will not be printed a few lines below. Otherwise the user will see two different error messages that say different things. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622677 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622615 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622522 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622803 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622734 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1757622394 From sviswanathan at openjdk.org Thu Sep 12 23:17:13 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 23:17:13 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes Message-ID: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. Summary of changes is as follows: 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
2) Intrinsic for wrapIndexes and selectFrom to generate efficient code For the following source: public void test() { var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); index.selectFrom(inpvect).intoArray(byteres, j); } } The code generated for inner main now looks as follows: ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 0x00007f40d02274d0: movslq %ebx,%r13 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) 0x00007f40d022751f: add $0x40,%ebx 0x00007f40d0227522: cmp %r8d,%ebx 0x00007f40d0227525: jl 0x00007f40d02274d0 Best Regards, Sandhya ------------- Commit messages: - Merge branch 'master' of https://git.openjdk.java.net/jdk into rearrangewrap - Some cleanup - Some small fixes - Initial feedback - Optionally partial wrap shuffles during construction - Wrap shuffle on rearrange Changes: https://git.openjdk.org/jdk/pull/20634/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340079 Stats: 686 lines in 47 files changed: 548 ins; 30 del; 108 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: https://git.openjdk.org/jdk/pull/20634 From psandoz at openjdk.org Thu Sep 12 23:17:13 
2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 12 Sep 2024 23:17:13 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya API shapes are good! I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions? Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. 
I think this is good enough to promote out of draft and create a CSR for the API changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305377165 PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305412450 PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2346848993 From sviswanathan at openjdk.org Thu Sep 12 23:17:14 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 12 Sep 2024 23:17:14 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Thu, 22 Aug 2024 18:21:50 GMT, Paul Sandoz wrote: > API shapes are good! > > I see you intrinsified `selectFrom` which, IIUC, optimally generates C2 nodes that are functionally equivalent to the Java expression `v.rearrange(this.toShuffle())`. That way we can better generate an optimal set of instructions? > > Do you know what deficiencies there that blocks us from compiling the expression down to the same set of instructions as the intrinsic? Not suggesting we do that here, just for future reference. Yes, I intrinsified it to generate an optimal set of instructions. In the expression `v.rearrange(this.toShuffle())` we first do a partial wrap as part of `this.toShuffle()` and then a full wrap as part of `rearrange`. In the intrinsic I am only doing the full wrap. Without the intrinsic, if for whatever reason the `this.toShuffle()` call is not moved out of the loop by the JIT, we incur the additional overhead of the partial wrap in the hot code path.
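For readers following the check-versus-wrap discussion, here is a minimal scalar sketch in plain Java. It only models the semantics being discussed and is not the Vector API implementation: the class `WrapIndexSketch` and its methods are hypothetical names, and a power-of-two lane count is assumed so that a full wrap reduces to a single bitwise AND.

```java
// Illustrative scalar model only; none of these names are JDK APIs.
final class WrapIndexSketch {

    // Modeled "checkIndexes" behavior: an out-of-range source index is an error.
    static int checkIndex(int index, int vlength) {
        if (index < 0 || index >= vlength) {
            throw new IndexOutOfBoundsException(
                "index " + index + " out of range [0, " + vlength + ")");
        }
        return index;
    }

    // Modeled "wrapIndexes" behavior: fold any index into [0, vlength).
    // Assumes vlength is a power of two, so the wrap is a single AND.
    static int wrapIndex(int index, int vlength) {
        return index & (vlength - 1);
    }

    // Scalar model of index.selectFrom(src) with wrapped indexes:
    // result[i] = src[wrap(indexes[i])].
    static byte[] selectFrom(byte[] indexes, byte[] src) {
        byte[] result = new byte[src.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = src[wrapIndex(indexes[i], src.length)];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] src = {10, 20, 30, 40};
        byte[] indexes = {3, 4, -1, 0};   // 4 and -1 are out of range for 4 lanes
        byte[] out = selectFrom(indexes, src);
        for (byte b : out) {
            System.out.println(b);
        }
    }
}
```

With wrapping there is no exceptional path in the loop body, which is consistent with the branch-free `vpshufb` sequence shown in the PR description.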
I saw this happening when the following is run as part of a JMH benchmark instead of being called from a standalone Java program with a loop: var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); index.selectFrom(inpvect).intoArray(byteres, j); } The perf difference observed between the intrinsic and no intrinsic in this case is about 20%. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2305521441 From kvn at openjdk.org Thu Sep 12 23:26:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Sep 2024 23:26:07 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Wed, 11 Sep 2024 14:45:15 GMT, Boris Ulasevich wrote: >> Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. > >> Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. > > Here are rough neoverse-v2 numbers: > JmhDotty (-XX:CodeEntryAlignment=16) 701.93 ± 5.00 ms/op > JmhDotty (-XX:CodeEntryAlignment=32) 703.56 ± 5.15 ms/op > JmhDotty (-XX:CodeEntryAlignment=64) 704.46 ± 5.18 ms/op > JmhDotty (-XX:CodeEntryAlignment=128) 703.71 ± 5.17 ms/op Hi @bulasevich We got a significant (a few percent) regression when testing -XX:CodeCacheSegmentSize=64 -XX:CodeEntryAlignment=16 on an Ampere system with an N1 CPU. Is it possible to not set CodeEntryAlignment=16 for N1? Maybe even limit it to V1 and V2 only?
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2347402012 From lmao at openjdk.org Fri Sep 13 00:33:34 2024 From: lmao at openjdk.org (Liang Mao) Date: Fri, 13 Sep 2024 00:33:34 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: change main/othervm to driver ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/acf91c94..ccd2a163 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From ddong at openjdk.org Fri Sep 13 00:33:34 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 13 Sep 2024 00:33:34 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v12] In-Reply-To: References: Message-ID: <-Sxdy7ps-D-d65KI8Oe0w5gKmk73TAi-CLLLqicebcA=.696e7eff-c2ca-4e83-bbb1-5eb6dd7f35a7@github.com> On Thu, 12 Sep 2024 18:38:49 GMT, Leonid Mesnik wrote: >> Liang Mao has updated the pull request incrementally with one additional commit since the last revision: >> >> remove unused imports > > test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedClass.java line 32: > >> 30: * @requires (os.family == 
"linux") & (vm.debug != true) >> 31: * @library /test/lib >> 32: * @run main/othervm/timeout=300 TestUnloadedClass > > please change to othervm to driver so only forked vm is run with all vm flags. Changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1757762462 From lmesnik at openjdk.org Fri Sep 13 00:44:08 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 13 Sep 2024 00:44:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: References: Message-ID: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> On Fri, 13 Sep 2024 00:33:34 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > change main/othervm to driver test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedClass.java line 30: > 28: * @summary Stress test GetMethodDeclaringClass > 29: * @requires vm.jvmti > 30: * @requires (os.family == "linux") & (vm.debug != true) Sorry, that missed initially. Why test is linux and non-debug only? 
`* @requires (os.family == "linux") & (vm.debug != true)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1757810920 From dlong at openjdk.org Fri Sep 13 00:48:04 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Sep 2024 00:48:04 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: <_9D_UJt75MdlZfwBGky2b_eKlQZBZMVLUDJFE8IIfCQ=.9454e491-a233-44d5-9917-3866ba56e96e@github.com> References: <_9D_UJt75MdlZfwBGky2b_eKlQZBZMVLUDJFE8IIfCQ=.9454e491-a233-44d5-9917-3866ba56e96e@github.com> Message-ID: On Thu, 12 Sep 2024 13:00:49 GMT, Coleen Phillimore wrote: > If we have NSME on the stack and do a GC and the arguments aren't collected but not used (but consumed because we clear the expression stack (?)) could this cause a problem? Clearing the expression stack in the callee (NSME) doesn't remove the incoming arguments, because they are in the locals. Clearing the expression stack in the caller would mean the callee frame is already gone, I assume. > For PopFrame, I don't know how to PopFrame from Unsafe::NSME method. The JVMTI agent just needs to get control, I believe. It doesn't even need to be at a breakpoint of event handler -- I think suspending the thread, which would happen at a safepoint, is enough. Then it can request the frame to be popped. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2347688840 From dholmes at openjdk.org Fri Sep 13 02:19:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 13 Sep 2024 02:19:07 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v11] In-Reply-To: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> References: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> Message-ID: On Thu, 12 Sep 2024 15:52:41 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - copyrights > - Afshin's feedback, tests Looks good. I thought those type/tag issues had been pointed out earlier, but good to catch them anyway. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20872#pullrequestreview-2302006352 From lmao at openjdk.org Fri Sep 13 02:36:08 2024 From: lmao at openjdk.org (Liang Mao) Date: Fri, 13 Sep 2024 02:36:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> References: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> Message-ID: <0I-N2xnYHM8nWzgKGL-5DrYhd5G95hYHxqGQNTz93YE=.e8450cbd-aac3-4b88-9e35-b76887ce4b13@github.com> On Fri, 13 Sep 2024 00:41:13 GMT, Leonid Mesnik wrote: >> Liang Mao has updated the pull request incrementally with one additional commit since the last revision: >> >> change main/othervm to driver > > test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedClass.java line 30: > >> 28: * @summary Stress test GetMethodDeclaringClass >> 29: * @requires vm.jvmti >> 30: * @requires (os.family == "linux") & (vm.debug != true) > > Sorry, that missed initially. Why test is linux and non-debug only? > `* @requires (os.family == "linux") & (vm.debug != true)` The test depends on pthread and it's a stress test that fastdebug ran out of time. We don't want to trade off the crash reproduction probability to make fastdebug pass. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1758086641 From lmesnik at openjdk.org Fri Sep 13 02:44:13 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 13 Sep 2024 02:44:13 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: <0I-N2xnYHM8nWzgKGL-5DrYhd5G95hYHxqGQNTz93YE=.e8450cbd-aac3-4b88-9e35-b76887ce4b13@github.com> References: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> <0I-N2xnYHM8nWzgKGL-5DrYhd5G95hYHxqGQNTz93YE=.e8450cbd-aac3-4b88-9e35-b76887ce4b13@github.com> Message-ID: On Fri, 13 Sep 2024 02:33:11 GMT, Liang Mao wrote: >> test/hotspot/jtreg/serviceability/jvmti/GetMethodDeclaringClass/TestUnloadedClass.java line 30: >> >>> 28: * @summary Stress test GetMethodDeclaringClass >>> 29: * @requires vm.jvmti >>> 30: * @requires (os.family == "linux") & (vm.debug != true) >> >> Sorry, that missed initially. Why test is linux and non-debug only? >> `* @requires (os.family == "linux") & (vm.debug != true)` > > The test depends on pthread and it's a stress test that fastdebug ran out of time. We don't want to trade off the crash reproduction probability to make fastdebug pass. The fastdebug testing is very important for hotspot testing. Is it possible just to increase timeout so test pass with fastdebug? You can make 2 tests - 1 for product and 1 for fastdebug with different timeouts, if you want, but I would recommend just to increase the timeout. Also, please exclude test from tier1_serviceability in TEST.groups because it takes too many time for tier1. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1758091768 From lmesnik at openjdk.org Fri Sep 13 02:44:13 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 13 Sep 2024 02:44:13 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 00:33:34 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > change main/othervm to driver Changes requested by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2302023230 From dholmes at openjdk.org Fri Sep 13 04:05:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 13 Sep 2024 04:05:08 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v5] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:56:42 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. 
Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with three additional commits since the last revision: > > - remove empty line > - fix indentation > - fix missing return statement Okay this seems fine to me now. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2302082313 From dholmes at openjdk.org Fri Sep 13 04:51:07 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 13 Sep 2024 04:51:07 GMT Subject: RFR: 8340009: Improve the output from assert_different_registers In-Reply-To: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> Message-ID: On Thu, 12 Sep 2024 12:56:13 GMT, Stefan Karlsson wrote: > `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed. 
> > For example, if I introduce an intentional violation: > > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index fde868a64b3..551878ac09d 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, > Register scan_temp, > Label& L_no_such_interface, > bool return_method) { > - assert_different_registers(recv_klass, intf_klass, scan_temp); > + Register joker = intf_klass; > + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); > assert_different_registers(method_result, intf_klass, scan_temp); > assert(recv_klass != method_result || !return_method, > "recv_klass can be destroyed when method isn't needed"); > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. More over, it's unclear from the assert which of the four variables contain the same register. > > I'd like to propose a few changes: > 1) That we report the indices of the conflicting registers > 2) That we report the correct file and line number > 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer. 
> > After these suggestions we'll get error messages that look like this: > > # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963 > # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0 > > Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables. > > There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement. Looks good. That isn't an assert I've had to deal with but this is certainly a huge improvement in its usability. :) Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20965#pullrequestreview-2302115777 From lmao at openjdk.org Fri Sep 13 05:47:07 2024 From: lmao at openjdk.org (Liang Mao) Date: Fri, 13 Sep 2024 05:47:07 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: References: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> <0I-N2xnYHM8nWzgKGL-5DrYhd5G95hYHxqGQNTz93YE=.e8450cbd-aac3-4b88-9e35-b76887ce4b13@github.com> Message-ID: <22lYio5OB8MOBKY-gqS_wD1xr8Lw7UYt3o1bNxYNvHM=.362420b2-0c82-4af8-8ed3-19dce4c6f163@github.com> On Fri, 13 Sep 2024 02:41:00 GMT, Leonid Mesnik wrote: >> The test depends on pthread and it's a stress test that fastdebug ran out of time. We don't want to trade off the crash reproduction probability to make fastdebug pass. > > The fastdebug testing is very important for hotspot testing. Is it possible just to increase timeout so test pass with fastdebug? You can make 2 tests - 1 for product and 1 for fastdebug with different timeouts, if you want, but I would recommend just to increase the timeout. > Also, please exclude test from tier1_serviceability in TEST.groups because it takes too many time for tier1. 
Fastdebug needs more than 30 minutes to finish and can hardly reproduce the crash. Do we still need that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1758215314 From stuefe at openjdk.org Fri Sep 13 06:02:08 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 06:02:08 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v5] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:56:42 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with three additional commits since the last revision: > > - remove empty line > - fix indentation > - fix missing return statement Possibly for another RFE, since this one has been in the making long enough: Write a gtest for this new function. src/hotspot/os/windows/os_windows.cpp line 5334: > 5332: } else { > 5333: errno = ENAMETOOLONG; > 5334: } Curious, why not just passing in outbuf and outbuflen directly instead of letting the function allocate memory just to then copy the content? To get a guaranteed ENAMETOOLONG? Just a question, not a comment. I am fine with this as it is. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2302179063 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1758226212 From adinn at openjdk.org Fri Sep 13 06:46:12 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Fri, 13 Sep 2024 06:46:12 GMT Subject: Integrated: 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 15:27:09 GMT, Andrew Dinn wrote: > Systematize handling of Opto and C1 stubs. Generate enum ids, static fields, stub/blob names and generator code from declarations using template macros as previously done with Shared stubs. Systematically reference stubs and stub names using ids. This pull request has now been integrated. Changeset: b88ff9c9 Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/b88ff9c986bfe5e14e2ba5803a464fbf6e131df8 Stats: 1033 lines in 44 files changed: 237 ins; 105 del; 691 mod 8339849: Enumerate opto and C1 stubs, generate enums, names, fields and generator calls Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/20936 From rcastanedalo at openjdk.org Fri Sep 13 06:46:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Sep 2024 06:46:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/machnode.cpp line 390: >> >>> 388: t = t->make_ptr(); >>> 389: } >>> 390: if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) { >> >> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`. 
> > I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758270661 From iklam at openjdk.org Fri Sep 13 07:41:20 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 13 Sep 2024 07:41:20 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache Message-ID: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. --- See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. 
------------- Depends on: https://git.openjdk.org/jdk/pull/20958 Commit messages: - some clean up - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - 8293337: Archive method handle intrinsics Changes: https://git.openjdk.org/jdk/pull/20959/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8293337 Stats: 126 lines in 9 files changed: 125 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20959/head:pull/20959 PR: https://git.openjdk.org/jdk/pull/20959 From amitkumar at openjdk.org Fri Sep 13 07:43:33 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 13 Sep 2024 07:43:33 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject Message-ID: This PR provides "resolve_global_jobject" method implementation for s390x-port. Testing: * Tier1 test with Fastdebug; * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; * 1. Ran tier1 test with a call to "resolve_jobect" * 2. Ran tier1 test with a call to "resolve_global_jobject" I didn't see any new failure appearing there. 
------------- Commit messages: - implement resolve_global_jobject Changes: https://git.openjdk.org/jdk/pull/20986/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20986&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339416 Stats: 62 lines in 4 files changed: 55 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20986.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20986/head:pull/20986 PR: https://git.openjdk.org/jdk/pull/20986 From rcastanedalo at openjdk.org Fri Sep 13 07:49:15 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 13 Sep 2024 07:49:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 15:38:18 GMT, Thomas Stuefe wrote: >> src/hotspot/share/opto/lcm.cpp line 272: >> >>> 270: const TypePtr* tptr; >>> 271: if ((UseCompressedOops || UseCompressedClassPointers) && >>> 272: (CompressedOops::shift() == 0 || CompressedKlassPointers::shift() == 0)) { >> >> Could you explain this change? It seems like it may affect C2's implicit null check analysis even for `-XX:-UseCompactObjectHeaders`. In particular, for the following configurations, the changed condition evaluates to true before the change and false after it, regardless of whether `UseCompactObjectHeaders` is enabled: >> >> (!UseCompressedOops, UseCompressedClassPointers, CompressedKlassPointers::shift() != 0) >> ( UseCompressedOops, !UseCompressedClassPointers, CompressedOops::shift() != 0) > > Hi @robcasloz > > The `CompressedKlassPointers` utility class is not usable anymore with `-UseCompressedClassPointers`. One change is that if `UseCompressedClassPointers` is off, `CompressedKlassPointers` stays uninitialized. And that makes more sense then to rely on the static initialization values of `CompressedOops::_shift`. 
Thanks for the explanation. I wonder if the test is necessary at all, or one could simply use `base->get_ptr_type()` unconditionally, which defaults to `base->bottom_type()->isa_ptr()` anyway for non-compressed pointers. But this simplification would be in any case out of the scope of this changeset. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758356268 From jbhateja at openjdk.org Fri Sep 13 07:52:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 07:52:38 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Review comments resolution. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 - Review resolutions. - 8339790: Support Intel APX setzucc instruction. 
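The zero-upper behavior described in the 8339790 PR text can be modeled with plain bit operations. The following is a sketch under stated assumptions, not the assembler change itself: a 64-bit register is modeled as a Java `long`, and `SetccSketch`, `setcc`, `movzxByte` and `setzucc` are hypothetical names chosen for illustration.

```java
// Illustrative model only; these methods are not HotSpot or JDK APIs.
final class SetccSketch {

    // Classic SETcc on a byte register: only the low 8 bits are written,
    // the upper 56 bits of the full register keep their old value.
    static long setcc(long reg, boolean cond) {
        return (reg & ~0xFFL) | (cond ? 1L : 0L);
    }

    // MOVZX from the low byte to the full 64-bit register.
    static long movzxByte(long reg) {
        return reg & 0xFFL;
    }

    // Zero-upper variant as described in the PR: the result is 0 or 1 and the
    // upper 56 bits are cleared in the same step.
    static long setzucc(long reg, boolean cond) {
        return cond ? 1L : 0L;
    }

    public static void main(String[] args) {
        long stale = 0xDEADBEEFCAFEF00DL;  // arbitrary previous register contents
        System.out.println(Long.toHexString(setcc(stale, true)));
        System.out.println(Long.toHexString(movzxByte(setcc(stale, true))));
        System.out.println(Long.toHexString(setzucc(stale, true)));
    }
}
```

In this model `setzucc` gives the same value as `setcc` followed by `movzxByte`, which is the instruction the PR says no longer needs to be emitted.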
------------- Changes: https://git.openjdk.org/jdk/pull/20920/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=02 Stats: 77 lines in 7 files changed: 26 ins; 25 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From rcastanedalo at openjdk.org Fri Sep 13 07:57:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 13 Sep 2024 07:57:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Thu, 12 Sep 2024 11:46:35 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > src/hotspot/share/cds/filemap.cpp line 2457: > >> 2455: compressed_oops(), compressed_class_pointers()); >> 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { >> 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " > > The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ. I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary).
This comment has been marked as "resolved" without any apparent action being taken, is that intentional? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758369787 From rkennke at openjdk.org Fri Sep 13 08:21:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 08:21:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: Message-ID: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
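The 8TB figure in the full-GC forwarding bullet above follows from simple arithmetic if one assumes the 40 bits hold an offset counted in 8-byte heap words: 2^40 words * 8 bytes = 2^43 bytes. The sketch below only illustrates that arithmetic; the constants, function names and the word-offset scheme are assumptions for the example, not the encoding actually used in the PR.

```cpp
#include <cstdint>

// Assumptions for illustration only: 40 payload bits, 8-byte heap words.
constexpr unsigned kForwardingBits = 40;
constexpr unsigned kWordShift = 3;  // log2(8 bytes per heap word)

// Largest heap span such an encoding could cover:
// 2^40 words * 8 bytes per word = 2^43 bytes.
constexpr uint64_t kMaxEncodableBytes =
    uint64_t{1} << (kForwardingBits + kWordShift);
static_assert(kMaxEncodableBytes == (uint64_t{8} << 40),
              "2^43 bytes is 8 TB in binary units");

// Encode a forwardee address as a word offset from the heap base.
constexpr uint64_t encode_forwarding(uint64_t heap_base, uint64_t addr) {
  return (addr - heap_base) >> kWordShift;
}

// Decode the word offset back into an address.
constexpr uint64_t decode_forwarding(uint64_t heap_base, uint64_t encoded) {
  return heap_base + (encoded << kWordShift);
}

// True if the word offset fits into the available bits.
constexpr bool fits_in_forwarding_bits(uint64_t heap_base, uint64_t addr) {
  return encode_forwarding(heap_base, addr) < (uint64_t{1} << kForwardingBits);
}
```

With these assumptions, an address at or beyond `heap_base + kMaxEncodableBytes` no longer fits, which matches the description's statement that heaps above 8TB need a different setup.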
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Hide log timestamps in test to prevent false failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/9e008ac1..69f1ef1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=12-13 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Fri Sep 13 08:21:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 08:21:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: <99QfaesSJzBLGXsBKOdiSwjAdt18pwNMh62Pyhr-6bk=.b27f001b-e3e3-4826-9542-698eef2a9ee3@github.com> On Fri, 13 Sep 2024 07:54:30 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/cds/filemap.cpp line 2457: >> >>> 2455: compressed_oops(), compressed_class_pointers()); >>> 2456: if (compressed_oops() != UseCompressedOops || compressed_class_pointers() != UseCompressedClassPointers) { >>> 2457: log_info(cds)("Unable to use shared archive.\nThe saved state of UseCompressedOops and UseCompressedClassPointers is " >> >> The promotion of this CDS log line from `info` to `warning` triggers false failures in the `test/hotspot/jtreg/compiler/intrinsics/bmi` tests when running them with `-XX:-UseCompressedClassPointers`. These tests expect the standard output of different JVM runs to be identical, but the timestamps in the log messages tend to differ.
I suggest adjusting the test configuration so that log timestamps are simply omitted, as in [this patch](https://github.com/robcasloz/jdk/commit/48f6e90ef6e0a71b55df536ed04a8b72130b5ea9) (feel free to merge it as-is or with any further changes you may find necessary). > > This comment has been marked as "resolved" without any apparent action being taken, is that intentional? I have merged your patch locally but forgot to push it. Sorry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758407575 From azafari at openjdk.org Fri Sep 13 08:45:15 2024 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 13 Sep 2024 08:45:15 GMT Subject: RFR: 8337563: NMT: rename MEMFLAGS to MemTag [v11] In-Reply-To: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> References: <1Io2b9D2fXcUlf-VyTzgeqloWFuGNUh6T1wXzROJHgc=.8373bbd4-3aa9-4865-acc8-5f046607d052@github.com> Message-ID: <3-s8755E5hKSDICVfIfYYa0Tk1FeL5-idsorLOFnqfs=.eda363bc-1d5e-4173-9007-c2d10114e396@github.com> On Thu, 12 Sep 2024 15:52:41 GMT, Gerard Ziemski wrote: >> Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. >> >> `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. >> >> This fix also includes a cleanup of all the related function/template parameter names and local variable names. >> >> Testing is pending... >> >> Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) > > Gerard Ziemski has updated the pull request incrementally with two additional commits since the last revision: > > - copyrights > - Afshin's feedback, tests Thank you all, for the great work here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20872#issuecomment-2348381372 From aturbanov at openjdk.org Fri Sep 13 09:30:18 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 13 Sep 2024 09:30:18 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 00:29:30 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > c1 and template generator fixes test/jdk/java/lang/Math/HyperbolicTests.java line 1011: > 1009: } > 1010: > 1011: for(int i = 0; i < testCases.length; i++) { Suggestion: for (int i = 0; i < testCases.length; i++) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1758522740 From stuefe at openjdk.org Fri Sep 13 09:30:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 09:30:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:19:32 GMT, Coleen Phillimore wrote: >> This is tricky. We are already deep in initialization and have done a couple of decisions based on +UseCompressedClassPointers (e.g. CDS setup). I *think* we could still go with -UseCCP, but I wonder whether this is wise. >> >> Note that this error is not new. In the old code, we simply asserted. That left us with UB in release builds, which remains unresolved. I simply made the error explicit in release too. > > Ok, in this case, that's fine if we already asserted. A fatal error is better. Actually, a lot of the old code had dusty side corners that were UB. 
Making narrowKlass smaller than 32bit exposed a lot of them, and a lot of the changes in and around CompressedKlassPointers are about cleanly making explicit what before had been implicit or just broken (e.g. a clear distinction between encoding range and Klass range, and a clear handling of narrowKlass bit width as a runtime value). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758522844 From stuefe at openjdk.org Fri Sep 13 09:38:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 09:38:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:13:58 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 243: >> >>> 241: } else { >>> 242: >>> 243: // In legacy mode, we try, in order of preference: >> >> Can you not use the word 'legacy' here? Maybe in "non-compact object header mode"... > > okay. I removed all traces of "legacy" and "tiny", reverting to "standard" or "non-coh" vs "coh". I would prefer to use the shorthand "coh" in some places since "compact object header mode" is a mouthful and gives me RSI :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758533732 From stefank at openjdk.org Fri Sep 13 09:44:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 09:44:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 08:21:54 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Hide log timestamps in test to prevent false failures I went over the oops/ directory and added a few cleanup requests and comments. src/hotspot/share/oops/instanceOop.hpp line 43: > 41: } else { > 42: return sizeof(instanceOopDesc); > 43: } This entire function can be removed. It returns the same value as oopDesc::base_offset_in_bytes(), but in a slightly different way. src/hotspot/share/oops/markWord.hpp line 171: > 169: return mask_bits(value(), lock_mask_in_place | self_fwd_mask_in_place) >= static_cast(marked_value); > 170: } > 171: Suggestion to retain code layout. Suggestion: src/hotspot/share/oops/markWord.inline.hpp line 29: > 27: > 28: #include "oops/markWord.hpp" > 29: #include "oops/compressedOops.inline.hpp" Suggestion: #include "oops/compressedOops.inline.hpp" #include "oops/markWord.hpp" src/hotspot/share/oops/objArrayKlass.cpp line 146: > 144: > 145: size_t ObjArrayKlass::oop_size(oop obj) const { > 146: // In this assert, we cannot safely access the Klass* with compact headers. I would like a comment stating that this assert is turned off because size_given_klass calls oop_size on an object that might be concurrently forwarded. src/hotspot/share/oops/oop.cpp line 158: > 156: // Only has a klass gap when compressed class pointers are used and not > 157: // using compact headers. > 158: return UseCompressedClassPointers && !UseCompactObjectHeaders; This comment can just be removed.
src/hotspot/share/oops/oop.hpp line 340: > 338: // field offset. Use an offset halfway into the markWord, as the markWord is never > 339: // partially loaded from C2. > 340: return 4; I asked around to see what people felt about dropping references to mark_offset_in_bytes(), which we know is 0. There was a request to strive to use mark_offset_in_bytes() for clarity. Suggestion: return mark_offset_in_bytes() + 4; src/hotspot/share/oops/oop.hpp line 349: > 347: static int klass_gap_offset_in_bytes() { > 348: assert(has_klass_gap(), "only applicable to compressed klass pointers"); > 349: assert(!UseCompactObjectHeaders, "don't use klass_gap_offset_in_bytes() with compact headers"); This assert is implied by `has_klass_gap()`. I don't see the need to repeat it here. src/hotspot/share/oops/oop.hpp line 363: > 361: return sizeof(markWord) + sizeof(Klass*); > 362: } > 363: } Not a strong request for this PR, but there are many places that calculate almost the same thing, and it might be good to limit the number of places we do similar calculations. I'm wondering if it wouldn't be better for readability to structure the code as follows: static int header_size_in_bytes() { if (UseCompactObjectHeaders) { return sizeof(markWord); } else if (UseCompressedClassPointers) { return sizeof(markWord) + sizeof(narrowKlass); } else { return sizeof(markWord) + sizeof(Klass*); } } // Size of object header, aligned to platform wordSize static int header_size() { return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize; } ...
static int base_offset_in_bytes() { return header_size_in_bytes(); } src/hotspot/share/oops/oop.inline.hpp line 161: > 159: > 160: void oopDesc::set_klass_gap(HeapWord* mem, int v) { > 161: assert(!UseCompactObjectHeaders, "don't set Klass* gap with compact headers"); We might want to consider just simplifying the function to: void oopDesc::set_klass_gap(HeapWord* mem, int v) { assert(has_klass_gap(), "precondition"); *(int*)(((char*)mem) + klass_gap_offset_in_bytes()) = v; } src/hotspot/share/oops/oop.inline.hpp line 295: > 293: // Used by scavengers > 294: void oopDesc::forward_to(oop p) { > 295: assert(cast_from_oop(p) != this, Do we really need the cast here? ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2302542279 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758503206 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758482703 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758505713 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758479437 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758478106 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758472909 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758474349 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758528515 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758538380 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758540055 From stefank at openjdk.org Fri Sep 13 09:44:20 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 09:44:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: 
<6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 12:17:17 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/oop.cpp line 230: > >> 228: // disjunct below to fail if the two comparands are computed across such >> 229: // a concurrent change. >> 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); > > Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. That bug doesn't fix all cases where the length field is modified. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758477168 From shade at openjdk.org Fri Sep 13 10:13:03 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Sep 2024 10:13:03 GMT Subject: RFR: 8340009: Improve the output from assert_different_registers In-Reply-To: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> Message-ID: On Thu, 12 Sep 2024 12:56:13 GMT, Stefan Karlsson wrote: > `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed.
> > For example, if I introduce an intentional violation: > > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index fde868a64b3..551878ac09d 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, > Register scan_temp, > Label& L_no_such_interface, > bool return_method) { > - assert_different_registers(recv_klass, intf_klass, scan_temp); > + Register joker = intf_klass; > + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); > assert_different_registers(method_result, intf_klass, scan_temp); > assert(recv_klass != method_result || !return_method, > "recv_klass can be destroyed when method isn't needed"); > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. Moreover, it's unclear from the assert which of the four variables contain the same register. > > I'd like to propose a few changes: > 1) That we report the indices of the conflicting registers > 2) That we report the correct file and line number > 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer.
> > After these suggestions we'll get error messages that look like this: > > # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963 > # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0 > > Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables. > > There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement. As a frequent user of this assert, I sure appreciate this quality of life improvement. It would be super-awesome if assert printed the "actual" source names for registers that clash, but that likely gets us into a very hairy macro business. This is already a huge step forward. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20965#pullrequestreview-2302723611 From shade at openjdk.org Fri Sep 13 10:27:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 13 Sep 2024 10:27:23 GMT Subject: RFR: 8340105: Expose BitMap::print_on in release builds Message-ID: A small quality of life improvement. This irritates me often enough when I am looking at various bitmaps in release builds. BitMap::print_on is not available in release builds, and bitmaps in debug builds are sometimes different than the ones in release builds. This often forces me to do additional hack to expose it. I think it should just be available in release builds to begin with. 
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/20995/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20995&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340105 Stats: 9 lines in 2 files changed: 1 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20995.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20995/head:pull/20995 PR: https://git.openjdk.org/jdk/pull/20995 From stuefe at openjdk.org Fri Sep 13 11:03:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 11:03:04 GMT Subject: RFR: 8340105: Expose BitMap::print_on in release builds In-Reply-To: References: Message-ID: <5VKiskqPic3u_wRNtdHGsBAT4NiPP-uSG5-7YBfT2wE=.90d9fbb9-0423-46c3-b3d9-8f17c24e9e92@github.com> On Fri, 13 Sep 2024 10:21:51 GMT, Aleksey Shipilev wrote: > A small quality of life improvement. This irritates me often enough when I am looking at various bitmaps in release builds. BitMap::print_on is not available in release builds, and bitmaps in debug builds are sometimes different than the ones in release builds. This often forces me to do additional hack to expose it. I think it should just be available in release builds to begin with. Good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20995#pullrequestreview-2302821554 From tschatzl at openjdk.org Fri Sep 13 11:15:15 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 11:15:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 09:00:32 GMT, Stefan Karlsson wrote: >> src/hotspot/share/oops/oop.cpp line 230: >> >>> 228: // disjunct below to fail if the two comparands are computed across such >>> 229: // a concurrent change. 
>>> 230: return Universe::heap()->is_stw_gc_active() && klass->is_objArray_klass() && is_forwarded() && (UseParallelGC || UseG1GC); >> >> Is this still true after the recent changes like JDK-8311163? It might be worth waiting for. > > That bug doesn't fix all cases where the length field is modified. Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. If I am not missing some case, this whole method is unnecessary now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758672296 From bulasevich at openjdk.org Fri Sep 13 12:38:21 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 13 Sep 2024 12:38:21 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM [v2] In-Reply-To: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: > With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications > > Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste.
However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. > > The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. The history of this comment and value is as follows: > - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. > - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). > > I believe it is time to remove the comment and update the default value. > > I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. > > For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. > > Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following results: > - No performance impact on ... 
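The segment-waste argument in the description above can be made concrete with a small calculation. The helper below is a simplified sketch, not the CodeHeap allocator: it ignores block headers and the `_segmap` overhead the description mentions, and the function name is made up for the example.

```cpp
#include <cstddef>

// Bytes left unused in the last segment when a blob of `size` bytes is
// rounded up to a whole number of segments. Simplified model: real
// allocations also carry a header and segment-map bookkeeping.
constexpr std::size_t segment_waste(std::size_t size, std::size_t segment_size) {
  std::size_t segments = (size + segment_size - 1) / segment_size;  // round up
  return segments * segment_size - size;
}

// A hypothetical 1050-byte nmethod:
//   128-byte segments: 9 segments = 1152 bytes, 102 bytes unused
//    64-byte segments: 17 segments = 1088 bytes,  38 bytes unused
static_assert(segment_waste(1050, 128) == 102, "9 * 128 - 1050");
static_assert(segment_waste(1050, 64) == 38, "17 * 64 - 1050");
```

For sizes spread evenly across a segment, the expected waste is roughly half a segment per nmethod, so halving the segment size roughly halves the waste, at the cost of twice as many segments to track.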
Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: CodeEntryAlignment=16: limit to V1 and V2 only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20864/files - new: https://git.openjdk.org/jdk/pull/20864/files/88aa6db6..10ecb900 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20864&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20864&range=00-01 Stats: 7 lines in 1 file changed: 3 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20864.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20864/head:pull/20864 PR: https://git.openjdk.org/jdk/pull/20864 From bulasevich at openjdk.org Fri Sep 13 12:38:21 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 13 Sep 2024 12:38:21 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Thu, 12 Sep 2024 23:23:51 GMT, Vladimir Kozlov wrote: >>> Were performance runs made with CodeEntryAlignment set to other than 64 or 16? It seems like the other choices (32, 128, are there others that make sense?) should be tried. >> >> Here are rough neoverse-v2 numbers: >> JmhDotty (-XX:CodeEntryAlignment=16) 701.93 ± 5.00 ms/op >> JmhDotty (-XX:CodeEntryAlignment=32) 703.56 ± 5.15 ms/op >> JmhDotty (-XX:CodeEntryAlignment=64) 704.46 ± 5.18 ms/op >> JmhDotty (-XX:CodeEntryAlignment=128) 703.71 ± 5.17 ms/op > > Hi @bulasevich > We got significant (few percents) regression when testing -XX:CodeCacheSegmentSize=64 -XX:CodeEntryAlignment=16 on Ampere system which has N1 CPU. > Is it possible to not set CodeEntryAlignment=16 for N1? > Maybe even limit it for V1 and V2 only? @vnkozlov Many thanks! Do you reproduce the regression on a public benchmark that I can also try? Now I restrict CodeEntryAlignment=16 for V1 and V2 only.
And I restart my performance tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2348852151 From stuefe at openjdk.org Fri Sep 13 12:51:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 12:51:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Tue, 10 Sep 2024 12:35:42 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 116: >> >>> 114: _range = end - _base; >>> 115: >>> 116: DEBUG_ONLY(assert_is_valid_encoding(addr, len, _base, _shift);) >> >> Can you refactor so the aarch64 path runs this same code without duplication? > > In tinycp mode, aarch64 runs this code though? The aarch64 variant of pd_initialize just returns then. In non-COH mode (preexisting, not touched by this patch) Aarch64 needs its own handling. I refactored: Now we should have no duplication (once my patch hits Romans PR branch) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758800913 From stefank at openjdk.org Fri Sep 13 12:51:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:51:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 11:10:58 GMT, Thomas Schatzl wrote: >> That bug doesn't fix all cases where the the length field is modified. > > Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. > > The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. 
> > If I am not missing some case, this whole method is unnecessary now. If you've already fixed this for GC then I agree that we could remove this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758805418 From stefank at openjdk.org Fri Sep 13 12:51:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:51:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 12:47:09 GMT, Stefan Karlsson wrote: >> Which ones are remaining? JDK-8337709 implemented the same change for G1 GC before JDK-8311163. >> >> The full collectors/g1 marking do not modify the length fields but have multiple separate queues which is a different issue. It will also be handled by the new `PartialArrayTaskStepper`, but should be of no concern here. >> >> If I am not missing some case, this whole method is unnecessary now. > > If you've already fixed this for GC then I agree that we could remove this. This seems like something that should be done as a separate patch that gets pushed before this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758808115 From stefank at openjdk.org Fri Sep 13 12:54:05 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:54:05 GMT Subject: RFR: 8340009: Improve the output from assert_different_registers In-Reply-To: References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> Message-ID: On Thu, 12 Sep 2024 13:03:15 GMT, Axel Boldt-Christmas wrote: >> `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed. 
>> >> For example, if I introduce an intentional violation: >> >> diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> index fde868a64b3..551878ac09d 100644 >> --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, >> Register scan_temp, >> Label& L_no_such_interface, >> bool return_method) { >> - assert_different_registers(recv_klass, intf_klass, scan_temp); >> + Register joker = intf_klass; >> + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); >> assert_different_registers(method_result, intf_klass, scan_temp); >> assert(recv_klass != method_result || !return_method, >> "recv_klass can be destroyed when method isn't needed"); >> >> I get this error message: >> >> # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 >> # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 >> >> The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. More over, it's unclear from the assert which of the four variables contain the same register. >> >> I'd like to propose a few changes: >> 1) That we report the indices of the conflicting registers >> 2) That we report the correct file and line number >> 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer. 
>> >> After these suggestions we'll get error messages that look like this: >> >> # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963 >> # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0 >> >> Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables. >> >> There might be a way to use more macros to propagate the variable names, but I propose that we start with this incre... > > A nice improvement. Thanks for reviewing! @xmas92 has a prototype that prints the "actual" source names, but it uses large macro expansions, so I'm not sure we would take that in. Maybe he can show it as a curiosity. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20965#issuecomment-2348887162 From rkennke at openjdk.org Fri Sep 13 12:56:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 12:56:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 09:39:23 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Hide log timestamps in test to prevent false failures > > src/hotspot/share/oops/oop.inline.hpp line 295: > >> 293: // Used by scavengers >> 294: void oopDesc::forward_to(oop p) { >> 295: assert(cast_from_oop(p) != this, > > Do we really need the cast here? Yes, otherwise compiler complains about ambiguous != operator. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758815451 From stefank at openjdk.org Fri Sep 13 12:57:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 12:57:04 GMT Subject: RFR: 8340105: Expose BitMap::print_on in release builds In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 10:21:51 GMT, Aleksey Shipilev wrote: > A small quality of life improvement. This irritates me often enough when I am looking at various bitmaps in release builds. BitMap::print_on is not available in release builds, and bitmaps in debug builds are sometimes different than the ones in release builds. This often forces me to do additional hack to expose it. I think it should just be available in release builds to begin with. Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20995#pullrequestreview-2303052668 From rkennke at openjdk.org Fri Sep 13 13:03:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 13:03:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 09:31:39 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Hide log timestamps in test to prevent false failures > > src/hotspot/share/oops/oop.hpp line 363: > >> 361: return sizeof(markWord) + sizeof(Klass*); >> 362: } >> 363: } > > Not a strong request for this PR, but there are many places that calculates almost the same thing, and it might be good to limit the number of places we do similar calculations. 
> > I'm wondering if it wouldn't be better for readability to structure the code as follows: > > static int header_size_in_bytes() { > if (UseCompactObjectHeaders) { > return sizeof(markWord); > } else if (UseCompressedClassPointers) { > return sizeof(markWord) + sizeof(narrowKlass); > } else { > return sizeof(markWord) + sizeof(Klass*); > } > } > > // Size of object header, aligned to platform wordSize > static int header_size() { > return align_up(header_size_in_bytes(), HeapWordSize) / HeapWordSize; > } > ... > static int base_offset_in_bytes() { > return header_size_in_bytes(); > } Ok. I filed: https://bugs.openjdk.org/browse/JDK-8340118 for now, let's see if I can sort this out before integrating this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758825458 From szaldana at openjdk.org Fri Sep 13 13:07:26 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Fri, 13 Sep 2024 13:07:26 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods Message-ID: Hi all, This PR addresses [8336874](https://bugs.openjdk.org/browse/JDK-8336874) ensuring enqueuing an abstract method for compilation doesn't hit an assert with WhiteBox. Testing: - [x] Added test case passes. 
Thanks, Sonia ------------- Commit messages: - 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods Changes: https://git.openjdk.org/jdk/pull/20973/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20973&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336874 Stats: 60 lines in 2 files changed: 60 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20973/head:pull/20973 PR: https://git.openjdk.org/jdk/pull/20973 From rkennke at openjdk.org Fri Sep 13 13:11:45 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 13 Sep 2024 13:11:45 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
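As a quick check of the 8TB figure in the description above, here is a minimal sketch. It assumes the 40 forwarding bits hold an offset counted in 8-byte heap words; that assumption, and the class and method names, are illustrative and not taken from the patch.

```java
public class ForwardingRangeSketch {
    // Largest heap span (in bytes) addressable by an offset of `offsetBits` bits,
    // when each offset step covers `alignmentBytes` bytes (assumed: 8-byte heap words).
    static long maxEncodableHeapBytes(int offsetBits, int alignmentBytes) {
        return (1L << offsetBits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long bytes = maxEncodableHeapBytes(40, 8);
        long tib = bytes >> 40; // convert bytes to TiB
        System.out.println("40-bit word offset covers " + tib + " TB of heap");
    }
}
```

Under these assumptions 2^40 words of 8 bytes each give 2^43 bytes, which matches the 8TB limit mentioned in the description.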
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Various touch-ups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/69f1ef1d..990926f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=13-14 Stats: 25 lines in 8 files changed: 3 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From coleenp at openjdk.org Fri Sep 13 13:15:07 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 13 Sep 2024 13:15:07 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion So maybe instead of returning Unsafe::no_such_method_exception() we should just have a fatal error with the message that the method has been deleted. The user is privileged by using redefinition and is using a deprecated option. Maybe that's the best response. Opinions? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2348933420 From stefank at openjdk.org Fri Sep 13 13:18:16 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 13 Sep 2024 13:18:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: <3-nOaxvBWIcOLzCOlrWPzJtsYRknVz5JIwx21X8xkIg=.6ce32f15-a4d6-44d5-9b9e-c3015de45e66@github.com> Message-ID: On Fri, 13 Sep 2024 12:53:29 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 295: >> >>> 293: // Used by scavengers >>> 294: void oopDesc::forward_to(oop p) { >>> 295: assert(cast_from_oop(p) != this, >> >> Do we really need the cast here? > > Yes, otherwise compiler complains about ambiguous != operator. OK, we shouldn't need to. It seems like I can silence the compiler by tweaking oopsHierarchy.hpp. I'll deal with that as a follow-up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758853099 From fbredberg at openjdk.org Fri Sep 13 13:19:26 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:26 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: Message-ID: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). 
> > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x pass the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero don't need any changes as far as I can tell. Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update one, after the review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19454/files - new: https://git.openjdk.org/jdk/pull/19454/files/f9ddfc6f..d2c6db2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=00-01 Stats: 55 lines in 5 files changed: 18 ins; 25 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/19454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19454/head:pull/19454 PR: https://git.openjdk.org/jdk/pull/19454 From fbredberg at openjdk.org Fri Sep 13 13:19:26 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:26 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 14:39:37 GMT, Fei Yang wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 237: > >> 235: ld(t0, Address(tmp, ObjectMonitor::EntryList_offset())); >> 236: ld(disp_hdr, Address(tmp, ObjectMonitor::cxq_offset())); >> 237: orr(t0, t0, disp_hdr); > > It looks better to me if we use `tmp1Reg` here instead of its alias `disp_hdr` like you do for aarch64. 
I mean: > > ld(tmp1Reg, Address(tmp, ObjectMonitor::cxq_offset())); > orr(t0, t0, tmp1Reg); fixed > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 568: > >> 566: ld(tmp3_t, Address(tmp1_monitor, ObjectMonitor::cxq_offset())); >> 567: orr(t0, t0, tmp3_t); >> 568: beqz(t0, unlocked); // If so we are done. > > You might want to remove the preceding definition of label `release` as it is not used after this change. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758852591 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758852979 From fbredberg at openjdk.org Fri Sep 13 13:19:27 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:27 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: Message-ID: <-Lnuhf9nMjR0cMI1hdhFJjjdgQTbYvJjYtbp8OO03Rg=.401d0c20-2716-4e7a-97e6-0ca3c81e8096@github.com> On Wed, 11 Sep 2024 12:04:58 GMT, Fredrik Bredberg wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 482: >> >>> 480: // This is faster on Nehalem and AMD Shanghai/Barcelona. >>> 481: // See https://blogs.oracle.com/dave/entry/instruction_selection_for_volatile_fences >>> 482: lock(); addl(Address(rsp, 0), 0); >> >> Since there's a membar above, do you need this lock/addl instructions? > > Well spotted @coleenp! No it's not needed. It was meant to be replaced by `membar(StoreLoad)`, which as @xmas92 wrote, does exactly that. Also, since I use `membar(StoreLoad)` in all other platforms, I wanted it to be consistent. fixed >> src/hotspot/share/runtime/objectMonitor.cpp line 353: >> >>> 351: >>> 352: void ObjectMonitor::enter_for_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { >>> 353: DEBUG_ONLY(bool success = ) ObjectMonitor::enterI_with_contention_mark(locking_thread, contention_mark); >> >> This is kind of noisy with DEBUG_ONLY. 
If you remove DEBUG_ONLY, does the windows compiler complain that you're not using the variable success in the product build? > > I don't know. I come from a planet where warnings was errors, and just brought along the old habit to my new planet. I'll check with the Windows compiler. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758844955 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758850687 From fbredberg at openjdk.org Fri Sep 13 13:19:27 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:27 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 14:22:35 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 510: > >> 508: >> 509: // Memory barrier/fence >> 510: // Dekker pivot point -- fulcrum : ST Owner; MEMBAR; LD Succ > > I think you should delete this whole comment block. The source code control system will remember this comment about Nehalem and AMD Shanghai/Barcelona. fixed > src/hotspot/share/runtime/javaThread.hpp line 620: > >> 618: >> 619: // Support for SharedRuntime::monitor_exit_helper() >> 620: ObjectMonitor* unlocked_inflated_monitor() { return _unlocked_inflated_monitor; } > > Can you make this a const method? fixed > src/hotspot/share/runtime/objectMonitor.cpp line 904: > >> 902: } >> 903: >> 904: assert(_succ != current, "invariant"); > > This assert seems unnecessary since it's just reset above. fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1243: > >> 1241: ObjectMonitorContentionMark contention_mark(this); >> 1242: >> 1243: if (contentions() < 0) { > > You should use is_being_async_deflated() here instead of contentions() < 0. 
fixed > src/hotspot/share/runtime/objectMonitor.cpp line 1244: > >> 1242: >> 1243: if (contentions() < 0) { >> 1244: assert((intptr_t(_EntryList)|intptr_t(_cxq)) == 0 || _succ != nullptr, ""); > > Please add a space between | in this expression. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758844352 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758846833 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758848201 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758849327 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758855247 From fbredberg at openjdk.org Fri Sep 13 13:19:27 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:27 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> Message-ID: On Wed, 11 Sep 2024 13:22:20 GMT, Fredrik Bredberg wrote: >> Me too. So many functions that are sort of the same. > > As I wrote above, the confusing renaming has to be fixed. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758852256 From fbredberg at openjdk.org Fri Sep 13 13:19:28 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:28 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: <98VDCDWoHHv8YxGMOqMqIGBOMBrR9tDhCMn51LcOVhU=.fae4048e-2dd7-48c7-8a81-393fa6e8a607@github.com> References: <98VDCDWoHHv8YxGMOqMqIGBOMBrR9tDhCMn51LcOVhU=.fae4048e-2dd7-48c7-8a81-393fa6e8a607@github.com> Message-ID: <5FPr8pmX7Dkipk6wSAvL_-rLH3xaXrqTr1OnRNepbGw=.ed4aa8d1-a440-4a13-8727-7c1c84a8ed3f@github.com> On Wed, 11 Sep 2024 07:15:29 GMT, Axel Boldt-Christmas wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 313: > >> 311: // The monitor is private to or already owned by locking_thread which must be suspended. >> 312: // So this code may only contend with deflation. >> 313: assert(locking_thread == Thread::current() || locking_thread->is_obj_deopt_suspend(), "must be"); > > I feel like the comments and assert now belong in `ObjectMonitor::enter_for_with_contention_mark`. > > `enterI_with_contention_mark` should be renamed. This is now a sort of `TryLock_with_contention_mark`. > > `add_to_contentions(1);` below could be changed to `contention_mark.extend()`. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758854361 From fbredberg at openjdk.org Fri Sep 13 13:19:28 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 13 Sep 2024 13:19:28 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:38:19 GMT, Coleen Phillimore wrote: >> Yes @coleenp, you did previously object to calling `enter_for()` from `TryLock()`, which is why it is what it is today. 
>> I'm not too proud of how it turned out, and as @dholmes-ora also pointed out , the naming is a bit confusing, so that needs to be fixed. > > I was actually confused because there's an enter_for() in all of the synchronizer files and didn't realize you were calling the one in ObjectMonitor. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1758851794 From tschatzl at openjdk.org Fri Sep 13 13:51:16 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 13 Sep 2024 13:51:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Fri, 13 Sep 2024 12:48:53 GMT, Stefan Karlsson wrote: >> If you've already fixed this for GC then I agree that we could remove this. > > This seems like something that should be done as a separate patch that gets pushed before this PR. Will remove in JDK-8340119. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1758906485 From lmesnik at openjdk.org Fri Sep 13 14:36:12 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 13 Sep 2024 14:36:12 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: <22lYio5OB8MOBKY-gqS_wD1xr8Lw7UYt3o1bNxYNvHM=.362420b2-0c82-4af8-8ed3-19dce4c6f163@github.com> References: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> <0I-N2xnYHM8nWzgKGL-5DrYhd5G95hYHxqGQNTz93YE=.e8450cbd-aac3-4b88-9e35-b76887ce4b13@github.com> <22lYio5OB8MOBKY-gqS_wD1xr8Lw7UYt3o1bNxYNvHM=.362420b2-0c82-4af8-8ed3-19dce4c6f163@github.com> Message-ID: On Fri, 13 Sep 2024 05:44:44 GMT, Liang Mao wrote: >> The fastdebug testing is very important for hotspot testing. Is it possible just to increase timeout so test pass with fastdebug? 
You can make 2 tests - 1 for product and 1 for fastdebug with different timeouts, if you want, but I would recommend just increasing the timeout. >> Also, please exclude the test from tier1_serviceability in TEST.groups because it takes too much time for tier1. > > Fastdebug needs more than 30 minutes to finish and can hardly reproduce the crash. Do we still need that? 30 minutes is too much; however, some testing in debug is better than nothing. Can you add the test parameter "iterations" and set it to something reasonable? Then add the test /** * @test * @bug 8339725 * @summary Stress test GetMethodDeclaringClass * @requires vm.jvmti * @requires (os.family == "linux") * @library /test/lib * @run driver/timeout=300 TestUnloadedClass */ and update the check to if (iterations == 0) { output.shouldContain("OutOfMemoryError"); } So we have some testing in debug. The debug configurations are executed very often with different flags, and it is still worth running a reduced testcase even if the original bug is not reproduced, while the original testcase with no time/iteration limit is executed in product only. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1758983204 From jkarthikeyan at openjdk.org Fri Sep 13 14:42:08 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 13 Sep 2024 14:42:08 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 07:52:38 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. 
>> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 > - Review resolutions. > - 8339790: Support Intel APX setzucc instruction. This looks good to me! ------------- Marked as reviewed by jkarthikeyan (Committer). PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2303317737 From epeter at openjdk.org Fri Sep 13 14:48:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:48:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:08:04 GMT, Jatin Bhateja wrote: >> test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 1048: >> >>> 1046: return SHORT_GENERATOR_SELECT_FROM_TRIPLES.stream().map(List::toArray). >>> 1047: toArray(Object[][]::new); >>> 1048: } >> >> Just a control question: does this also occasionally generate examples with out-of-bounds indices? Negative out of bounds and positive out of bounds? > > Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics. > Please find details at following comment > https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606 And do we test that the wrapping works correctly? 
>> test/jdk/jdk/incubator/vector/ShortMaxVectorTests.java line 5812: >> >>> 5810: ShortVector bv = ShortVector.fromArray(SPECIES, b, i); >>> 5811: ShortVector idxv = ShortVector.fromArray(SPECIES, idx, i); >>> 5812: idxv.selectFrom(av, bv).intoArray(r, i); >> >> Would this test catch a bug where the backend would generate vectors that are too long or too short? > > Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments, this is used to create concrete ideal vector types. That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1758999902 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759002531 From epeter at openjdk.org Fri Sep 13 14:52:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:52:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> Message-ID: On Tue, 3 Sep 2024 11:45:53 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: > >> 542: byte[] vpayload1 = ((ByteVector)v1).vec(); >> 543: byte[] vpayload2 = ((ByteVector)v2).vec(); >> 544: byte[] vpayload3 = ((ByteVector)v3).vec(); > > Is there a reason you are not using more descriptive 
names here instead of `vpayload1`? > I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? You only gave me a thumbs up and no change - but comment resolved. Is that intentional? Makes me feel like you are ignoring my comments, and that discourages me from reviewing in the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759008094 From epeter at openjdk.org Fri Sep 13 14:56:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Sep 2024 14:56:10 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:13:34 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review resolutions. Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. 
Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2349148857 From mdoerr at openjdk.org Fri Sep 13 15:11:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 15:11:10 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:33:28 GMT, Vladimir Kozlov wrote: > Good. Can you add regression test? My current regression test is: Run applications/ctw/modules/java_xml.java on AIX. :-) Is there a way to force the metaspace part for such classes outside of the encoding range? Then, we may be able to write a dedicated regression test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349181358 From coleenp at openjdk.org Fri Sep 13 15:18:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 13 Sep 2024 15:18:06 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:16:08 GMT, Martin Doerr wrote: > After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. All you need is an interface class to go through this code. I wonder why it only failed on AIX? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349196395 From mdoerr at openjdk.org Fri Sep 13 15:32:04 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 15:32:04 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:15:19 GMT, Coleen Phillimore wrote: > All you need is an interface class to go through this code. I wonder why it only failed on AIX? It fails on AIX because it allocates at different addresses and we get a class at 0x0a00010091130af8. I guess that the classes seen by this C2 code are still within the encodable range on all other platforms (including linux on PPC64). Note that x86 is not affected because it doesn't use the transformation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349225767 From mdoerr at openjdk.org Fri Sep 13 15:43:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 15:43:20 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: > After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add sanity check. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20971/files - new: https://git.openjdk.org/jdk/pull/20971/files/a6a0bdf7..1b5147d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20971&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20971&range=00-01 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20971.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20971/head:pull/20971 PR: https://git.openjdk.org/jdk/pull/20971 From stuefe at openjdk.org Fri Sep 13 15:43:21 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 15:43:21 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:40:56 GMT, Martin Doerr wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity check. Good catch. I was afraid we would hit things like this. I hope there are not more. src/hotspot/share/oops/compressedKlass.inline.hpp line 54: > 52: assert(check_alignment(v), "Address not aligned"); > 53: uint64_t pd = (uint64_t)(pointer_delta(v, narrow_base, 1)); > 54: assert(KlassEncodingMetaspaceMax > pd, "change encoding max if new encoding (Klass " PTR_FORMAT ", Base " PTR_FORMAT ")", p2i(v), p2i(narrow_base)); Would it be possible to leave this hunk out? I am in the process of rewriting large parts of this for Lilliput, so it won't survive in this current form. 
These things will be a lot better tested, and the assertion messages make more sense, too) ------------- PR Review: https://git.openjdk.org/jdk/pull/20971#pullrequestreview-2303454203 PR Review Comment: https://git.openjdk.org/jdk/pull/20971#discussion_r1759082597 From mdoerr at openjdk.org Fri Sep 13 15:43:21 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 15:43:21 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: <3_kI-zSNTCQROrljm039lgdEbsts5-w29LPK3dAlMrc=.7465c00f-1e1b-4217-8ba9-79c255faf96c@github.com> On Fri, 13 Sep 2024 15:15:19 GMT, Coleen Phillimore wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > All you need is an interface class to go through this code. I wonder why it only failed on AIX? > > > @coleenp, do you have an assert in your changes which check that only abstract and interface classes are outside encoding space? I concern that due to some bug we would get regular class outside. > > > > > > I only check these conditions when allocating the Klass. Maybe it could be added to CompressedKlassPointers::is_in_encoding_range(). > > @TheRealMDoerr can you add such assert? I've added a sanity check in the `ciKlass` version. `CompressedKlassPointers::is_in_encoding_range()` takes a `void*` and I don't want to assume that it will always point to the beginning of a Klass. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349249318 From stuefe at openjdk.org Fri Sep 13 15:47:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 15:47:05 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:29:30 GMT, Martin Doerr wrote: > > All you need is an interface class to go through this code. I wonder why it only failed on AIX? > > It fails on AIX because it allocates at different addresses and we get a class at 0x0a00010091130af8. I guess that the classes seen by this C2 code are still within the encodable range on all other platforms (including linux on PPC64). Note that x86 is not affected because it doesn't use the transformation. Which CPUs are affected? Only PPC64? > > Good. Can you add regression test? > > My current regression test is: Run applications/ctw/modules/java_xml.java on AIX. :-) > > Is there a way to force the metaspace part for such classes outside of the encoding range? Then, we may be able to write a dedicated regression test. You can do this: - use `CompressedClassSpaceBaseAddress` to enforce a low base address, e.g. 1GB. This also switches off CDS. It may not work, in which case you throw a SkippedException. But I use this in many tests. - use a small class space, e.g. 128MB. - class space will be `[1g .. 1g+128m)` - under most architectures (all apart from aarch64), the resulting encoding scheme will use base=0 shift=0. Now you have limited the encoding range to `[0 .. 4gb)`. On aarch64, you will get, I believe, base=1g shift=0, `[1g...5g)`, still okay. - Normal metaspace lives in freely allocated mmaps, which usually float around somewhere distant. The chance that they are outside the range is very high. This works on Linux, but depends on the `vm.mmap_min_addr` sysctl. You may have to lower that value.
It should usually be very low though (64k or so) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349255902 From mdoerr at openjdk.org Fri Sep 13 15:52:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 15:52:11 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: <_OTVCsF_mEHj9EZFiz92DkogGg7SDQopX90PKx7xmp0=.6ea17b17-eeb8-4cf1-94b3-7fb9b9704586@github.com> On Fri, 13 Sep 2024 15:35:13 GMT, Thomas Stuefe wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add sanity check. > > src/hotspot/share/oops/compressedKlass.inline.hpp line 54: > >> 52: assert(check_alignment(v), "Address not aligned"); >> 53: uint64_t pd = (uint64_t)(pointer_delta(v, narrow_base, 1)); >> 54: assert(KlassEncodingMetaspaceMax > pd, "change encoding max if new encoding (Klass " PTR_FORMAT ", Base " PTR_FORMAT ")", p2i(v), p2i(narrow_base)); > > Would it be possible to leave this hunk out? I am in the process of rewriting large parts of this for Lilliput, so it won't survive in this current form. These things will be a lot better tested, and the assertion messages make more sense, too) Thanks for taking a look! It would be possible to revert this part. But I don't understand the benefit. Wouldn't it be better to have this extra output for the time being? I don't know when Lilliput will make it to jdk head. It seems to be not targeted, yet. Or will your new code be ready soon and contributed separately? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20971#discussion_r1759102290 From mdoerr at openjdk.org Fri Sep 13 16:07:05 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 16:07:05 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:44:35 GMT, Thomas Stuefe wrote: > Which CPUs are affected? Only PPC64? The C2 transformation is used on aarch64, ppc64, riscv, s390. The issue has been observed on AIX only so far. It may be possible to trigger it on other platforms by extra flags. > use CompressedClassSpaceBaseAddress to enforce a low base address, e.g. 1GB. Thanks for the hints! We need `CompressedKlassPointers::base() == nullptr` to trigger the problem. Otherwise, C2 will not use the transformation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349291099 From stuefe at openjdk.org Fri Sep 13 16:07:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 16:07:05 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:44:35 GMT, Thomas Stuefe wrote: > > > All you need is an interface class to go through this code. I wonder why it only failed on AIX? > > > > > > It fails on AIX because it allocates at different addresses and we get a class at 0x0a00010091130af8. I guess that the classes seen by this C2 code are still within the encodable range on all other platforms (including linux on PPC64). Note that x86 is not affected because it doesn't use the transformation. > > Which CPUs are affected? Only PPC64? > > > > Good. Can you add regression test? > > > > > > My current regression test is: Run applications/ctw/modules/java_xml.java on AIX. 
:-) > > Is there a way to force the metaspace part for such classes outside of the encoding range? Then, we may be able to write a dedicated regression test. > > You can do this: > > * use `CompressedClassSpaceBaseAddress` to enforce a low base address, e.g. 1GB. This also switches off CDS. It may not work, in which case you throw a SkippedException. But I use this in many tests. > * use a small class space, e.g. 128MB. > * class space will be `[1g .. 1g+128m)` > * under most architectures (all apart from aarch64), the resulting encoding scheme will use base=0 shift=0. Now you have limited the encoding range to `[0 .. 4gb)`. On aarch64, you will get, I believe, base=1g shift=0, `[1g...5g)`, still okay. > * Normal metaspace lives in freely allocated mmaps, which usually float around somewhere distant. The chance that they are outside the range is very high. > > This works on Linux, but depends on the `vm.mmap_min_addr` sysctl. You may have to lower that value. It should usually be very low though (64k or so). Another way to test this is to stress-test the JVM with -Xshare:off and a small class space size. Since 8312018 we do our best to allocate the class space in low address regions and run unscaled. And it usually works.
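The range arithmetic in the recipe above can be checked with a small standalone sketch. This is only an illustration, not HotSpot code: the class `EncodingRangeSketch` and the method `inEncodingRange` are hypothetical names, and the sketch assumes a 32-bit narrow Klass pointer, so that base=0 shift=0 covers `[0 .. 4gb)` as stated in the thread. The address `0x0a00010091130af8` is the one reported from AIX.

```java
public final class EncodingRangeSketch {
    // Assumption: narrow Klass pointers are 32 bits wide, which gives the
    // "[0 .. 4gb)" range mentioned in the thread for base=0 shift=0.
    static final int NARROW_BITS = 32;

    /** Returns true if addr lies in [base, base + (2^32 << shift)). */
    public static boolean inEncodingRange(long base, int shift, long addr) {
        long rangeBytes = (1L << NARROW_BITS) << shift;
        // Addresses are treated as unsigned 64-bit values.
        return Long.compareUnsigned(addr, base) >= 0
            && Long.compareUnsigned(addr - base, rangeBytes) < 0;
    }

    public static void main(String[] args) {
        final long G = 1L << 30;
        final long M = 1L << 20;

        // Class space placed at 1g with a size of 128m: [1g .. 1g+128m)
        long classSpaceStart = 1 * G;
        long classSpaceEnd = classSpaceStart + 128 * M;

        // Most platforms: base=0 shift=0, encoding range [0 .. 4g)
        System.out.println(inEncodingRange(0L, 0, classSpaceStart));
        System.out.println(inEncodingRange(0L, 0, classSpaceEnd - 1));

        // The Klass address observed on AIX is far outside that range.
        System.out.println(inEncodingRange(0L, 0, 0x0a00010091130af8L));

        // aarch64 variant from the thread: base=1g shift=0, range [1g .. 5g)
        System.out.println(inEncodingRange(G, 0, 5 * G - 1));
        System.out.println(inEncodingRange(G, 0, 5 * G));
    }
}
```

With these inputs the class space itself is encodable, while a Klass living in a distant mmap (such as the AIX address) is not, which is the situation the regression test would need to provoke.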
------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349295344 From stuefe at openjdk.org Fri Sep 13 16:07:06 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 16:07:06 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: <_OTVCsF_mEHj9EZFiz92DkogGg7SDQopX90PKx7xmp0=.6ea17b17-eeb8-4cf1-94b3-7fb9b9704586@github.com> References: <_OTVCsF_mEHj9EZFiz92DkogGg7SDQopX90PKx7xmp0=.6ea17b17-eeb8-4cf1-94b3-7fb9b9704586@github.com> Message-ID: On Fri, 13 Sep 2024 15:49:06 GMT, Martin Doerr wrote: >> src/hotspot/share/oops/compressedKlass.inline.hpp line 54: >> >>> 52: assert(check_alignment(v), "Address not aligned"); >>> 53: uint64_t pd = (uint64_t)(pointer_delta(v, narrow_base, 1)); >>> 54: assert(KlassEncodingMetaspaceMax > pd, "change encoding max if new encoding (Klass " PTR_FORMAT ", Base " PTR_FORMAT ")", p2i(v), p2i(narrow_base)); >> >> Would it be possible to leave this hunk out? I am in the process of rewriting large parts of this for Lilliput, so it won't survive in this current form. These things will be a lot better tested, and the assertion messages make more sense, too) > > Thanks for taking a look! It would be possible to revert this part. But I don't understand the benefit. Wouldn't it be better to have this extra output for the time being? I don't know when Lilliput will make it to jdk head. It seems to be not targeted, yet. > Or will your new code be ready soon and contributed separately? Okay, sure, maybe you are right. Merging it should be easy enough. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20971#discussion_r1759117593 From eosterlund at openjdk.org Fri Sep 13 16:10:09 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 13 Sep 2024 16:10:09 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 00:33:34 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > change main/othervm to driver JVM code looks good. Thanks for fixing! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2303513802 From kvn at openjdk.org Fri Sep 13 16:12:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 16:12:05 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:43:20 GMT, Martin Doerr wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity check. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20971#pullrequestreview-2303518151 From asmehra at openjdk.org Fri Sep 13 16:13:10 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 13 Sep 2024 16:13:10 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v5] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Thu, 12 Sep 2024 21:43:24 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
>> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @ashu-mehra comments src/hotspot/share/cds/aotClassLinker.cpp line 227: > 225: } > 226: > 227: int AOTClassLinker::num_initiated_classes(oop loader1, oop loader2) { The two loader arguments here are quite confusing, making it hard to understand the code. Can it be refactored like this:

    int AOTClassLinker::num_platform_initiated_classes() {
      // AOTLinkedClassBulkLoader will initiate loading of all public boot classes in the platform loader.
      return num_initiated_classes(nullptr);
    }

    int AOTClassLinker::num_app_initiated_classes() {
      // AOTLinkedClassBulkLoader will initiate loading of all public boot/platform classes in the app loader.
      return num_platform_initiated_classes() + num_initiated_classes(SystemDictionary::java_platform_loader());
    }

    int AOTClassLinker::num_initiated_classes(oop loader) {
      int n = 0;
      for (int i = 0; i < _sorted_candidates->length(); i++) {
        InstanceKlass* ik = _sorted_candidates->at(i);
        if (ik->is_public() && !ik->is_hidden() && (ik->class_loader() == loader)) {
          n++;
        }
      }
      return n;
    }

src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 199: > 197: InstanceKlass* ik = classes->at(i); > 198: assert(ik->is_loaded(), "must have already been loaded by a parent loader"); > 199: assert(ik->class_loader() != initiating_loader(), "must be a parent loader"); Can we also add an assert that ik->class_loader() must be either boot or platform loader. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1759128918 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1759129552 From jbhateja at openjdk.org Fri Sep 13 16:20:28 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 16:20:28 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v10] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds support for the following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value.
> > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Documentation change suggestion from Paul ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/4a93042b..4301c817 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=08-09 Stats: 343 lines in 2 files changed: 173 ins; 8 del; 162 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From stuefe at openjdk.org Fri Sep 13 16:26:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 13 Sep 2024 16:26:04 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 16:02:49 GMT, Martin Doerr wrote: > > Which CPUs
are affected? Only PPC64? > > The C2 transformation is used on aarch64, ppc64, riscv, s390. The issue has been observed on AIX only so far. It may be possible to trigger it on other platforms by extra flags. > > > use CompressedClassSpaceBaseAddress to enforce a low base address, e.g. 1GB. > > Thanks for the hints! We need `CompressedKlassPointers::base() == nullptr` to trigger the problem. Otherwise, C2 will not use the transformation. The described process should then work for ppc64, riscv and s390 at least. Update: ah, I realize the misunderstanding. `CompressedClassSpaceBaseAddress` is badly named. It should be named `CompressedClassSpaceStartAddress`. It places the class space at that address, and then the encoding scheme will be decided from that. With a low-placed and small class space you will get base=zero on most platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349326342 From psandoz at openjdk.org Fri Sep 13 16:46:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 16:46:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 6 Sep 2024 18:08:09 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2770: >> >>> 2768: >>> 2769: /** >>> 2770: * Rearranges the lane elements of two vectors, selecting lanes >> >> I have a bit of a name concern here. Why are we calling it "select" and not "rearrange"? Because for a single "from" vector we also call it "rearrange", right? Is "select" not often synonymous to "blend", which works also with two "from" vectors, but with a mask and not indexing for "selection/rearranging"?
> > We already have another flavor of [selectFrom](https://docs.oracle.com/en/java/javase/22/docs/api/jdk.incubator.vector/jdk/incubator/vector/Vector.html#selectFrom(jdk.incubator.vector.Vector)) which permutes a single vector, new API extends its semantics to two vector selection, so we kept the nomenclature consistent. Select operates only on vectors where the `this` vector represents the indexes to *select* elements from the other vectors. Rearrange operates on vectors and a shuffle argument that *rearranges* elements from the other vectors. The former behavior can be specified in terms of the latter behavior, and ideally the equivalent expressions should result in ~same generated sequence of instructions. However, we are not there yet and need to further optimize shuffles to make that happen. But, we can optimize `selectFrom` with the dependent change to wrap indexes instead of throwing when out of bounds. (Separately there is an annoying issue with select, that we should not address in this PR. Using a Float/Double Vector for indexes is awkward.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759182233 From kvn at openjdk.org Fri Sep 13 16:56:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 16:56:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Fri, 13 Sep 2024 12:35:36 GMT, Boris Ulasevich wrote: > Do you reproduce the regression on a public benchmark that I can also try? It was our internal benchmark.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2349416751 From mdoerr at openjdk.org Fri Sep 13 17:09:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 17:09:07 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 16:21:21 GMT, Thomas Stuefe wrote: >>> Which CPUs are affected? Only PPC64? >> >> The C2 transformation is used on aarch64, ppc64, riscv, s390. The issue has been observed on AIX only so far. It may be possible to trigger it on other platforms by extra flags. >> >>> use CompressedClassSpaceBaseAddress to enforce a low base address, e.g. 1GB. >> >> Thanks for the hints! We need `CompressedKlassPointers::base() == nullptr` to trigger the problem. Otherwise, C2 will not use the transformation. > >> > Which CPUs are affected? Only PPC64? >> >> The C2 transformation is used on aarch64, ppc64, riscv, s390. The issue has been observed on AIX only so far. It may be possible to trigger it on other platforms by extra flags. >> >> > use CompressedClassSpaceBaseAddress to enforce a low base address, e.g. 1GB. >> >> Thanks for the hints! We need `CompressedKlassPointers::base() == nullptr` to trigger the problem. Otherwise, C2 will not use the transformation. > > the described process should then work for ppc64, risc and s390 at least. > > Update: ah, I realize the misunderstanding. > > `CompressedClassSpaceBaseAddress` is badly named. It should be named `CompressedClassSpaceStartAddress` . It places the class space at that address, and then encoding scheme will be decided from that. With a low-placed and small class space you will get base=zero on most platforms. I was able to reproduce the issue on linux ppc64le by `make run-test TEST="applications/ctw/modules/java_xml.java" JTREG="VM_OPTIONS=-Xshare:off"`. 
Unfortunately, I have no reproducer / regression test for any Oracle-supported platform. x86 is not affected and I couldn't get the metaspace configured as needed to trigger the error on aarch64 (as explained by @tstuefe). ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2349470004 From kvn at openjdk.org Fri Sep 13 17:16:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 17:16:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM [v2] In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: <8huOmVgK-3t54voB3eoEH2XnX1TSw-KUkni7ieNET8k=.cb4dfbe9-1c3c-4147-81c4-67c531eac4f9@github.com> On Fri, 13 Sep 2024 12:38:21 GMT, Boris Ulasevich wrote: >> With this change, I have adjusted the default settings for CodeCacheSegmentSize and CodeEntryAlignment for AARCH and ARM32. The main goal is to improve code density by reducing the number of wasted bytes (approximately **4%** waste). Improving code density may also have the side effect of boosting performance in large applications >> >> Each nmethod occupies a number of code cache segments (minimum allocation blocks). Since the size of an nmethod is not aligned to 128 bytes, the last occupied segment is half empty. Reducing the size of the code cache segments correspondingly minimizes waste. However, we should be careful about reducing the CodeCacheSegmentSize too much, as smaller segment sizes will increase the overhead of the CodeHeap::_segmap bitmap. A CodeCacheSegmentSize of 64 seems to be an optimal balance. >> >> The current large default value for CodeCacheSegmentSize (64+64) was historically introduced with the comment "Tiered compilation has large code-entry alignment" which doesn't make much sense to me. 
The history of this comment and value is as follows: >> - The PPC port was introduced with CodeEntryAlignment=128 (recently reduced to 64: https://github.com/openjdk/jdk/commit/09a78b5d) and CodeCacheSegmentSize was adjusted accordingly for that platform. >> - Soon after, the 128-byte alignment was applied to all platforms to hide a debug mode warning (https://github.com/openjdk/jdk/commit/e8bc971d). Despite the change (and Segmented Code Cache introduced later), the warning can still be reproduced today using the -XX:+VerifyCodeCache fastdebug option in large applications (10K nmethods ~ 10K free blocks in between them). >> >> I believe it is time to remove the comment and update the default value. >> >> I also suggest updating the default CodeEntryAlignment value for AARCH. The current setting is much larger than for x86 and was likely based on the typical cache line size of 64 bytes. Cortex-A57, A72 architecture software optimisation guides recommend a 32-byte alignment for subroutine entry points. Neoverse architecture software optimisation guides do not mention recommended entry point alignment. >> >> For reference, the default [function_align setting in GCC](https://github.com/gcc-mirror/gcc/blob/master/gcc/config/aarch64/tuning_models/cortexa72.h#L44) is typically 16 or 32 bytes, depending on the target architecture. >> >> Hotspot performance tests with -XX:CodeCacheSegmentSize=64 and -XX:CodeEntryAlignment=16 options showed the following result... > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > CodeEntryAlignment=16: limit to V1 and V2 only Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20864#pullrequestreview-2303658888 From qamai at openjdk.org Fri Sep 13 17:23:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 13 Sep 2024 17:23:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). What do you think? 
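The two-vector wrapping described above can be sketched in scalar form. This is only an illustrative model over plain arrays, not the Vector API implementation: the class and method names are hypothetical, and it assumes VLEN is a power of two so that masking with `2 * VLEN - 1` performs the wrap.

```java
import java.util.Arrays;

// Hypothetical scalar model of two-vector selection with wrapping indices.
// Assumes v1 and v2 have the same length VLEN, and that VLEN is a power of two.
class TwoVectorWrapSketch {
    static int[] selectFromTwo(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap every index into [0, 2 * VLEN) instead of throwing.
            int wrapped = indexes[i] & (2 * vlen - 1);
            // [0, VLEN) selects from the first vector, [VLEN, 2 * VLEN) from the second.
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, -1, 9};   // -1 and 9 are out of range and get wrapped
        System.out.println(Arrays.toString(selectFromTwo(idx, v1, v2)));
    }
}
```

With VLEN = 4, index -1 wraps to 7 (last lane of the second vector) and index 9 wraps to 1 (second lane of the first vector), so no lane needs an exceptional path.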
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2349535266 From jbhateja at openjdk.org Fri Sep 13 17:31:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 17:31:24 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: > 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); > 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); > 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1758203424 From kvn at openjdk.org Fri Sep 13 17:32:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 17:32:10 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: Message-ID: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> On Fri, 13 Sep 2024 07:52:38 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register write). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 > - Review resolutions. > - 8339790: Support Intel APX setzucc instruction. Just one comment. src/hotspot/cpu/x86/gc/x/x_x86_64.ad line 129: > 127: format %{ "lock\n\t" > 128: "cmpxchgq $newval, $mem\n\t" > 129: "sete_with_zextl $res\n\t" %} Please, use `setcc` in format to match `ins_encode`. `sete_with_zextl` is used only when it is supported.
------------- PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2303690290 PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759235416 From jbhateja at openjdk.org Fri Sep 13 17:41:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 17:41:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:45:29 GMT, Emanuel Peter wrote: >> Existing vectorAPI inline expansion entry points explicitly pass lane type and count as intrinsic arguments, this is used to create concrete ideal vector types. > > That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? The patch includes tests for all the species (combinations of vector type and size). Each vector kernel is validated against an equivalent scalar implementation, so the scenario you are referring to is implicitly handled through the tests.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1759246223 From stooke at openjdk.org Fri Sep 13 17:42:07 2024 From: stooke at openjdk.org (Simon Tooke) Date: Fri, 13 Sep 2024 17:42:07 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v5] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 05:57:13 GMT, Thomas Stuefe wrote: >> Simon Tooke has updated the pull request incrementally with three additional commits since the last revision: >> >> - remove empty line >> - fix indentation >> - fix missing return statement > > src/hotspot/os/windows/os_windows.cpp line 5334: > >> 5332: } else { >> 5333: errno = ENAMETOOLONG; >> 5334: } > > Curious, why not just passing in outbuf and outbuflen directly instead of letting the function allocate memory just to then copy the content? To get a guaranteed ENAMETOOLONG? > > Just a question, not a comment. I am fine with this as it is. That's exactly it - I don't know what different versions of Windows do. I do know, for instance, that my Windows 10 vm sets errno to ERANGE (I could simplify the code and just test and convert that, but I'm being cautious, I guess). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1759247376 From sviswanathan at openjdk.org Fri Sep 13 18:20:07 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 18:20:07 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 17:20:40 GMT, Quan Anh Mai wrote: > Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). 
Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2349763832 From sviswanathan at openjdk.org Fri Sep 13 18:27:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 18:27:06 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 05:30:36 GMT, Jatin Bhateja wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: > >> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); >> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); >> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); > > We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. 
The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759296031 From jbhateja at openjdk.org Fri Sep 13 18:30:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 18:30:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 14:53:18 GMT, Emanuel Peter wrote: > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. 
Think in terms of the desired compiler IR rather than the rearrange API semantics. The VectorRearrange IR node always expects shape conformance between the vector to be permuted and the index vector. Because shuffle indices are held in byte-array-based backing storage, the compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding the indexes to match the input vector lane size. Since the selectFrom API already passes the indexes through a vector, we can avoid emitting redundant toShuffle() + toVector() operations in all cases apart from some target-specific scenarios. For example, AVX2 targets do not support the direct short vector permute instruction "VPERMW", so we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate the desired permutation using a byte permute instruction. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2349801299 From mdoerr at openjdk.org Fri Sep 13 18:32:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 13 Sep 2024 18:32:41 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: > PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: - Remove empty line. - Improve register usage and readability.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20922/files - new: https://git.openjdk.org/jdk/pull/20922/files/6dc7d495..571601f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=01-02 Stats: 16 lines in 1 file changed: 9 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20922/head:pull/20922 PR: https://git.openjdk.org/jdk/pull/20922 From jbhateja at openjdk.org Fri Sep 13 18:40:53 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 18:40:53 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v9] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. 
> - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Documentation suggestions from Paul. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/d3ee3104..1c00f417 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=07-08 Stats: 36 lines in 1 file changed: 23 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Fri Sep 13 19:09:04 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 19:09:04 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 18:24:04 GMT, Sandhya Viswanathan wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 2206: >> >>> 2204: const Type * byte_bt = Type::get_const_basic_type(T_BYTE); >>> 2205: const TypeVect * byte_vt = TypeVect::make(byte_bt, num_elem); >>> 2206: Node* byte_shuffle = gvn().transform(VectorCastNode::make(cast_vopc, v1, T_BYTE, num_elem)); >> >> We can be optimal here and prevent down casting and subsequent load shuffles in applicable scenarios, e.g. indexes held in integral vectors. > > @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyway passed through a vector. Please let me know if you find the explanation below too constraining.
https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759345567 From sviswanathan at openjdk.org Fri Sep 13 19:17:04 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 19:17:04 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> On Fri, 13 Sep 2024 19:04:12 GMT, Jatin Bhateja wrote: >> @jatin-bhateja If you could expand on this comment with specific cases it will be helpful. The loadShuffle generation is needed for platform specific handling of shuffles and cannot be optimized out here. > > Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyway passed through a vector. Please let me know if you find the explanation below too constraining. > https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299 I think VectorLoadShuffle removal optimizations should be a separate PR and well thought out. So far the contract has been that rearrange always gets the shuffle through VectorLoadShuffle and I would like to keep that contract in this PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759361459 From jbhateja at openjdk.org Fri Sep 13 19:23:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 19:23:14 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> References: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> Message-ID: On Fri, 13 Sep 2024 17:27:29 GMT, Vladimir Kozlov wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: >> >> - Review comments resolution. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8339790 >> - Review resolutions. >> - 8339790: Support Intel APX setzucc instruction. > > src/hotspot/cpu/x86/gc/x/x_x86_64.ad line 129: > >> 127: format %{ "lock\n\t" >> 128: "cmpxchgq $newval, $mem\n\t" >> 129: "sete_with_zextl $res\n\t" %} > > Please, use `setcc` in format to match `ins_encode`. `sete_with_zextl` isused only when it is supported. Hi @vnkozlov , setcc used in inst_encoding block is a macro assembly routine which emits either setcc + movzbl or setzucc for APX supported targets, I wanted to use one opto instruction to correctly depict semantics of both the cases. 
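The two code shapes mentioned in this exchange (setcc followed by movzbl, versus a single setzucc) can be modelled on a 64-bit register value. The following is a rough scalar sketch only, with hypothetical class and method names; it models the register semantics described in the PR text and is not HotSpot's macro assembler code.

```java
// Hypothetical model of a 64-bit general purpose register as a Java long.
class SetccSketch {
    // setcc: writes 0 or 1 into the low byte; the upper 56 bits keep their old value.
    static long setcc(long reg, boolean cond) {
        return (reg & ~0xFFL) | (cond ? 1L : 0L);
    }

    // movzbl reg, reg: zero-extends the low byte, clearing all upper bits.
    static long movzbl(long reg) {
        return reg & 0xFFL;
    }

    // setzucc (APX): writes 0 or 1 and zeroes the upper 56 bits in one instruction.
    static long setzucc(long reg, boolean cond) {
        return cond ? 1L : 0L;
    }

    public static void main(String[] args) {
        long stale = 0xDEADBEEFCAFEF00DL;   // arbitrary previous register contents
        System.out.println(Long.toHexString(setcc(stale, true)));
        System.out.println(movzbl(setcc(stale, true)) == setzucc(stale, true));
    }
}
```

The first line printed shows that setcc alone leaves stale upper bits behind, which is why the non-APX path needs the extra zero-extension. The second line shows that both sequences end with the same register value.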
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759379552 From kvn at openjdk.org Fri Sep 13 19:40:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 19:40:05 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: References: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> Message-ID: <0q8eO87_rRl1zJTaysGKEpPPhLpgZYobzVUIZpJ8onY=.26218530-9e34-4e2f-8cee-6f567fa18b95@github.com> On Fri, 13 Sep 2024 19:20:07 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/gc/x/x_x86_64.ad line 129: >> >>> 127: format %{ "lock\n\t" >>> 128: "cmpxchgq $newval, $mem\n\t" >>> 129: "sete_with_zextl $res\n\t" %} >> >> Please, use `setcc` in format to match `ins_encode`. `sete_with_zextl` is used only when it is supported. > > Hi @vnkozlov , > setcc used in inst_encoding block is a macro assembly routine which emits either setcc + movzbl or setzucc for APX supported targets, I wanted to use one opto instruction to correctly depict semantics of both the cases. Yes, I see the `setcc` definition, and using it in `format` is fine since it will match `inst_encoding`. On the other hand, there is no macro or assembler instruction `sete_with_zextl` and it will be confusing.
If you want, you can add a comment to the format (and you should not use `\n` in the last line): "setcc $res\t# emits sete + movzbl or setzucc for APX" %} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759397114 From psandoz at openjdk.org Fri Sep 13 19:48:04 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 19:48:04 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 18:17:21 GMT, Sandhya Viswanathan wrote: > > The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom APIs. Yes, we are trying to take smaller incremental steps. Once we are done with this work we can step back and discuss/review what to do about shuffles. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2350039460 From psandoz at openjdk.org Fri Sep 13 20:09:05 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 20:09:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal.
This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2439: > 2437: (v1, s_, m_) -> v1.uOp((i, a) -> { > 2438: int ei = s_.laneSource(i); > 2439: return ei < 0 || !m_.laneIsSet(i) ? 0 : v1.lane(ei); The `ei < 0` test is redundant. 
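A minimal scalar sketch of why a negative-index test becomes unnecessary once lane indexes are wrapped, assuming wrapping reduces every index into `[0, VLENGTH)`. The class `WrapIndexSketch` and its methods are hypothetical illustrations written for this note; they model the lane semantics with plain arrays and are not the `jdk.incubator.vector` implementation.

```java
import java.util.Arrays;

/** Scalar model of wrap-then-select lane semantics; not the Vector API itself. */
final class WrapIndexSketch {
    /** Wraps any int index into [0, vlen); vlen is assumed to be positive. */
    static int wrapIndex(int index, int vlen) {
        return Math.floorMod(index, vlen);
    }

    /** Scalar stand-in for a rearrange that wraps indexes instead of checking them. */
    static byte[] rearrange(byte[] v, int[] shuffle) {
        byte[] r = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            int ei = wrapIndex(shuffle[i], v.length); // always in [0, v.length)
            r[i] = v[ei];                             // so no "ei < 0" guard is needed
        }
        return r;
    }

    public static void main(String[] args) {
        byte[] v = {10, 11, 12, 13};
        int[] shuffle = {3, -1, 4, 6};                // -1, 4 and 6 are out of range
        System.out.println(Arrays.toString(rearrange(v, shuffle))); // [13, 13, 10, 12]
    }
}
```

For a power-of-two lane count, `Math.floorMod(index, vlen)` gives the same result as `index & (vlen - 1)`, which is the kind of single mask operation a compiled loop can use in place of a bounds check and branch.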
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2637: > 2635: * > 2636: * For each lane {@code N} of the shuffle, and for each lane > 2637: * source index {@code I=s.wrapIndex(s.laneSource(N))} in the shuffle, The pseudo code below starting at line 2644 needs adjusting to: Vector r = this.rearrange(s); return broadcast(0).blend(r, m); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java line 2755: > 2753: * > 2754: * The result is the same as the expression > 2755: * {@code v.rearrange(this.toShuffle().wrapIndexes())}. Since we also adjusted `rearrange` the existing expression is fine, recommend no change here and to the mask accepting version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759431093 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759428672 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759418829 From jbhateja at openjdk.org Fri Sep 13 20:37:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 20:37:27 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/998501e1..c1c42d38 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=02-03 Stats: 12 lines in 3 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From jbhateja at openjdk.org Fri Sep 13 20:37:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 13 Sep 2024 20:37:27 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v3] In-Reply-To: <0q8eO87_rRl1zJTaysGKEpPPhLpgZYobzVUIZpJ8onY=.26218530-9e34-4e2f-8cee-6f567fa18b95@github.com> References: <3WoyFjZSLvQyBaCl3bJxLCXPwutSfmzyaY8fhnZ21bs=.0c8cd153-3160-4391-8f63-20597f5e84b5@github.com> <0q8eO87_rRl1zJTaysGKEpPPhLpgZYobzVUIZpJ8onY=.26218530-9e34-4e2f-8cee-6f567fa18b95@github.com> Message-ID: On Fri, 13 Sep 2024 19:36:59 GMT, Vladimir Kozlov wrote: >> Hi @vnkozlov , >> setcc used in inst_encoding block is a macro assembly routine which emits either setcc + movzbl or setzucc for APX supported targets, I wanted to use one opto instruction to correctly depict semantics of both the cases. > > Yes, I see `setcc` definition and using it in `format` is fine since it will match to `inst_encoding`. > On other hand, there is no macro or assembler instruction `sete_with_zextl` and it will be confusing. 
> > If you want you can add comment to format (and you should not use `\n` in last line): > > "setcc $res\t# emits sete + movzbl or setzucc for APX" %} DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1759470761 From kvn at openjdk.org Fri Sep 13 21:40:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 21:40:15 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 20:37:27 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2304195956 From kvn at openjdk.org Fri Sep 13 21:48:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 21:48:10 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache In-Reply-To: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: On Thu, 12 Sep 2024 03:42:05 GMT, Ioi Lam wrote: > This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` > > These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. > > --- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. src/hotspot/share/classfile/systemDictionary.hpp line 349: > 347: // Second part of load_shared_class > 348: static void load_shared_class_misc(InstanceKlass* ik, ClassLoaderData* loader_data) NOT_CDS_RETURN; > 349: static void restore_archived_method_handle_intrinsics_impl(TRAPS); Missing `NOT_CDS_RETURN` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20959#discussion_r1759549906 From psandoz at openjdk.org Fri Sep 13 21:58:16 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 21:58:16 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v10] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 16:20:28 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Documentation change suggerstion from Paul src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 573: > 571: * @see VectorMath#addSaturating(int, int) > 572: */ > 573: public static final Associative SADD = assoc("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP+VO_ASSOC); I don't believe saturation arithmetic is associative. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1759556540 From psandoz at openjdk.org Fri Sep 13 22:02:30 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Sep 2024 22:02:30 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v10] In-Reply-To: References: Message-ID: <0PApPK8O06mwyZXwM5XpEGPDdmeqaxhX-llQryWSUpo=.03ae9624-b809-48ca-8e07-242aad3e9df2@github.com> On Fri, 13 Sep 2024 16:20:28 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. 
>> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Documentation change suggerstion from Paul src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 589: > 587: * @see VectorMath#minUnsigned(int, int) (int, int) > 588: */ > 589: public static final Associative UMIN = assoc("UMIN", "umin", VectorSupport.VECTOR_OP_UMIN, VO_NOFP+VO_ASSOC); We should rename the existing unsigned compare operators to use the same naming scheme i.e., s/UNSIGNED_LT/ULT etc. 
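The associativity concern raised for `SADD` earlier in this thread can be checked with a small scalar sketch. The helper `addSaturating` below is a hypothetical stand-in written for this note using `long` arithmetic; it is an assumption about the intended clamping behavior, not the `VectorMath` method referenced in the patch.

```java
/** Scalar check that signed saturating addition is not associative. */
final class SaturatingAddSketch {
    /** Hypothetical helper: signed int addition clamped to [MIN_VALUE, MAX_VALUE]. */
    static int addSaturating(int a, int b) {
        long sum = (long) a + (long) b;               // cannot overflow in long
        if (sum > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (sum < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) sum;
    }

    public static void main(String[] args) {
        int a = Integer.MAX_VALUE, b = 1, c = -1;
        int left = addSaturating(addSaturating(a, b), c);  // clamps at MAX, then subtracts 1
        int right = addSaturating(a, addSaturating(b, c)); // b + c cancels before any clamping
        System.out.println(left + " vs " + right);         // 2147483646 vs 2147483647
    }
}
```

The two groupings differ by one, so a reduction that reorders signed saturating additions could change the result. Unsigned saturating addition does not have this problem, because every grouping yields the true sum clamped at the unsigned maximum.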
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1759558844 From kvn at openjdk.org Fri Sep 13 22:12:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 22:12:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v17] In-Reply-To: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> References: <_7xTwicd2PDxJZOJ7xLdkHZc18UT-9sVZk-3YFgkMA0=.7db81e94-4f38-44a0-9983-6e391459aab2@github.com> Message-ID: <-lhMoCYQAGXWEAQ2ySemYzUh_DjKgqi4pG10NdrHils=.b2bc294a-941d-42aa-a00f-149d9260dfeb@github.com> On Mon, 9 Sep 2024 14:41:25 GMT, Roberto Casta?eda Lozano wrote: >> src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 112: >> >>> 110: // The answer is that stores of different sizes can co-exist >>> 111: // in the same sequence of RawMem effects. We sometimes initialize >>> 112: // a whole 'tile' of array elements with a single jint or jlong.) >> >> I'm having trouble making sense of this comment. I guess a jlong could be used to null-initialize two >> 32bit oops/narrowOops? But that doesn't have anything to do with jints. > > I am not sure the complex overlap test is necessary here, this code was copy-pasted from [MemNode::find_previous_store()](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L678) by [JDK-8057737](https://bugs.openjdk.org/browse/JDK-8057737), and in this new context I do not see how we might find stores of different sizes as mentioned in the comment. jlongs could be used to null-initialize two 32-bit OOPs, but such initializing stores are not even visible in C2's intermediate representation at the time `G1BarrierSetC2::g1_can_remove_pre_barrier()` is called. 
The fact that the comment refers to initializing several array elements with a single jint suggests to me that this code has lost some of its original purpose after being copied into a narrower context (OOP stores after object allocations). But since this code is pre-existing and in the worst case it is just performing some unnecessary work, I suggest to leave it as-is and possibly investigate how to simplify it as a follow-up task. Yes, the comment refers to combined initialization stores: [memnode.cpp#L4925](https://github.com/openjdk/jdk/blob/c54fc08aa3c63e4b26dc5edb2436844dfd3bab7c/src/hotspot/share/opto/memnode.cpp#L4925) which are used only for constant stores of primitive types (integers and floats). There was also a recent change by Emanuel to combine stores into primitive arrays: [JDK-8335390](https://bugs.openjdk.org/browse/JDK-8335390) None of the above does anything to oop stores. I agree that this code could be left for now and be optimized later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759565105 From sviswanathan at openjdk.org Fri Sep 13 22:30:36 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 22:30:36 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
> > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/694aceb5..428f2289 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=00-01 Stats: 14 lines in 8 files changed: 0 ins; 4 del; 10 mod Patch: 
https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: https://git.openjdk.org/jdk/pull/20634 From duke at openjdk.org Fri Sep 13 22:31:01 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 13 Sep 2024 22:31:01 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v5] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: quad precision tanh tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/4aa52bfd..d4ddc313 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=03-04 Stats: 859 lines in 1 file changed: 859 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Fri Sep 13 22:33:12 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 13 Sep 2024 22:33:12 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: Message-ID: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> On Wed, 11 Sep 2024 01:59:54 GMT, Joe Darcy wrote: >>> If the test is going to use randomness, then its jtreg tags should include >>> >>> `@key randomness` >>> >>> and it is preferable to use jdk.test.lib.RandomFactory to get and Random object since that handles printing out a key so the random sequence can be replicated if the test fails. 
>> >> Please see the test updated to use `@key randomness` and` jdk.test.lib.RandomFactory` to get and Random object. >> >>> The allowable worst-case error is 2.5 ulp, although at many arguments FDLIBM has a smaller error. >>> For a general Math vs StrictMath test with an allowable 2.5 ulp error, without knowing how accurate FDLIBM is for that function and argument, a large error of approx. 2X the nominal error should be allowed (in case FDLIBM errors in one direction and the Math method errors in the other direction). >>> >> So far the tests haven't failed with error of 2.5ulp. Would it be better to make it 5ulp? Please let me know. > > So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). > > If there was a correctly rounded tanh to compare against, then this style of testing would be valid. > > Are there any plan to intrinsify sinh or cosh? Hi Joe (@jddarcy), As suggested by Sandhya (@sviswa7), I added ~750 fixed point tests for tanh in `TanhTests.java` using the quad precision tanh implementation in libquadmath library from gcc. Please let me know if this looks good. 
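The ulp-based comparison discussed above can be sketched as follows. This is a minimal illustration, assuming the error is measured in units of `Math.ulp` of the `StrictMath` result; the class `TanhUlpSketch` and its method names are hypothetical and are not taken from `HyperbolicTests.java`.

```java
/** Measures how far Math.tanh is from StrictMath.tanh, in ulps of the latter. */
final class TanhUlpSketch {
    /** Distance between actual and expected, in units of ulp(expected). */
    static double ulpDiff(double actual, double expected) {
        if (Double.compare(actual, expected) == 0) {
            return 0.0;                               // identical bit patterns
        }
        return Math.abs(actual - expected) / Math.ulp(expected);
    }

    /** Largest Math-vs-StrictMath distance over the given inputs. */
    static double worstUlpDiff(double[] inputs) {
        double worst = 0.0;
        for (double x : inputs) {
            worst = Math.max(worst, ulpDiff(Math.tanh(x), StrictMath.tanh(x)));
        }
        return worst;
    }

    public static void main(String[] args) {
        double[] inputs = {-2.0, -0.5, 0.0, 0.5, 1.0, 2.0, 10.0};
        // Each method may be up to 2.5 ulps from the exact value, possibly in
        // opposite directions, so roughly 5 ulps is the tolerance between the two.
        System.out.println("worst difference in ulps: " + worstUlpDiff(inputs));
    }
}
```

A check of this kind only measures consistency between two implementations. Comparing against reference values computed in higher precision, as the fixed test points mentioned above do, bounds the error of `Math.tanh` itself.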
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1759579101 From sviswanathan at openjdk.org Fri Sep 13 22:33:18 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 22:33:18 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote: >>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? >> >> The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > >> >> The guidance from Paul Sandoz and John Rose is to keep the the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > > Yes, we are trying to take smaller incremental steps. Once the we are done with this work we can step back and discuss/review what to do about shuffles. @PaulSandoz Thanks a lot for the review. I have addressed your review comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2350535307 From duke at openjdk.org Fri Sep 13 22:36:45 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 13 Sep 2024 22:36:45 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v6] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update test/jdk/java/lang/Math/HyperbolicTests.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/d4ddc313..ca3314c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From sviswanathan at openjdk.org Fri Sep 13 23:13:20 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 13 Sep 2024 23:13:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> References: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:25 GMT, Srinivas Vamsi Parasa wrote: >> So far, this will be the only intrinsic implementation of tanh. Therefore, at the moment it is just checking the consistency of the intrinsic implementation with StrictMath/FDLIBM tanh. 
If the intrinsic has a ~1 ulp accuracy, it would be expected to often be within 2.5 ulps of FDLIBM tanh. However, as written the regression test would not necessarily pass against any allowable Math.tanh implementation, which is the usual criteria for java.lang.Math tests that aren't otherwise constrained (such as by being limited to a given subset of platforms). >> >> If there was a correctly rounded tanh to compare against, then this style of testing would be valid. >> >> Are there any plan to intrinsify sinh or cosh? > > Hi Joe (@jddarcy), > > As suggested by Sandhya (@sviswa7), I added ~750 fixed point tests for tanh in `TanhTests.java` using the quad precision tanh implementation in libquadmath library from gcc. > > Please let me know if this looks good. @vamsi-parasa In my thoughts the best way to do this is add the additional tests points to HyperbolicTests.java itself in the testcases array of testTanh() method. We should remove all the other changes from HyperbolicTests.java. Also no need for separate TanhTests.java file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1759602199 From kvn at openjdk.org Fri Sep 13 23:23:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 13 Sep 2024 23:23:19 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: Message-ID: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> On Wed, 11 Sep 2024 08:30:02 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. 
The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Fix a few style issues src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241: > 239: assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP"); > 240: TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr(); > 241: uint8_t barrier_data = store->barrier_data(); Should you check barrier data for 0? `is_ptr()` covers a wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where are you filtering them? src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65: > 63: #else > 64: make_barrier_set_c2(), > 65: #endif I assume it is temporary until all ports are ready (except maybe 32-bit x86). Right? src/hotspot/share/opto/matcher.cpp line 1821: > 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { > 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), > 1821: "duplicating node that's already been matched"); Why was it removed? src/hotspot/share/opto/matcher.cpp line 2845: > 2843: n->Opcode() == Op_StoreN && > 2844: m->is_EncodeP(); > 2845: } Add a comment that `m` is an input of `n`. I thought about adding an assert too but I will leave it to you.
src/hotspot/share/opto/output.cpp line 2026: > 2024: if (n->is_MachNullCheck()) { > 2025: assert(n->in(1)->as_Mach()->barrier_data() == 0, > 2026: "Implicit null checks on memory accesses with barriers are not yet supported"); I don't see here changes in `lcm.cpp` which would prevent it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604325 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759604944 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593453 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759593131 PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1759605704 From lmao at openjdk.org Sat Sep 14 02:15:40 2024 From: lmao at openjdk.org (Liang Mao) Date: Sat, 14 Sep 2024 02:15:40 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v14] In-Reply-To: References: Message-ID: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. 
> > Thanks, > Liang Liang Mao has updated the pull request incrementally with one additional commit since the last revision: support debug build for test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20907/files - new: https://git.openjdk.org/jdk/pull/20907/files/ccd2a163..dcf85090 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20907&range=12-13 Stats: 10 lines in 1 file changed: 7 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20907/head:pull/20907 PR: https://git.openjdk.org/jdk/pull/20907 From lmao at openjdk.org Sat Sep 14 02:19:11 2024 From: lmao at openjdk.org (Liang Mao) Date: Sat, 14 Sep 2024 02:19:11 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v13] In-Reply-To: References: <78GGyYgosP2SNVyxaySJ7ufQ9PXY4M2zHYcvfd2x5bs=.799f56ca-07f3-4d8e-a1c5-d7046023f45e@github.com> <0I-N2xnYHM8nWzgKGL-5DrYhd5G95hYHxqGQNTz93YE=.e8450cbd-aac3-4b88-9e35-b76887ce4b13@github.com> <22lYio5OB8MOBKY-gqS_wD1xr8Lw7UYt3o1bNxYNvHM=.362420b2-0c82-4af8-8ed3-19dce4c6f163@github.com> Message-ID: <996ilt1Mp9TPRh6F92ktSNdUNKbu3u8Wbjn-VFyoq-U=.93e24113-475f-44c2-a172-d13cdc9d41d4@github.com> On Fri, 13 Sep 2024 14:33:13 GMT, Leonid Mesnik wrote: >> Fastdebug needs more than 30 minutes to finish and can hardly reproduce the crash. Do we still need that? > > The 30 min is too much, however some testing in debug is better then nothing. > Can you add the test parameter "iterations" and set it to something reasonable .Then add test > > /** > * @test > * @bug 8339725 > * @summary Stress test GetMethodDeclaringClass > * @requires vm.jvmti > * @requires (os.family == "linux") > * @library /test/lib > * @run driver/timeout=300 TestUnloadedClass > */ > > and updat the check to > > if (titerations == 0) { > output.shouldContain("OutOfMemoryError"); > } > > > So we have some testing in debug. 
The debug configurations are executed very often with different flags, and it is still worth running a reduced testcase even if the original bug is not reproduced. > While the original testcase with no time/iteration limit is executed in product only. Ok. I limit the iterations in debug build to make the test runnable. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20907#discussion_r1759643877 From lmesnik at openjdk.org Sat Sep 14 02:25:08 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 14 Sep 2024 02:25:08 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v14] In-Reply-To: References: Message-ID: On Sat, 14 Sep 2024 02:15:40 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. >> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > support debug build for test Thank you for adding the testcase and improving it! ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20907#pullrequestreview-2304354032 From duke at openjdk.org Sat Sep 14 05:14:12 2024 From: duke at openjdk.org (duke) Date: Sat, 14 Sep 2024 05:14:12 GMT Subject: RFR: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass [v14] In-Reply-To: References: Message-ID: On Sat, 14 Sep 2024 02:15:40 GMT, Liang Mao wrote: >> Hi, >> >> It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading.
>> >> test/hotspot/jtreg/runtime and gc are clean. >> >> Thanks, >> Liang > > Liang Mao has updated the pull request incrementally with one additional commit since the last revision: > > support debug build for test @mmyxym Your change (at version dcf8509029e96888181cfeca812363e66654da51) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20907#issuecomment-2350841374 From lmao at openjdk.org Sat Sep 14 05:39:14 2024 From: lmao at openjdk.org (Liang Mao) Date: Sat, 14 Sep 2024 05:39:14 GMT Subject: Integrated: 8339725: Concurrent GC crashed due to GetMethodDeclaringClass In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 05:33:35 GMT, Liang Mao wrote: > Hi, > > It's a fix for 8339725. I think getting the oop from Klass::java_mirror() should use a ON_PHANTOM_OOP_REF decorator here which could make sure the oop would be kept alive in concurrent marking and return nullptr while in concurrent reference processing and unloading. > > test/hotspot/jtreg/runtime and gc are clean. > > Thanks, > Liang This pull request has now been integrated. 
Changeset: c91fa278 Author: Liang Mao URL: https://git.openjdk.org/jdk/commit/c91fa278fe17ab204beef0fcef1ada6dd0bc37bb Stats: 244 lines in 5 files changed: 241 ins; 0 del; 3 mod 8339725: Concurrent GC crashed due to GetMethodDeclaringClass Reviewed-by: lmesnik, coleenp, eosterlund, stefank ------------- PR: https://git.openjdk.org/jdk/pull/20907 From dholmes at openjdk.org Sat Sep 14 08:12:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 14 Sep 2024 08:12:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: <4Gj8ZTCGlUStSrs7YIGVKbPrHlCXEu2ujjHplNSNSEo=.65a5fac4-c7e1-4646-8ee4-a85834dd691b@github.com> On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion A refactoring should not change behaviour. Maybe this only looked like an opportunity for refactoring but in reality is not? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2350904545 From jbhateja at openjdk.org Sat Sep 14 08:30:48 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 08:30:48 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v11] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. 
> > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated. 
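For readers following the thread, the lane-wise behaviour described above (results capped at the bounds of the type instead of wrapping around) can be modelled in scalar Java. This is only a sketch under stated assumptions: the class and method names below are hypothetical and are not the scalar APIs added by the patch, and only the `int` case is shown.

```java
// Scalar model of the saturating / unsigned operators for int lanes.
// All names here are hypothetical; they are not the API introduced by the PR.
final class SaturatingSketch {
    // SADD: signed addition clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int saturatingAdd(int a, int b) {
        long r = (long) a + (long) b;          // widen so the sum cannot wrap
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SSUB: signed subtraction clamped to the int range.
    static int saturatingSub(int a, int b) {
        long r = (long) a - (long) b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SUADD: unsigned addition clamped to 2^32 - 1 (all bits set, i.e. -1 as int).
    static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        // The unsigned sum wrapped iff it is unsigned-less than an operand.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // SUSUB: unsigned subtraction clamped to 0.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN: compare the operands as unsigned 32-bit values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

For example, `saturatingAdd(Integer.MAX_VALUE, 1)` stays at `Integer.MAX_VALUE`, whereas the plain `+` operator would wrap to `Integer.MIN_VALUE`.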
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/4301c817..71114d0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=09-10 Stats: 330 lines in 22 files changed: 0 ins; 0 del; 330 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From aph at openjdk.org Sat Sep 14 08:38:08 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Sep 2024 08:38:08 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Fri, 13 Sep 2024 16:53:42 GMT, Vladimir Kozlov wrote: >> @vnkozlov Many thanks! >> Do you reproduce the regression on a public benchmark that I can also try? >> Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. > >> Do you reproduce the regression on a public benchmark that I can also try? > > It was our internal benchmark. > @vnkozlov Many thanks! Do you reproduce the regression on a public benchmark that I can also try? Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. I think that's the wrong way around. The `default` should be 64, maybe with smaller settings where we know it helps performance. It makes little sense to set the default CodeEntryAlignment to less than the icache line size. except in severely constrained environments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2350914179 From jbhateja at openjdk.org Sat Sep 14 08:40:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 08:40:44 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v12] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Update AARCH64 specific test using UNSIGNED_* comparison operators. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/71114d0d..ec7c7553 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=10-11 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Sat Sep 14 09:11:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 14 Sep 2024 09:11:06 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <2483R4bBJDN4UpBlRJSQVE2KjdctYIy0j__kzRGRDHc=.71baed04-9185-4111-a3ce-ce32a40cb570@github.com> Message-ID: On Fri, 13 Sep 2024 19:14:29 GMT, Sandhya Viswanathan wrote: >> Hi @sviswa7, I was suggesting emitting toShuffle() + toVector() only if it's needed under a target specific hook, since indexes are anyways passed though vector. Please let me know if you find blow explanation too constraining. >> https://github.com/openjdk/jdk/pull/20508#issuecomment-2349801299 > > I think VectorLoadShuffle removal optimizations should be a separate PR and well thought out. 
So far the contract has been that rearrange always gets the shuffle through VectorLoadShuffle and I would like to keep that contract in this PR. Hi @sviswa7, @PaulSandoz , I will modify PR#20508 accordingly to honor the contract at IR level and address VectorLoadShuffle optimization for both flavors of selectFrom API in a follow up PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1759701508 From aph at openjdk.org Sat Sep 14 10:21:07 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Sep 2024 10:21:07 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Fri, 13 Sep 2024 16:53:42 GMT, Vladimir Kozlov wrote: >> @vnkozlov Many thanks! >> Do you reproduce the regression on a public benchmark that I can also try? >> Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. > >> Do you reproduce the regression on a public benchmark that I can also try? > > It was our internal benchmark. > @vnkozlov Many thanks! Do you reproduce the regression on a public benchmark that I can also try? Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. This may have as much to do with the smallish icache on the Ampere parts as the specific Arm microarchitecture, so nothing to do with N2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2350942224 From aph at openjdk.org Sat Sep 14 16:25:05 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Sep 2024 16:25:05 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Fri, 13 Sep 2024 16:53:42 GMT, Vladimir Kozlov wrote: >> @vnkozlov Many thanks! 
>> Do you reproduce the regression on a public benchmark that I can also try? >> Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. > >> Do you reproduce the regression on a public benchmark that I can also try? > > It was our internal benchmark. > > @vnkozlov Many thanks! Do you reproduce the regression on a public benchmark that I can also try? Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. > > This may have as much to do with the smallish icache Sorry, I meant last level cache ------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2351049968 From stuefe at openjdk.org Sun Sep 15 06:17:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 15 Sep 2024 06:17:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: <4iHDH-GpSa-uPqe0IwwP6notHRyrOTiecqCSX9kYCe0=.d7969fd0-3066-42fa-82d4-842c10baee1c@github.com> Message-ID: On Wed, 11 Sep 2024 21:15:21 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert accidental change of UCOH default > > I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor. 
> > diff --git a/src/hotspot/share/oops/instanceKlass.cpp b/src/hotspot/share/oops/instanceKlass.cpp > index fd198f54fc9..7aa4bd24948 100644 > --- a/src/hotspot/share/oops/instanceKlass.cpp > +++ b/src/hotspot/share/oops/instanceKlass.cpp > @@ -511,7 +511,7 @@ InstanceKlass::InstanceKlass() { > } > > InstanceKlass::InstanceKlass(const ClassFileParser& parser, KlassKind kind, ReferenceType reference_type) : > - Klass(kind), > + Klass(kind, (!parser.is_interface() && !parser.is_abstract())), > _nest_members(nullptr), > _nest_host(nullptr), > _permitted_subclasses(nullptr), @coleenp > I was starting to understand the concerns with having prototype_header in Klass. It seems like it would simplify encoding the klass for object allocation. My recent change https://bugs.openjdk.org/browse/JDK-8338526 breaks this. You need to pass a parameter to Klass() to tell whether to encode the klass pointer or not, and pass this to Klass() constructor. > I solved this differently (Roman will merge this into his PR). static markWord make_prototype(const Klass* kls) { markWord prototype = markWord::prototype(); #ifdef _LP64 if (UseCompactObjectHeaders) { // With compact object headers, the narrow Klass ID is part of the mark word. // We therfore seed the mark word with the narrow Klass ID. // Note that only those Klass that can be instantiated have a narrow Klass ID. // For those who don't, we leave the klass bits empty and assert if someone // tries to use those. const narrowKlass nk = CompressedKlassPointers::is_encodable(kls) ? CompressedKlassPointers::encode(const_cast(kls)) : 0; prototype = prototype.set_narrow_klass(nk); } #endif return prototype; } inline bool CompressedKlassPointers::is_encodable(const void* address) { check_init(_base); // An address can only be encoded if: // // 1) the address lies within the klass range. // 2) It is suitably aligned to 2^encoding_shift. 
This only really matters for // +UseCompactObjectHeaders, since the encoding shift can be large (max 10 bits -> 1KB). return is_aligned(address, klass_alignment_in_bytes()) && address >= _klass_range_start && address < _klass_range_end; } So, we put an nKlass into the prototype if we can. We can, if the Klass address is encodable. It is encodable if it lives in the encoded Klass range and is correctly aligned. No need to pass this information via another channel: it's right there, in the Klass address. This works even before Klass is initialized. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2351399143 From stuefe at openjdk.org Sun Sep 15 06:17:15 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 15 Sep 2024 06:17:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:25:41 GMT, Johan Sjölén wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 51: > >> 49: size_t word_size() const { return _word_size; } >> 50: bool is_empty() const { return _base == nullptr; } >> 51: bool is_nonempty() const { return _base != nullptr; } > > Can `_base == nullptr` but `_word_size != 0`? No ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1759973362 From epeter at openjdk.org Sun Sep 15 07:19:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 15 Sep 2024 07:19:15 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 13 Sep 2024 18:27:07 GMT, Jatin Bhateja wrote: > > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it.
It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > > > > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? > > In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. > > Consider in terms of desired compiler IR and not rearrange API semantics, VectorRearrange IR node generally expects shape conformance b/w vector to be permuted and index vector, since shuffle indices are held in byte array based backing storage hence compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding indexes to match the input vector lane. Since selectFrom API already passes the indexes through vector hence we can save emitting redundant toShuffle() + toVector() operations in some cases apart from some target specific scenarios e.g. AVX2 targets [do not support direct short vector permute]instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate desired permutation using byte permute instruction. 
> > VectorLoadShuffle abstraction hides target specific index massaging which is why adding a target specific hook like Matcher::vector_indexes_needs_pruning compiler to selectively emit VectorLoadShuffle. I still do not see a **definition of the semantics of RearrangeNode**: what inputs does it accept and what does it do with them? Can you put this explanation as comment in the code, please? It sounds like this is what the `massaging` / `pruning` is: `emulate desired permutation using byte permute instruction.` You should find an accordingly more suiting name for the method name. Maybe it is something like `must_emulate_permutation_with...`. Or maybe it is rather a `supported` kind of question? I leave that up to you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2351426263 From fyang at openjdk.org Mon Sep 16 02:46:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 16 Sep 2024 02:46:11 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v7] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 07:43:39 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. >> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
>> >> ## Test >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, >> test/jdk/java/util/zip/TestCRC32.java >> >> ## Performance >> >> ### on bananapi >> >> with patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix len Thanks for the update. Several minor comments remain. BTW: Do you have the updated JMH data to see? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1464: > 1462: // 3. finally vectorize the code (original implementation in zcrc32.c is just scalar code). > 1463: // New tables for vector version is after table3.
> 1464: void MacroAssembler::vector_update_crc32(Register crc, Register buf, Register len, const int64_t unroll_words, Is `unroll_words` not used anymore? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1466: > 1464: void MacroAssembler::vector_update_crc32(Register crc, Register buf, Register len, const int64_t unroll_words, > 1465: Register tmp1, Register tmp2, Register tmp3, Register tmp4, > 1466: Register table0, Register table3, const int64_t single_table_size) { I personally prefer to make `single_table_size` a const local for this assembler routine: `const int64_t single_table_size = 256;` src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1483: > 1481: assert(MaxVectorSize > 32, "sanity"); > 1482: vsetivli(zr, N, Assembler::e32, Assembler::m1, Assembler::ma, Assembler::ta); > 1483: } Please put a new line after this else block. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1485: > 1483: } > 1484: vmv_v_x(vcrc, zr); > 1485: zext_w(crc, crc); What is this 'zext_w' for? And I see the following two instructions on `kernel_crc32` entry, which I think already zero-extend the 32-bit `crc`? mv(tmp5, right_32_bits); andn(crc, tmp5, crc); src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1504: > 1502: addi(buf, buf, N*W); > 1503: > 1504: mv(t1, 0xff); Maybe put the value 0xff in another used tmp register (say `tmp5`) so that we could reuse it in the for loop? The purpose is to save the `mv(t1, 0xff);` instruction in the for loop. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1581: > 1579: const int64_t tmp_limit = MaxVectorSize >= 32 ? unroll_words*3 : unroll_words*5; > 1580: sub(tmp1, len, tmp_limit); > 1581: bge(tmp1, zr, L_vector_entry); It seems more obvious to move `tmp_limit` into `tmp1` and do: `bge(len, tmp1, L_vector_entry);` This will save us one subtract instruction when `tmp_limit` is big (i.e., when `is_simm12(tmp_limit)` is false).
------------- PR Review: https://git.openjdk.org/jdk/pull/20910#pullrequestreview-2305674484 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1760448381 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1760449048 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1760465567 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1760468317 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1760463764 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1760450687 From jbhateja at openjdk.org Mon Sep 16 02:58:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Sep 2024 02:58:41 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. 
> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. 
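The selection semantics quoted above (index lanes in [0, 2*VLEN) pick from the first or second table vector) can be written out as a scalar model. This is a minimal sketch assuming `int` lanes and plain arrays in place of `Vector` objects; the class name and the array-based signature are hypothetical and are not part of the Vector API.

```java
// Scalar model of the two-vector selectFrom described in the PR text.
// Arrays stand in for vectors; the class name and signature are hypothetical.
final class SelectFromSketch {
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Valid two-vector index range is [0, 2*VLEN), as stated in the PR description.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " out of range");
            }
            // Indexes below VLEN select from v1, the rest select from v2.
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }
}
```

With `indexes = {0, 5, 2, 7}`, `v1 = {10, 11, 12, 13}` and `v2 = {20, 21, 22, 23}`, the model yields `{10, 21, 12, 23}`: lanes 0 and 2 come from `v1`, lanes 1 and 3 from `v2`.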
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/1c00f417..7c80bfce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=08-09 Stats: 321 lines in 51 files changed: 57 ins; 97 del; 167 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Mon Sep 16 03:02:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Sep 2024 03:02:08 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v8] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <4IqtmftuGBNSj8_1HsI3x9eKBSf4QhpoKELYs1EanLE=.15ae8f1b-f586-403a-88d6-9193bba90fb2@github.com> On Sun, 15 Sep 2024 07:16:17 GMT, Emanuel Peter wrote: > > > Can you please **define** somewhere what it means to `prune indexes`? It does not help me much more than the previous "massaging indexes" you had before I asked you to change it. > > > > Also: I'm a little worried about the semantics change of the RearrangeNode that you did with the changes in RearrangeNode::Ideal. It looks a little "hacky", especially in conjunction with the vector_indexes_needs_massaging method. Can you give a clear definition of the semantics of RearrangeNode and vector_indexes_needs_massaging, please? > > > > > > > > > You have also not responded to this yet. It seems to me that before your proposed change, `RearrangeNode` had a clear and easy semantic, and now you somehow "hack it" to work with your `vector_indexes_needs_pruning`. Can you explain please why this makes sense and add a comment to `RearrangeNode` what its semantics is? 
> > > > > > In case target does not directly support two vector selection instruction we lower the IR to its constituents, this is better than intrinsification failure as it saves costly vector boxing penalties. > > Consider in terms of desired compiler IR and not rearrange API semantics, VectorRearrange IR node generally expects shape conformance b/w vector to be permuted and index vector, since shuffle indices are held in byte array based backing storage hence compiler injects VectorLoadShuffle nodes to upcast the byte vector lanes holding indexes to match the input vector lane. Since selectFrom API already passes the indexes through vector hence we can save emitting redundant toShuffle() + toVector() operations in some cases apart from some target specific scenarios e.g. AVX2 targets [do not support direct short vector permute]instruction "VPERMW", hence we need to [massage the index vector](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8771) to emulate desired permutation using byte permute instruction. > > VectorLoadShuffle abstraction hides target specific index massaging which is why adding a target specific hook like Matcher::vector_indexes_needs_pruning compiler to selectively emit VectorLoadShuffle. > > I still do not see a **definition of the semantics of RearrangeNode**: what inputs does it accept and what does it do with them? > > Can you put this explanation as comment in the code, please? > > It sounds like this is what the `massaging` / `pruning` is: `emulate desired permutation using byte permute instruction.` You should find an accordingly more suiting name for the method name. Maybe it is something like `must_emulate_permutation_with...`. Or maybe it is rather a `supported` kind of question? I leave that up to you. 
Hi @eme64 , As per discussion on [PR# 20634 ](https://github.com/openjdk/jdk/pull/20634#discussion_r1759701508), we plan to suppress VectorLoadShuffle bypassing optimization for now and address this as a follow up optimization for both the flavors of selectFrom API. I have addressed your comments. Kindly verify. Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2351944720 From fyang at openjdk.org Mon Sep 16 03:23:03 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 16 Sep 2024 03:23:03 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: <_cSl91MWyiNI4ZY3cBEOzxzNItUyIw-7kHP8UJxBj50=.920fcc4b-8dac-45e0-a2ab-c4f2dcd4276c@github.com> On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) does not need global icache flushes. >> Instead, we can emit fence.i locally (per thread and hart) in the slow path. >> But if we were to be context switched to a hart which has not had fence.i emitted, we could still fetch stale instructions. >> To handle this case we now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread does not execute on code heap (non attached native thread), and even if there were no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off.
> > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, moved init after feature enabling This is interesting. Thanks for trying this out. BTW: Will this reflect on performance numbers of popular benchmarks workloads? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2351957337 From amitkumar at openjdk.org Mon Sep 16 04:56:36 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Sep 2024 04:56:36 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v2] In-Reply-To: References: Message-ID: > This PR provides "resolve_global_jobject" method implementation for s390x-port. > > Testing: > * Tier1 test with Fastdebug; > * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; > * 1. Ran tier1 test with a call to "resolve_jobect" > * 2. Ran tier1 test with a call to "resolve_global_jobject" > > I didn't see any new failure appearing there. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: removes extra line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20986/files - new: https://git.openjdk.org/jdk/pull/20986/files/fa36e66b..95a832e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20986&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20986&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20986.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20986/head:pull/20986 PR: https://git.openjdk.org/jdk/pull/20986 From amitkumar at openjdk.org Mon Sep 16 04:56:36 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Sep 2024 04:56:36 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 04:53:11 GMT, Amit Kumar wrote: >> This PR provides "resolve_global_jobject" method implementation for s390x-port. >> >> Testing: >> * Tier1 test with Fastdebug; >> * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; >> * 1. Ran tier1 test with a call to "resolve_jobect" >> * 2. Ran tier1 test with a call to "resolve_global_jobject" >> >> I didn't see any new failure appearing there. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > removes extra line src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3681: > 3679: bs->resolve_global_jobject(this, value, tmp1, tmp2); > 3680: } > 3681: Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20986#discussion_r1760529772 From aboldtch at openjdk.org Mon Sep 16 05:29:03 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 16 Sep 2024 05:29:03 GMT Subject: RFR: 8340009: Improve the output from assert_different_registers In-Reply-To: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> Message-ID: On Thu, 12 Sep 2024 12:56:13 GMT, Stefan Karlsson wrote: > `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed. 
> > For example, if I introduce an intentional violation: > > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index fde868a64b3..551878ac09d 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, > Register scan_temp, > Label& L_no_such_interface, > bool return_method) { > - assert_different_registers(recv_klass, intf_klass, scan_temp); > + Register joker = intf_klass; > + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); > assert_different_registers(method_result, intf_klass, scan_temp); > assert(recv_klass != method_result || !return_method, > "recv_klass can be destroyed when method isn't needed"); > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. More over, it's unclear from the assert which of the four variables contain the same register. > > I'd like to propose a few changes: > 1) That we report the indices of the conflicting registers > 2) That we report the correct file and line number > 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer. 
>
> After these suggestions we'll get error messages that look like this:
>
> # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963
> # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0
>
> Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables.
>
> There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement.

The only _hairy_ macro thing is the millions of ways to create a FOREACH x macro. As long as you have one of those you can simply do:

    #define FOREACH(X, ...) // ...
    #define ASSERT_DIFFERENT_REGISTERS_X(A) #A,
    #define assert_different_registers(...)                                      \
      do {                                                                       \
        const char* names[] = {                                                  \
          FOREACH(ASSERT_DIFFERENT_REGISTERS_X, __VA_ARGS__)                     \
        };                                                                       \
        assert_different_registers_impl(__FILE__, __LINE__, names, __VA_ARGS__); \
      } while (0)

Maybe you can skip the cross macro and use `#__VA_ARGS__` directly, parse it and print subspans in that string. But doing that correctly is probably even _hairier_ than the macros.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20965#issuecomment-2352042158

From shade at openjdk.org Mon Sep 16 05:35:09 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 16 Sep 2024 05:35:09 GMT
Subject: RFR: 8340105: Expose BitMap::print_on in release builds
In-Reply-To: 
References: 
Message-ID: 

On Fri, 13 Sep 2024 10:21:51 GMT, Aleksey Shipilev wrote:

> A small quality of life improvement. This irritates me often enough when I am looking at various bitmaps in release builds. BitMap::print_on is not available in release builds, and bitmaps in debug builds are sometimes different than the ones in release builds. This often forces me to do additional hack to expose it. I think it should just be available in release builds to begin with.

Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/20995#issuecomment-2352045494 From shade at openjdk.org Mon Sep 16 05:35:10 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Sep 2024 05:35:10 GMT Subject: Integrated: 8340105: Expose BitMap::print_on in release builds In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 10:21:51 GMT, Aleksey Shipilev wrote: > A small quality of life improvement. This irritates me often enough when I am looking at various bitmaps in release builds. BitMap::print_on is not available in release builds, and bitmaps in debug builds are sometimes different than the ones in release builds. This often forces me to do additional hack to expose it. I think it should just be available in release builds to begin with. This pull request has now been integrated. Changeset: 74add0e2 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/74add0e2e071a8c8e9547e5a1757b5950b780539 Stats: 9 lines in 2 files changed: 1 ins; 8 del; 0 mod 8340105: Expose BitMap::print_on in release builds Reviewed-by: stuefe, stefank ------------- PR: https://git.openjdk.org/jdk/pull/20995 From amitkumar at openjdk.org Mon Sep 16 06:01:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Sep 2024 06:01:07 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Fri, 13 Sep 2024 13:19:26 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. 
>> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update one, after the review src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3685: > 3683: z_stg(currentHeader, Address(Z_thread, JavaThread::unlocked_inflated_monitor_offset())); > 3684: > 3685: z_cr(currentHeader, Z_thread); // Set flag = NE I ran tier1 test and don't see any new failure appearing. How about using `z_ltgr` here ? Suggestion: z_ltgr(oop, oop); // Set flag = NE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1760569255 From rcastanedalo at openjdk.org Mon Sep 16 06:56:18 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 06:56:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Various touch-ups src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: > 2574: } else { > 2575: lea(dst, Address(obj, index, Address::lsl(scale))); > 2576: ldr(dst, Address(dst, offset)); Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1760617744 From luhenry at openjdk.org Mon Sep 16 07:29:07 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 16 Sep 2024 07:29:07 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v5] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Fri, 30 Aug 2024 16:57:18 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
> > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Use "whitespace" as an uncountable noun > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> > - I suck at English verb forms > > Co-authored-by: Erik Joelsson <37597443+erikj79 at users.noreply.github.com> Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20781#pullrequestreview-2305912067 From epeter at openjdk.org Mon Sep 16 07:51:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:15 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. 
>> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. Looks better, I still have a few comments. src/hotspot/share/opto/vectorIntrinsics.cpp line 2739: > 2737: return true; > 2738: } > 2739: @jatin-bhateja You still have 3x `unbox failed v1` here. I already commented this earlier, and you resolved it and gave it a thumbs up ? Can you please fix it now? 
src/hotspot/share/opto/vectornode.cpp line 2116: > 2114: const TypeVect* index_vect_type = index_vec->bottom_type()->is_vect(); > 2115: BasicType index_elem_bt = index_vect_type->element_basic_type(); > 2116: assert(!is_floating_point_type(index_elem_bt), ""); Why not verify this also in the constructor of `SelectFromTwoVectorNode`? Can you maybe explicitly verify what it must be rather than **not** be? src/hotspot/share/opto/vectornode.cpp line 2122: > 2120: // index format by subsequent VectorLoadShuffle. > 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); > 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. Should we maybe add some assert here to make sure we never badly truncate the index? src/hotspot/share/opto/vectornode.cpp line 2138: > 2136: > 2137: // Load indexes from byte vector and appropriatly massage them to target specific > 2138: // permutation index format. I would replace `massage` -> `transform` everywhere. src/hotspot/share/opto/vectornode.hpp line 1625: > 1623: Node* Ideal(PhaseGVN* phase, bool can_reshape); > 1624: virtual int Opcode() const; > 1625: }; `index` -> `indexes` because this is a vector, right? Otherwise I'll assume it is a scalar. Can you do some pseudo-code, that says how exactly the indices are interpreted? What if they are out of bounds? Does it wrap? Or assume they are in bounds? Undefined behaviour? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2305905569 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760651336 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760674461 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760678107 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760680772 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760665944 From epeter at openjdk.org Mon Sep 16 07:51:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Fri, 30 Aug 2024 14:44:05 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding descriptive comments > > src/hotspot/share/opto/vectornode.cpp line 2104: > >> 2102: // MASK) >> 2103: // This shall prevent an intrinsification failure and associated argument >> 2104: // boxing penalties. > > A quick comment about how the mask is computed could be nice. > `MASK = INDEX < num_elem` @jatin-bhateja very nice, thanks! > src/hotspot/share/opto/vectornode.cpp line 2148: > >> 2146: >> 2147: BoolTest::mask pred = BoolTest::lt; >> 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); > > Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760673419 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760656304 From epeter at openjdk.org Mon Sep 16 07:51:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Mon, 16 Sep 2024 07:27:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2148: >> >>> 2146: >>> 2147: BoolTest::mask pred = BoolTest::lt; >>> 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); >> >> Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? > > Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment. I really do think that `as_ConI()` would be the right thing here. In product it is just a cast, and in debug at least we have an assert. 
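The trade-off mentioned here (a plain cast in product builds, an assert in debug builds) can be illustrated with a small sketch. `Node` and `ConINode` below are simplified, hypothetical stand-ins written for this example, not HotSpot's actual classes, and the standard `assert` macro stands in for HotSpot's own debug-only assert.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for a node hierarchy; only the
// checked-downcast pattern is the point of this sketch.
struct ConINode;

struct Node {
    virtual ~Node() = default;
    virtual bool is_ConI() const { return false; }
    // Checked downcast: verifies the dynamic type when asserts are enabled
    // and reduces to a plain static_cast when NDEBUG disables them.
    ConINode* as_ConI();
};

struct ConINode : Node {
    explicit ConINode(int v) : value(v) {}
    bool is_ConI() const override { return true; }
    int value;
};

inline ConINode* Node::as_ConI() {
    assert(is_ConI() && "invalid node class");
    return static_cast<ConINode*>(this);
}
```

Compared with a raw C-style cast at the call site, the accessor costs nothing when asserts are compiled out, while a wrong node type fails loudly in builds where they are enabled.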
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760657072 From epeter at openjdk.org Mon Sep 16 07:51:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: <4VKGFHuL8RSSll0Pnqgg5DeesBdXys8JOZT64yGUBG8=.58b88db6-58c0-49ea-b01c-d2d814a93cae@github.com> On Mon, 16 Sep 2024 07:35:46 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/hotspot/share/opto/vectornode.hpp line 1625: > >> 1623: Node* Ideal(PhaseGVN* phase, bool can_reshape); >> 1624: virtual int Opcode() const; >> 1625: }; > > `index` -> `indexes` because this is a vector, right? Otherwise I'll assume it is a scalar. > Can you do some pseudo-code, that says how exactly the indices are interpreted? What if they are out of bounds? Does it wrap? Or assume they are in bounds? Undefined behaviour? For me, good comments here would be immensely valuable, because they help with other C2 optimizations.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760667297 From epeter at openjdk.org Mon Sep 16 07:51:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Sep 2024 07:51:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> On Fri, 13 Sep 2024 17:38:29 GMT, Jatin Bhateja wrote: >> That does not answer my question. If the backend operations you implemented would have the wrong vector-length: do we have any tests that would catch that? Often that requires not just going "up" with a loop but also "counting down" with the loop iv. Do you know what I mean? > > Patch includes tests for all the species (combination of vector type and sizes), each vector kernel is validated against equivalent scalar implementation, scenario which you are referring is implicitly handled though tests. Ok, just so that I can relax, can you please point me to this test that would implicitly verify that the backend has chosen the correct vector size? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1760671393 From tschatzl at openjdk.org Mon Sep 16 07:55:34 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 16 Sep 2024 07:55:34 GMT Subject: RFR: 8340119: Remove oopDesc::size_might_change() Message-ID: Hi all, please review this change that removes `oopDesc::size_might_change()` because since JDK-8337709 and JDK-8311163 no collector uses the objArray's length field during garbage collection any more. 
Testing: tier1-3 Thanks, Thomas ------------- Commit messages: - Hi all, Changes: https://git.openjdk.org/jdk/pull/20999/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20999&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340119 Stats: 14 lines in 3 files changed: 0 ins; 13 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20999/head:pull/20999 PR: https://git.openjdk.org/jdk/pull/20999 From amitkumar at openjdk.org Mon Sep 16 08:05:20 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Sep 2024 08:05:20 GMT Subject: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v23] In-Reply-To: References: <-FcWfOFLvzxVi15ljQ7WQCDKL4Qnioew3EpOANiLlGI=.d7afc108-3dff-492b-889f-915dec0782f8@github.com> Message-ID: On Mon, 9 Sep 2024 13:32:24 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot shared runtime >> ---------------------- >> >> Building hashed secondary tables is now unconditional. It takes very >> little time, and now that the shared runtime always has the tables, it >> might as well take advantage of them. The shared code is easier to >> follow now, I think. >> >> There might be a performance issue with x86-64 in that we build >> HotSpot for a default x86-64 target that does not support popcount. >> This means that HotSpot C++ runtime on x86 always uses a software >> emulation for popcount, even though the vast majority of machines made >> for the past 20 years can do popcount in a single instruction. It >> wouldn't be terribly hard to do something about that. >> >> Having said that, the software popcount is really not bad. 
>> >> x86 >> --- >> >> x86 is rather tricky, because we still support >> `-XX:-UseSecondarySupersTable` and `-XX:+UseSecondarySupersCache`, as >> well as 32- and 64-bit ports. There's some further complication in >> that only `RCX` can be used as a shift count, so there's some register >> shuffling to do. All of this makes the logic in macroAssembler_x86.cpp >> rather gnarly, with multiple levels of conditionals at compile time >> and runtime. >> >> AArch64 >> ------- >> >> AArch64 is considerably more straightforward. We always have a >> popcount instruction and (thankfully) no 32-bit code to worry about. >> >> Generally >> --------- >> >> I would dearly love simply to rip out the "old" secondary supers cache >> support, but I've left it in just in case someone has a performance >> regression. >> >> The versions of `MacroAssembler::lookup_secondary_supers_table` that >> work with variable superclasses don't take a fixed set of temp >> registers, and neither do they call out to a slow path subroutine. >> Instead, the slow path is expanded inline. >> >> I don't think this is necessarily bad. Apart from the very rare cases >> where C2 can't determine the superclass to search for at compile time, >> this code is only used for generating stubs, and it seemed to me >> ridiculous to have stubs calling other stubs. >> >> I've followed the guidance from @iwanowww not to obsess too much about >> the performance of C1-compiled secondary supers lookups, and to prefer >> simplicity over absolute performance. Nonetheless, this i... > > Andrew Haley has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 61 commits: > > - Merge from 4ff72dc57e65e99b129f0ba28196994edf402018 > - Fix s390 > - Use post-incrememnt RegSet operator. > - Merge branch 'clean' into JDK-8331658-work > - Fix merge > - Merge branch 'clean' into JDK-8331658-work > - Merge from JDK head. > - Cleanup > - Fix shared code > - Fix shared code > - ...
and 51 more: https://git.openjdk.org/jdk/compare/4ff72dc5...a7612674 src/hotspot/cpu/aarch64/aarch64.ad line 16079: > 16077: > 16078: ins_encode %{ > 16079: bool success = false; Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19989#discussion_r1760698601 From rcastanedalo at openjdk.org Mon Sep 16 08:07:18 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 08:07:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 13:11:45 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Various touch-ups > * Note that oopDesc::klass_offset_in_bytes() is not used by +UCOH paths anymore. The only exception is C2, which uses it as a placeholder/identifier of the special memory slice that only LoadNKlass uses. The backend then extracts the original oop and loads its mark-word and extracts the narrow-Klass* from that. I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. 
Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. An alternative that seems promising is to hide the object header klass pointer extraction and make it part of the `LoadNKlass` node semantics, as illustrated in this example: ![alternative-modeling](https://github.com/user-attachments/assets/06243966-3065-4969-a2dd-d05133b36366) `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352253326 From mdoerr at openjdk.org Mon Sep 16 08:19:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 16 Sep 2024 08:19:11 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:43:20 GMT, Martin Doerr wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity check. Thanks for the reviews! Test results look good. Let's get rid of the failure. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2352270134 From mdoerr at openjdk.org Mon Sep 16 08:19:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 16 Sep 2024 08:19:11 GMT Subject: Integrated: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 17:16:08 GMT, Martin Doerr wrote: > After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. This pull request has now been integrated. Changeset: 6be15c3d Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/6be15c3d0bf0bb3625f2ecd43d7aa10e81f6edd8 Stats: 9 lines in 3 files changed: 7 ins; 0 del; 2 mod 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 Reviewed-by: kvn, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20971 From rcastanedalo at openjdk.org Mon Sep 16 08:19:24 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 08:19:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:16:32 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/gc/g1/g1BarrierSet.cpp line 65: > >> 63: #else >> 64: make_barrier_set_c2(), >> 65: #endif > > I assume it is temporary until all ports are ready (except maybe 32-bit x86). Right?
Right, all code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` will be removed before integration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760716721 From mli at openjdk.org Mon Sep 16 08:30:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 16 Sep 2024 08:30:06 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v7] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 02:43:35 GMT, Fei Yang wrote: > Thanks for the update. Several minor comments remain. BTW: Do you have the updated JMH data to see? Thanks for the detailed review. With the latest patch, the JMH results are not much different. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20910#issuecomment-2352294203 From mli at openjdk.org Mon Sep 16 08:34:46 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 16 Sep 2024 08:34:46 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v8] In-Reply-To: References: Message-ID: <_4uhEXm6J6Ioe8kOpsavJsJb_jxR83kMN3JcD9RbsN8=.dd158833-edb6-464d-8c06-e1bf70414cb1@github.com> > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c. I modified the N-related code (setting N to 16), regenerated the required tables, and finally vectorized the code (the original implementation in zcrc32.c is just scalar code).
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch
>
> Benchmark | (count) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | --
> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op
> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op
> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op
> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op
> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op
> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op
> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op
>
> without patch
>
> Benchmark | (count) | Mode | Cnt | Score | Error | Units
> -- | -- | -- | -- | -- | -- | --
> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op
> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op
> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op
> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op
> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op
> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op
> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op
>
> ### on K230 > > with patch > References: Message-ID: On Thu, 5 Sep 2024 17:46:49 GMT, Johan Sjölen wrote: > Hi, > > The code for the `Unsafe.setMemory` intrinsic has a few issues that this PR cleans up. > > 1. The labels are unused in x86-64 intrinsic > 2. The function stub has an incorrect function prototype as it clearly manipulates the array so the array is not const, and we don't read the array so it probably shouldn't be called `src`. That's probably just an issue of `UnsafeArrayCopyStub` being copied and altered insufficiently. > > Thanks. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/20873#issuecomment-2352385210 From jsjolen at openjdk.org Mon Sep 16 09:16:14 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 16 Sep 2024 09:16:14 GMT Subject: Integrated: 8339627: Cleanup Unsafe.setMemory intrinsic code In-Reply-To: References: Message-ID: On Thu, 5 Sep 2024 17:46:49 GMT, Johan Sjölen wrote: > Hi, > > The code for the `Unsafe.setMemory` intrinsic has a few issues that this PR cleans up. > > 1. The labels are unused in x86-64 intrinsic > 2. The function stub has an incorrect function prototype as it clearly manipulates the array so the array is not const, and we don't read the array so it probably shouldn't be called `src`. That's probably just an issue of `UnsafeArrayCopyStub` being copied and altered insufficiently. > > Thanks. This pull request has now been integrated. Changeset: 54595188 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/545951889c1ea68646be600decaf2bf4c049600b Stats: 4 lines in 2 files changed: 0 ins; 3 del; 1 mod 8339627: Cleanup Unsafe.setMemory intrinsic code Reviewed-by: tschatzl, fbredberg ------------- PR: https://git.openjdk.org/jdk/pull/20873 From rcastanedalo at openjdk.org Mon Sep 16 09:31:13 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 09:31:13 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:18:44 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/opto/output.cpp line 2026: > >> 2024: if (n->is_MachNullCheck()) { >> 2025:
assert(n->in(1)->as_Mach()->barrier_data() == 0, >> 2026: "Implicit null checks on memory accesses with barriers are not yet supported"); > > I don't see here changes in `lcm.cpp` which would prevent it. I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255). Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1760814745 From ihse at openjdk.org Mon Sep 16 10:30:46 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 16 Sep 2024 10:30:46 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v6] In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Update legal/sleef.md ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20781/files - new: https://git.openjdk.org/jdk/pull/20781/files/e5fe681e..83dddf1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20781&range=04-05 Stats: 407 lines in 1 file changed: 404 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20781.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20781/head:pull/20781 PR: https://git.openjdk.org/jdk/pull/20781 From jsjolen at openjdk.org Mon Sep 16 12:00:19 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 16 Sep 2024 12:00:19 GMT Subject: RFR: 8340178: Make ArrayWithFreeList have Index type and move to utilities Message-ID: <60GYHj6KckbaHKY1mDgIyiEjzkqdAKpRyNchQXi37xE=.2b6b0cbb-4066-4c56-9ff6-af58ffd55b38@github.com> Hi, This PR does multiple things: 1. Gives `AWFL` an index template `I` which specifies the type of the indices, this lets us have very small indices and that saves memory. 2. Gives `AWFL` the ability to store things in a static memory area of a specific length 3. Finally, moves it to utilities for general consumption For some context: I tried to give `GrowableArray` the index type feature, but I hit a brick wall at changing the assert messages. It's also not a feature which has consensus, some people like it, and some people think it's too complex. I find putting a smaller and hidden `resizable_array` class In AWFL to be an acceptable compromise. I also believe that `GA` will not find too much competition with `AWFL`, as it has a less rich API and is really meant as an allocator interface rather than a general array type. **Hint for reviewers:** Do NOT go into "Files changed", look at the commits to see the actual changes and ignore the commits with "Move" in the title. 
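As a rough illustration of point 1 above (the index type as a template parameter), here is a minimal sketch of an array with a free list. It is not the HotSpot `ArrayWithFreeList` code: the class name `FreeListArray`, its methods, the sentinel choice, and the use of `std::vector` as backing storage are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch, not the HotSpot implementation.
// E is the element type, I is the (possibly very small) unsigned index type.
template <typename E, typename I>
class FreeListArray {
  // Sentinel meaning "no slot"; the largest value of I is reserved for it.
  static constexpr I nil = std::numeric_limits<I>::max();

  struct Slot {
    E value;      // meaningful while the slot is allocated
    I next_free;  // meaningful while the slot is on the free list
  };

  std::vector<Slot> _slots;
  I _free_head = nil;

public:
  // Returns the index of a slot holding `value`, reusing a freed slot if possible.
  I allocate(const E& value) {
    if (_free_head != nil) {
      I idx = _free_head;
      _free_head = _slots[idx].next_free;
      _slots[idx].value = value;
      return idx;
    }
    assert(_slots.size() < static_cast<std::size_t>(nil) && "index type exhausted");
    _slots.push_back(Slot{value, nil});
    return static_cast<I>(_slots.size() - 1);
  }

  // Pushes the slot onto the free list; its index may be handed out again.
  void deallocate(I idx) {
    assert(static_cast<std::size_t>(idx) < _slots.size());
    _slots[idx].next_free = _free_head;
    _free_head = idx;
  }

  E& at(I idx) {
    assert(static_cast<std::size_t>(idx) < _slots.size());
    return _slots[idx].value;
  }
};
```

Instantiated as `FreeListArray<int, unsigned char>`, every index a client stores costs one byte, at the price of a capacity limit of 255 slots. A real implementation would likely overlay `value` and `next_free` in a union; the sketch keeps them separate for readability.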
------------- Commit messages: - Use int - No need for reinterpret cast - Style - Change test - Change AWFL - Move AWFL - Move test - Changes to NCSS Changes: https://git.openjdk.org/jdk/pull/20002/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20002&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340178 Stats: 567 lines in 5 files changed: 307 ins; 259 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20002/head:pull/20002 PR: https://git.openjdk.org/jdk/pull/20002 From jwaters at openjdk.org Mon Sep 16 12:00:19 2024 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 16 Sep 2024 12:00:19 GMT Subject: RFR: 8340178: Make ArrayWithFreeList have Index type and move to utilities In-Reply-To: <60GYHj6KckbaHKY1mDgIyiEjzkqdAKpRyNchQXi37xE=.2b6b0cbb-4066-4c56-9ff6-af58ffd55b38@github.com> References: <60GYHj6KckbaHKY1mDgIyiEjzkqdAKpRyNchQXi37xE=.2b6b0cbb-4066-4c56-9ff6-af58ffd55b38@github.com> Message-ID: On Wed, 3 Jul 2024 10:06:09 GMT, Johan Sjölen wrote: > Hi, > > This PR does multiple things: > > 1. Gives `AWFL` an index template `I` which specifies the type of the indices, this lets us have very small indices and that saves memory. > 2. Gives `AWFL` the ability to store things in a static memory area of a specific length > 3. Finally, moves it to utilities for general consumption > > For some context: > > I tried to give `GrowableArray` the index type feature, but I hit a brick wall at changing the assert messages. It's also not a feature which has consensus, some people like it, and some people think it's too complex. I find putting a smaller and hidden `resizable_array` class In AWFL to be an acceptable compromise. I also believe that `GA` will not find too much competition with `AWFL`, as it has a less rich API and is really meant as an allocator interface rather than a general array type.
> > **Hint for reviewers:** Do NOT go into "Files changed", look at the commits to see the actual changes and ignore the commits with "Move" in the title. Bookmarking so I can remember to return to this when it's Ready for Review ------------- PR Comment: https://git.openjdk.org/jdk/pull/20002#issuecomment-2206098366 From stuefe at openjdk.org Mon Sep 16 12:17:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 12:17:26 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range Message-ID: Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range - The encoding range can reach far beyond the end of the Klass range. For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. ---- The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. 
The error is extremely unlikely because: - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays But the error highlights misuse of the term "encoding range", so it should be fixed. ---- Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. The fix: - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. Minor cleanups: - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a single place in the VM to compute the max. value of a *preshifted* narrowKlass (see macroAssembler_aarch64.cpp). - I renamed `is_in_encoding_range` to `is_encodable` since we don't check for the "encoding range", we check for the "klass range". 
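The difference between the two checks can be sketched roughly as follows. This is an illustration only, not the patch's code: the struct `KlassEncoding`, its field names, and the addresses used in the note below are hypothetical, and the "buggy" variant ASSUMES the old lower bound was the encoding base, which is how the description above reads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the state described above; not HotSpot code.
struct KlassEncoding {
  uint64_t base;               // encoding base; 0 for zero-based encoding
  int      shift;              // encoding shift, e.g. 0 or 3
  uint64_t klass_range_start;  // start of the region that actually holds Klass structures
  uint64_t klass_range_end;    // end (exclusive) of that region

  // End (exclusive) of the encoding range for a 32-bit narrowKlass:
  // base + 2^(32 + shift), i.e. 4G for shift 0 and 32G for shift 3.
  uint64_t encoding_range_end() const {
    assert(shift >= 0 && shift < 32);
    return base + (uint64_t(1) << (32 + shift));
  }

  // Old-style check (assumed): lower bound is the encoding base.
  // With a zero base, anything below klass_range_end passes.
  bool is_encodable_buggy(uint64_t addr) const {
    return addr >= base && addr < klass_range_end;
  }

  // Fixed check: only addresses inside the Klass range count.
  bool is_encodable(uint64_t addr) const {
    return addr >= klass_range_start && addr < klass_range_end;
  }
};
```

With a zero base and a Klass range of `[0x80000000, 0x90000000)`, an address such as `0x40000000` (a metaspace region that happens to sit below class space) passes the old-style check but is rejected by the fixed one, which is the false positive described above.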
New Tests: - regression test Additional testing: * [x] Linux x64 fastdebug, `runtime/cds`, `runtime/CompressedOops` and `gtest` * [x] macOS aarch64 fastdebug, `runtime/CompressedOops` and `gtest` ------------- Commit messages: - fix macos - JDK-8340184-Bug-in-CompressedKlassPointers-is_in_encodable_range Changes: https://git.openjdk.org/jdk/pull/21015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21015&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340184 Stats: 240 lines in 10 files changed: 213 ins; 10 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/21015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21015/head:pull/21015 PR: https://git.openjdk.org/jdk/pull/21015 From stuefe at openjdk.org Mon Sep 16 12:23:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 12:23:04 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:37:55 GMT, Thomas Stuefe wrote: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range.
> > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. > > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a single place in t... src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5084: > 5082: > 5083: if (operand_valid_for_logical_immediate( > 5084: /*is32*/false, (uint64_t)CompressedKlassPointers::base())) { Reviewer hint: The before and after code should be exactly identical. What we do here is we calculate the max. value a left-shifted nKlass can have, and compute a mask over it. 
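A sketch of that calculation, for illustration only: given the largest value a left-shifted narrowKlass can take, build a mask that covers every such value, then test whether the base has any bits inside the mask. If it has none, `base + v` and `base ^ v` are the same for every shifted value `v`. That this is the purpose of the mask is an assumption on my part, the helper names are hypothetical, and none of this is the code from `macroAssembler_aarch64.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Smallest mask of the form 2^n - 1 that covers every value in [0, max_value].
inline uint64_t mask_covering(uint64_t max_value) {
  uint64_t mask = 0;
  while (mask < max_value) {
    mask = (mask << 1) | 1;
  }
  return mask;
}

// True if base shares no bits with any shifted narrowKlass value,
// in which case base + v == (base ^ v) for all v in [0, max_shifted].
inline bool can_combine_without_add(uint64_t base, uint64_t max_shifted) {
  return (base & mask_covering(max_shifted)) == 0;
}

// Decodes by XOR; only valid when the precondition above holds.
inline uint64_t decode_xor(uint64_t base, uint64_t shifted, uint64_t max_shifted) {
  assert(shifted <= max_shifted);
  assert(can_combine_without_add(base, max_shifted));
  return base ^ shifted;
}
```

For example, a base of `0x800000000` (32G) has no bits below bit 35, so it can be combined by XOR with any shifted value up to `0xFFFFFFFF`; a base of `0x80000000` (2G) cannot, because bit 31 lies inside the mask.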
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761042543 From rkennke at openjdk.org Mon Sep 16 12:38:00 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 12:38:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v16] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 53 commits: - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 - Various touch-ups - Hide log timestamps in test to prevent false failures - Revert accidental change of UCOH default - ... 
and 43 more: https://git.openjdk.org/jdk/compare/59778885...49c87547 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=15 Stats: 4605 lines in 190 files changed: 3252 ins; 724 del; 629 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 16 12:39:04 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 12:39:04 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: <2P96xsn33JgpOtp73_g6BpFBqFZwbJ4POfZNGzU2V6U=.2308cc31-21b7-418d-85e4-1d038e88be68@github.com> On Mon, 16 Sep 2024 10:37:55 GMT, Thomas Stuefe wrote: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range. > > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. 
That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. > > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a sing... Looks good to me! Thank you! ------------- Marked as reviewed by rkennke (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21015#pullrequestreview-2306520746 From stefank at openjdk.org Mon Sep 16 12:47:04 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 16 Sep 2024 12:47:04 GMT Subject: RFR: 8340119: Remove oopDesc::size_might_change() In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 14:12:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes `oopDesc::size_might_change()` because since JDK-8337709 and JDK-8311163 no collector uses the objArray's length field during garbage collection any more. 
> > Testing: tier1-3 > > Thanks, > Thomas Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20999#pullrequestreview-2306539675 From rkennke at openjdk.org Mon Sep 16 13:28:00 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 13:28:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v17] In-Reply-To: References: Message-ID: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits: - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 - Various touch-ups - Hide log timestamps in test to prevent false failures - ... 
and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=16 Stats: 4598 lines in 190 files changed: 3245 ins; 719 del; 634 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Sep 16 13:31:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 16 Sep 2024 13:31:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke wrote: >>> @rkennke Can you please explain the changes in these tests: >>> >>> ``` >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >>> ``` >>> >>> You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >>> >>> This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >>> >>> I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? >>> >>> Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >>> >>> Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). 
>> >> IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. >> >> I will re-evaluate those tests, and add comments or remove the restrictions. > >> > > @rkennke Can you please explain the changes in these tests: >> > > ``` >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> > > ``` >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > You added these IR rule restriction: `@IR(applyIf = {"UseCompactObjectHeaders", "false"},` >> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization seems to be impacted - that could be concerning because it has a performance impact. >> > > I have recently changed a few things in SuperWord, so maybe some of them can be removed, because they now vectorize anyway? 
>> > > Of course some special tests may just rely on `UseCompactObjectHeaders == false` - but I would need some comments in the tests where you added it to justify why we add the restriction. >> > > Please also test this patch with the cross combinations of `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add `VerifyAlignVector` as well). >> > >> > >> > IIRC (it has been a while), the problem is that with Lilliput (and also without compact headers, but disabling compressed class-pointers -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] start at different offsets (12 and 16, respectively). That is because with compact headers, we are using the 4 bytes after the arraylength, but long-arrays cannot do that because of alignment constraints. The impact is that these tests don't work as expected, because vectorization triggers differently. I don't remember the details, TBH, but I believe they would now generate pre-loops, or some might even not vectorize at all. Those seemed to be use-cases that did not look very important, but I may be wrong. It would be nice to properly fix those tests, or make corresponding tests for compact headers, instead, or improve vectorization to better deal with the offset mismatch, if necessary/possible. >> > I will re-evaluate those tests, and add comments or remove the restrictions. >> >> If it has indeed been a while, then it might well be that some of them work now, since I did make some improvements to auto-vectorization ... > `LoadNKlass` nodes can then be expanded into more primitive operations (load and shift for compact headers, load with `klass_offset_in_bytes()` for original headers) within C2's back-end or even during code emission as sketched [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). @rkennke is this similar to what you tried out ("Expanding it as a macro")? No, this is not what I tried. 
I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it would be very hard to get this right without breaking something. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265 From coleenp at openjdk.org Mon Sep 16 13:41:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 16 Sep 2024 13:41:10 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: <8_uszr_oNtUOK_Cefv7lmP5qhEZCEU7nl6tPhHsA7GU=.933520ad-bb6e-47f7-8add-af7a35516d52@github.com> On Mon, 16 Sep 2024 10:37:55 GMT, Thomas Stuefe wrote: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range. > > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. 
That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. > > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a sing... I have some minor comments. src/hotspot/share/oops/compressedKlass.hpp line 39: > 37: // a contiguous memory range into which we place Klass that should be encodable. Not every Klass > 38: // needs to be encodable. There is only one such memory range. > 39: // If CDS is disabled, this Klass Range is the same as the class space. If CDS is enabled, the Can you say metaspace class space, or class space in metaspace? test/hotspot/gtest/oops/test_compressedKlass.cpp line 31: > 29: #include "unittest.hpp" > 30: > 31: TEST_VM(compressedKlass, basics) { We typically capitalize the test name. Does that conflict with the class name in the JVM? If so it could be CompressedKlassTest. 
test/hotspot/jtreg/gtest/CompressedKlassGtest.java line 2: > 1: /* > 2: * Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved. I don't know why you need this. I thought the gtests just ran with TEST_VM. Also Copyright SAP is probably wrong for you now. ------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21015#pullrequestreview-2306664929 PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761156751 PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761166596 PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761176362 From jwaters at openjdk.org Mon Sep 16 13:43:11 2024 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 16 Sep 2024 13:43:11 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v5] In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:56:42 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with three additional commits since the last revision: > > - remove empty line > - fix indentation > - fix missing return statement Good work! ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2306707222 From jsjolen at openjdk.org Mon Sep 16 13:56:08 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Mon, 16 Sep 2024 13:56:08 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:37:55 GMT, Thomas Stuefe wrote: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range. > > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. 
> > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a sing... LGTM, with some comments. src/hotspot/share/oops/compressedKlass.hpp line 117: > 115: > 116: // Start and end of the Klass Range. > 117: // Note: guaranteed to be aligned to 1<1< References: <8_uszr_oNtUOK_Cefv7lmP5qhEZCEU7nl6tPhHsA7GU=.933520ad-bb6e-47f7-8add-af7a35516d52@github.com> Message-ID: On Mon, 16 Sep 2024 13:32:34 GMT, Coleen Phillimore wrote: >> Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. >> >> To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". >> >> The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). 
>> >> The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: >> - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range >> - The encoding range can reach far beyond the end of the Klass range. >> >> For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. >> >> ---- >> >> The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. >> >> The error is extremely unlikely because: >> - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges >> - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays >> >> But the error highlights misuse of the term "encoding range", so it should be fixed. >> >> ---- >> >> Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. >> >> The fix: >> - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. >> - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. >> >> Minor cleanups: >> - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass ran... 
> > test/hotspot/gtest/oops/test_compressedKlass.cpp line 31: > >> 29: #include "unittest.hpp" >> 30: >> 31: TEST_VM(compressedKlass, basics) { > > We typically capitalize the test name. Does that conflict with the class name in the JVM? If so it could be CompressedKlassTest. To pile on: I'd even prefer to `CapitalizeTestNames` also, as is recommended by gtest. I know that it's not how we typically do it in HotSpot, so this is a nit from my side. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761207540 From coleenp at openjdk.org Mon Sep 16 13:59:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 16 Sep 2024 13:59:05 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:37:55 GMT, Thomas Stuefe wrote: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range. > > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. 
That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. > > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a sing... I do like that _klass_range_start and _klass_range_end are explicit in this change. 
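The difference between the two checks discussed in this thread can be illustrated with a minimal sketch. It is not the HotSpot code: the struct `KlassEncoding`, its fields, and both helper functions are hypothetical names chosen for illustration, and the sketch assumes that the pre-fix check started at the encoding base and ended at the end of the Klass range, which is how the description of the removed `range()` reads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the state kept by CompressedKlassPointers.
// Field names are illustrative only, not HotSpot's declarations.
struct KlassEncoding {
  uint64_t base;               // encoding base (0 with zero-based encoding)
  int      shift;              // encoding shift (0 or 3 in the description)
  uint64_t klass_range_start;  // start of the range that really holds Klass
  uint64_t klass_range_end;    // exclusive end of that range
};

// Size of the encoding range for a 32-bit narrowKlass:
// 4G with shift 0, 32G with shift 3.
inline uint64_t encoding_range_size(const KlassEncoding& e) {
  return uint64_t(1) << (32 + e.shift);
}

// Assumed shape of the pre-fix check: starts at the encoding base.
// With a zero base, every address below the Klass range end passes.
inline bool in_range_from_base(const KlassEncoding& e, uint64_t addr) {
  return addr >= e.base && addr < e.klass_range_end;
}

// Shape of the fixed check: only the actual Klass range counts.
inline bool in_klass_range(const KlassEncoding& e, uint64_t addr) {
  return addr >= e.klass_range_start && addr < e.klass_range_end;
}
```

With a zero base, a shift of 3, and a made-up Klass range of `[4G, 5G)`, an address at 256M lies below the Klass range: `in_range_from_base` accepts it (the false positive), while `in_klass_range` rejects it.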
------------- PR Comment: https://git.openjdk.org/jdk/pull/21015#issuecomment-2352999551 From stuefe at openjdk.org Mon Sep 16 14:14:04 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:14:04 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: <8_uszr_oNtUOK_Cefv7lmP5qhEZCEU7nl6tPhHsA7GU=.933520ad-bb6e-47f7-8add-af7a35516d52@github.com> References: <8_uszr_oNtUOK_Cefv7lmP5qhEZCEU7nl6tPhHsA7GU=.933520ad-bb6e-47f7-8add-af7a35516d52@github.com> Message-ID: <5tT_A7Dew_o2vNc4dDqKBcwFqI0In3Yo1aXr3RZGXjA=.f9cb7989-5916-4c45-a1a0-6471899c3484@github.com> On Mon, 16 Sep 2024 13:37:10 GMT, Coleen Phillimore wrote: > I don't know why you need this. I thought the gtests just ran with TEST_VM. I want to see the CompressedKlass tests succeed with these options (-Xshare:off etc.), because that triggers zero-based encoding, and that is what made the error apparent. Otherwise, the error was hidden, because we usually run with encoding base == start-of-CDS (aka start of Klass range). > Also Copyright SAP is probably wrong for you now. Good catch. It's completely wrong now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761244417 From stuefe at openjdk.org Mon Sep 16 14:32:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:32:25 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range [v2] In-Reply-To: References: Message-ID: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... 
+ 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range. > > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. > > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a sing... 
Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: - comment fix - comment fix missing close paranthesys - Feedback Johan - Feedback Coleen ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21015/files - new: https://git.openjdk.org/jdk/pull/21015/files/ab82126d..ca949b83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21015&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21015&range=00-01 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21015/head:pull/21015 PR: https://git.openjdk.org/jdk/pull/21015 From coleenp at openjdk.org Mon Sep 16 14:32:26 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 16 Sep 2024 14:32:26 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 14:29:38 GMT, Thomas Stuefe wrote: >> Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. >> >> To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". >> >> The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). >> >> The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: >> - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range >> - The encoding range can reach far beyond the end of the Klass range. 
>> >> For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. >> >> ---- >> >> The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. >> >> The error is extremely unlikely because: >> - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges >> - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays >> >> But the error highlights misuse of the term "encoding range", so it should be fixed. >> >> ---- >> >> Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. >> >> The fix: >> - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. >> - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. >> >> Minor cleanups: >> - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass ran... > > Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: > > - comment fix > - comment fix missing close paranthesys > - Feedback Johan > - Feedback Coleen Looks really good. 
src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 134: > 132: // Remember the Klass range: > 133: _klass_range_start = addr; > 134: _klass_range_end = addr + len; In the Lilliput review, I think you were moving this to shared code with an ifdef so that these variables are all set in one place. This is ok here for now. test/hotspot/jtreg/gtest/CompressedKlassGtest.java line 31: > 29: * mode, we start with CDS disabled, a small class space and a large (albeit uncommitted, to save memory) heap. The > 30: * JVM will likely place the class space in low-address territory. > 31: * (If it does not manage to do this, the test will note that and print "skipped" Ok, I see why you have this. thanks for the comment. One nit, the ) is missing in this sentence. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21015#pullrequestreview-2306833750 PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761265199 PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761263495 From stuefe at openjdk.org Mon Sep 16 14:32:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:32:26 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 13:56:07 GMT, Coleen Phillimore wrote: >> Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. >> >> To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". >> >> The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). 
>> >> The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: >> - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range >> - The encoding range can reach far beyond the end of the Klass range. >> >> For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. >> >> ---- >> >> The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. >> >> The error is extremely unlikely because: >> - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges >> - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays >> >> But the error highlights misuse of the term "encoding range", so it should be fixed. >> >> ---- >> >> Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. >> >> The fix: >> - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. >> - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. >> >> Minor cleanups: >> - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass ran... > > I do like that _klass_range_start and _klass_range_end are explicit in this change. Thanks @coleenp and @jdksjolen . 
I changed the name of the gtest to uppercase, and touched up some of the comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21015#issuecomment-2353076510 From stuefe at openjdk.org Mon Sep 16 14:32:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:32:26 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range [v2] In-Reply-To: References: Message-ID: <_zMRQI15oBre9EnN718oqn6i6AidClL0VMUvLj0KL-s=.69c03b94-6cb8-470b-8498-112cf416dc45@github.com> On Mon, 16 Sep 2024 14:23:16 GMT, Coleen Phillimore wrote: >> Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: >> >> - comment fix >> - comment fix missing close paranthesys >> - Feedback Johan >> - Feedback Coleen > > src/hotspot/cpu/aarch64/compressedKlass_aarch64.cpp line 134: > >> 132: // Remember the Klass range: >> 133: _klass_range_start = addr; >> 134: _klass_range_end = addr + len; > > In the Lilliput review, I think you were moving this to shared code with an ifdef so that these variables are all set in one place. This is ok here for now. Yes I did this. I like the Lilliput variant better too. > test/hotspot/jtreg/gtest/CompressedKlassGtest.java line 31: > >> 29: * mode, we start with CDS disabled, a small class space and a large (albeit uncommitted, to save memory) heap. The >> 30: * JVM will likely place the class space in low-address territory. >> 31: * (If it does not manage to do this, the test will note that and print "skipped" > > Ok, I see why you have this. thanks for the comment. One nit, the ) is missing in this sentence. 
Fixed :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761267520 PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761273994 From stuefe at openjdk.org Mon Sep 16 14:32:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:32:26 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range [v2] In-Reply-To: References: Message-ID: <-FcsPQievoF6bU_Y8jaxcyhmvhp9LjQlH5eBneDuIQ4=.1747c3e7-aa50-4376-8461-11d69a0fbbf8@github.com> On Mon, 16 Sep 2024 14:29:38 GMT, Thomas Stuefe wrote: >> Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. >> >> To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". >> >> The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). >> >> The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: >> - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range >> - The encoding range can reach far beyond the end of the Klass range. >> >> For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. >> >> ---- >> >> The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. 
In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. >> >> The error is extremely unlikely because: >> - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges >> - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays >> >> But the error highlights misuse of the term "encoding range", so it should be fixed. >> >> ---- >> >> Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. >> >> The fix: >> - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. >> - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. >> >> Minor cleanups: >> - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass ran... > > Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: > > - comment fix > - comment fix missing close paranthesys > - Feedback Johan > - Feedback Coleen src/hotspot/share/oops/compressedKlass.hpp line 96: > 94: // 0x8_0000_0000 0x8_4800_0000 0x9_0000_0000 > 95: // > 96: Note that I like this comment better than the one we have in Lilliput, since its more concise; I will use this in Lilliput too (with some minor additions for 22bit narrowKlass). 
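The distinction the quoted description draws between the Encoding Range and the Klass Range can be illustrated with a small toy model. This is only a sketch, not the HotSpot code: the struct `ToyKlassEncoding`, its member names, and the example addresses are all hypothetical, and the "buggy" check assumes the old range ran from the encoding base up to the end of the Klass Range, which is how the description of `range()` reads.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the two ranges. All names are illustrative, not HotSpot's.
struct ToyKlassEncoding {
  uint64_t base;               // encoding base (zero with zero-based encoding)
  int      shift;              // encoding shift (0 or 3 in the description)
  uint64_t klass_range_start;  // start of the region that actually holds Klass
  uint64_t klass_range_end;    // exclusive end of that region

  // Size of the Encoding Range for a 32-bit narrowKlass:
  // 4G with shift=0, 32G with shift=3.
  uint64_t encoding_range_size() const { return uint64_t{1} << (32 + shift); }

  // Assumed shape of the old check: the lower bound is the encoding base,
  // so with base == 0 every address below the Klass Range also passes.
  bool in_range_buggy(uint64_t addr) const {
    return addr >= base && addr < klass_range_end;
  }

  // Shape of the fix as described: both bounds come from the Klass Range.
  bool in_range_fixed(uint64_t addr) const {
    return addr >= klass_range_start && addr < klass_range_end;
  }
};
```

With a zero base and a hypothetical Klass Range of `[0x80000000, 0xC0000000)`, an address such as `0x10000000` passes the first check but not the second, which is the false positive the description refers to.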
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761271267 From stuefe at openjdk.org Mon Sep 16 14:32:26 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:32:26 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 13:49:50 GMT, Johan Sjölen wrote: >> Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: >> >> - comment fix >> - comment fix missing close paranthesys >> - Feedback Johan >> - Feedback Coleen > > src/hotspot/share/oops/compressedKlass.hpp line 117: > >> 115: >> 116: // Start and end of the Klass Range. >> 117: // Note: guaranteed to be aligned to 1< >>1< > This is confusing to me, you mean `1 << klass_alignment_in_bytes`, right? No, it's correct. "1 << shift" is the Klass alignment, in bytes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21015#discussion_r1761263668 From stuefe at openjdk.org Mon Sep 16 14:35:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 16 Sep 2024 14:35:05 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 13:56:07 GMT, Coleen Phillimore wrote: >> Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. >> >> To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". >> >> The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). >> >> The **"Klass Range"** is the range that actually holds Klass structures.
It is part of the Encoding Range, but usually much smaller: >> - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range >> - The encoding range can reach far beyond the end of the Klass range. >> >> For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. >> >> ---- >> >> The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. >> >> The error is extremely unlikely because: >> - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges >> - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays >> >> But the error highlights misuse of the term "encoding range", so it should be fixed. >> >> ---- >> >> Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. >> >> The fix: >> - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. >> - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. >> >> Minor cleanups: >> - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass ran... > > I do like that _klass_range_start and _klass_range_end are explicit in this change. Many thanks, @coleenp and @rkennke and @jdksjolen, for the speedy review. I will wait the obligatory 24hrs and do some more tests, then I will push. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21015#issuecomment-2353099580 From lmesnik at openjdk.org Mon Sep 16 15:39:04 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 16 Sep 2024 15:39:04 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 19:47:47 GMT, Sonia Zaldana Calles wrote: > Hi all, > > This PR addresses [8336874](https://bugs.openjdk.org/browse/JDK-8336874) ensuring enqueuing an abstract method for compilation doesn't hit an assert with WhiteBox. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia The overall fix looks good, see small test suggestion in comment. test/hotspot/jtreg/compiler/whitebox/TestCompileAbstractMethod.java line 48: > 46: assert m1 != null; > 47: WHITE_BOX.enqueueMethodForCompilation(m1, CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION); > 48: if (WHITE_BOX.isMethodCompiled(m1)) { I think it is better to check that 'enqueueMethodForCompilation' returns false for abstract method additionally. The isMethodCompiled should always returns true even if WB try to push abstract method but it is rejected on later steps. ------------- Marked as reviewed by lmesnik (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20973#pullrequestreview-2307026060 PR Review Comment: https://git.openjdk.org/jdk/pull/20973#discussion_r1761390168 From kvn at openjdk.org Mon Sep 16 15:51:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Sep 2024 15:51:15 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Mon, 16 Sep 2024 09:28:30 GMT, Roberto Casta?eda Lozano wrote: > Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly? Yes, please. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761413544 From coleenp at openjdk.org Mon Sep 16 15:57:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 16 Sep 2024 15:57:05 GMT Subject: RFR: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 14:32:25 GMT, Thomas Stuefe wrote: >> Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. >> >> To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". >> >> The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). >> >> The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: >> - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range >> - The encoding range can reach far beyond the end of the Klass range. 
>> >> For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. >> >> ---- >> >> The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. >> >> The error is extremely unlikely because: >> - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges >> - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays >> >> But the error highlights misuse of the term "encoding range", so it should be fixed. >> >> ---- >> >> Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. >> >> The fix: >> - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. >> - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. >> >> Minor cleanups: >> - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass ran... > > Thomas Stuefe has updated the pull request incrementally with four additional commits since the last revision: > > - comment fix > - comment fix missing close paranthesys > - Feedback Johan > - Feedback Coleen Unblocking approval. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21015#pullrequestreview-2307077066 From aboldtch at openjdk.org Mon Sep 16 16:21:20 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 16 Sep 2024 16:21:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v17] In-Reply-To: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> References: <_5gI7i33xrOgXMTI_04oX9UDGwhVTtSNWoSiNfM3FOM=.b24979b3-dcde-401f-b2d8-9b201d303f57@github.com> Message-ID: On Mon, 16 Sep 2024 13:28:00 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 54 commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 > - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java > - Fix loop on aarch64 > - clarify obscure assert in metasapce setup > - Rework compressedklass encoding > - remove stray debug output > - Fixes post 8338526 > - Merge commit '597788850041e7272a23714c05ba546ee6080856' into JDK-8305895-v4 > - Various touch-ups > - Hide log timestamps in test to prevent false failures > - ... 
and 44 more: https://git.openjdk.org/jdk/compare/54595188...2125cd81

src/hotspot/cpu/aarch64/aarch64.ad line 6459:

> 6457:   format %{ "ldrw $dst, $mem\t# compressed class ptr" %}
> 6458:   ins_encode %{
> 6459:     __ load_nklass_compact_c2($dst$$Register, $mem$$base$$Register, $mem$$index$$Register, $mem$$scale, $mem$$disp);

I wonder if something along the lines of this is required here.

Suggestion:

    Address addr = mem2address($mem->opcode(), $mem$$base$$Register, $mem$$index, $mem$$scale, $mem$$disp);
    __ load_nklass_compact_c2($dst$$Register, __ adjust_compact_object_header_address_c2(addr, rscratch1));

With `adjust_compact_object_header_address_c2` being:

```C++
Address C2_MacroAssembler::adjust_compact_object_header_address_c2(Address addr, Register tmp) {
  // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
  // obj-start, so that we can load from the object's mark-word instead. Usually the address
  // comes as obj-start in addr.base() and klass_offset_in_bytes in addr.offset().
  if (addr.getMode() != Address::base_plus_offset) {
    lea(tmp, addr);
    addr = Address(tmp, -oopDesc::klass_offset_in_bytes());
  } else {
    addr = Address(addr.base(), addr.offset() - oopDesc::klass_offset_in_bytes());
  }
  return legitimize_address(addr, 8, tmp);
}
```

Maybe we never actually see a case where `$mem->opcode()` is not an `lsl` variant, or where the offset is too far away for an immediate (which `legitimize_address` fixes). But it seems like this would at least make those cases correct, while avoiding the `lea` in the common case. Maybe someone with more experience in aarch64 macroassembler+ad files and C2 can give an opinion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1761455581 From pchilanomate at openjdk.org Mon Sep 16 16:32:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 16 Sep 2024 16:32:24 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Fri, 13 Sep 2024 13:19:26 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update one, after the review Changes look good to me, but I have some comments. 
Thanks src/hotspot/share/runtime/objectMonitor.cpp line 358: > 356: void ObjectMonitor::enter_for_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { > 357: // Used by LightweightSynchronizer::inflate_and_enter in deoptimization path to enter for another thread. > 358: bool success = ObjectMonitor::TryLock_with_contention_mark(locking_thread, contention_mark); No need to use qualified name. src/hotspot/share/runtime/objectMonitor.cpp line 376: > 374: } > 375: > 376: bool success = ObjectMonitor::TryLock_with_contention_mark(locking_thread, contention_mark); No need to use qualified name. src/hotspot/share/runtime/objectMonitor.cpp line 1267: > 1265: return; > 1266: } > 1267: } Can't we replace all this code for a call to TryLock? src/hotspot/share/runtime/objectMonitor.hpp line 360: > 358: > 359: enum class TryLockResult { Interference = -1, HasOwner = 0, Success = 1 }; > 360: TryLockResult TryLock(JavaThread* current); This CamelCase syntax is used for private methods. We should change it to try_lock now that we are calling it from SharedRuntime code. Another alternative is to keep it private and just use the already available try_enter(). That has the benefit of not having to make TryLockResult public either. If we want to skip the checks after TryLock in try_enter we could add a check_owner_already boolean. src/hotspot/share/runtime/objectMonitor.hpp line 376: > 374: ObjectWaiter* DequeueWaiter(); > 375: void DequeueSpecificWaiter(ObjectWaiter* waiter); > 376: bool TryLock_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark); Following the existing style this should be TryLockWithContentionMark. src/hotspot/share/runtime/sharedRuntime.cpp line 1973: > 1971: // Some other thread acquired the lock (or the monitor was > 1972: // deflated). Either way we are done. > 1973: current->inc_held_monitor_count(-1); We can just use dec_held_monitor_count(). 
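The alternative floated above (keep `TryLock` and `TryLockResult` private, and let outside callers go through `try_enter` with an extra boolean) might look roughly like the following toy model. This is a sketch under stated assumptions, not HotSpot's `ObjectMonitor`: the class `ToyMonitor`, the use of integer thread ids, and the reading of `check_owner_already` as "also perform the already-owner check after `TryLock`" are all hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-in for a monitor; this is NOT HotSpot's ObjectMonitor.
class ToyMonitor {
 public:
  static constexpr int kNoOwner = 0;  // hypothetical "unowned" thread id

  // Public entry point. Returns true if `self` owns the monitor afterwards.
  // `check_owner_already` is the hypothetical flag from the review comment:
  // pass false to skip the recursion check that follows TryLock.
  bool try_enter(int self, bool check_owner_already = true) {
    TryLockResult r = TryLock(self);
    if (r == TryLockResult::Success) return true;
    if (check_owner_already && r == TryLockResult::HasOwner &&
        _owner.load() == self) {
      _recursions++;  // re-entrant acquire by the current owner
      return true;
    }
    return false;
  }

  int owner() const { return _owner.load(); }
  int recursions() const { return _recursions; }

 private:
  // Stays private, so callers never see the three-valued result.
  enum class TryLockResult { Interference = -1, HasOwner = 0, Success = 1 };

  TryLockResult TryLock(int self) {
    if (_owner.load() != kNoOwner) return TryLockResult::HasOwner;
    int expected = kNoOwner;
    // A failed CAS means another thread won the race after the load above.
    return _owner.compare_exchange_strong(expected, self)
               ? TryLockResult::Success
               : TryLockResult::Interference;
  }

  std::atomic<int> _owner{kNoOwner};
  int _recursions = 0;
};
```

In this shape the three-valued result never leaks out of the class, so the enum does not have to become public; the cost is one more parameter on the public method.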
------------- PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2306836866 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761435185 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761435388 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761438126 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761422614 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761426498 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761265517 From rcastanedalo at openjdk.org Mon Sep 16 16:34:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Sep 2024 16:34:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v21] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port for these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms.
> > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: - Add missing IR test to test run - Skip barrier refining for non-OOP stores and stores without barrier data - Assert that m is input to n in Matcher::is_encode_and_store_pattern ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/141020e6..653f9acf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=19-20 Stats: 21 lines in 3 files changed: 16 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From sgibbons at openjdk.org Mon Sep 16 16:36:11 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Mon, 16 Sep 2024 16:36:11 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v6] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 22:36:45 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update test/jdk/java/lang/Math/HyperbolicTests.java > > Co-authored-by: Andrey Turbanov I hand-verified the code. ------------- Marked as reviewed by sgibbons (Committer).
PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2307159316 From rcastanedalo at openjdk.org Mon Sep 16 16:37:32 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 16:37:32 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 22:51:07 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/opto/matcher.cpp line 2845: > >> 2843: n->Opcode() == Op_StoreN && >> 2844: m->is_EncodeP(); >> 2845: } > > Add comment that `m` is input of `n`. I thought about adding assert too but I will leave it to you. Added the assertion (commit a480d70b). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761478462 From psandoz at openjdk.org Mon Sep 16 16:47:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 16:47:11 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:36 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
>> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. >> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Java changes are good (I created a CSR). The approach in HotSpot looks good to me, but need HotSpot reviewers. ------------- Marked as reviewed by psandoz (Reviewer). 
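Archive note: the difference between checkIndexes and wrapIndexes discussed in the PR above can be modelled with scalar Java. This is only a sketch under stated assumptions, not the HotSpot intrinsic or the Vector API implementation: the class and method names (`WrapIndexSketch`, `wrapIndex`, `checkIndex`, `selectFromWrapped`) are hypothetical, the lane count is assumed to be a power of two, and arrays stand in for vectors.

```java
import java.util.Arrays;

// Hypothetical scalar model of index wrapping vs. index checking.
final class WrapIndexSketch {

    // Wrap an arbitrary lane index into [0, vlen) for a power-of-two vlen.
    // For powers of two this equals Math.floorMod(index, vlen).
    static int wrapIndex(int index, int vlen) {
        if (vlen <= 0 || (vlen & (vlen - 1)) != 0) {
            throw new IllegalArgumentException("vlen must be a power of two: " + vlen);
        }
        return index & (vlen - 1);
    }

    // The checking variant: reject any index outside [0, vlen).
    static int checkIndex(int index, int vlen) {
        if (index < 0 || index >= vlen) {
            throw new IndexOutOfBoundsException("index " + index + " out of range for " + vlen);
        }
        return index;
    }

    // Scalar model of selecting lanes from src using wrapped indexes:
    // result[i] = src[wrap(indexes[i])]. No branch or exception per lane.
    static byte[] selectFromWrapped(int[] indexes, byte[] src) {
        if (indexes.length != src.length) {
            throw new IllegalArgumentException("index and source lane counts differ");
        }
        byte[] result = new byte[src.length];
        for (int i = 0; i < indexes.length; i++) {
            result[i] = src[wrapIndex(indexes[i], src.length)];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] indexes = {0, 5, -1, 2};
        byte[] src = {10, 20, 30, 40};
        System.out.println(Arrays.toString(selectFromWrapped(indexes, src)));
    }
}
```

The point of the sketch is that the wrapping variant has no per-lane failure path, which is plausibly why the generated loop quoted above can be a plain load, `vpshufb`, store sequence.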
PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2307180561 From mdoerr at openjdk.org Mon Sep 16 16:49:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 16 Sep 2024 16:49:07 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 04:56:36 GMT, Amit Kumar wrote: >> This PR provides "resolve_global_jobject" method implementation for s390x-port. >> >> Testing: >> * Tier1 test with Fastdebug; >> * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; >> * 1. Ran tier1 test with a call to "resolve_jobect" >> * 2. Ran tier1 test with a call to "resolve_global_jobject" >> >> I didn't see any new failure appearing there. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > removes extra line Looks correct to me. src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 108: > 106: } > 107: > 108: // Generic implementation. GCs can provide an optimized one. You may want to implement an optimized `G1BarrierSetAssembler::resolve_jobject` and `ModRefBarrierSetAssembler::resolve_jobject`. Otherwise, those GCs may get a regression. src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 146: > 144: > 145: __ z_ltgr(value, value); > 146: __ z_bre(done); // null as-is. "Use null as-is." sounds better. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20986#pullrequestreview-2307174155 PR Review Comment: https://git.openjdk.org/jdk/pull/20986#discussion_r1761487260 PR Review Comment: https://git.openjdk.org/jdk/pull/20986#discussion_r1761487962 From rcastanedalo at openjdk.org Mon Sep 16 16:49:25 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 16 Sep 2024 16:49:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 23:14:19 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix a few style issues > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 241: > >> 239: assert(newval_bottom->isa_ptr() || newval_bottom->isa_narrowoop(), "newval should be an OOP"); >> 240: TypePtr::PTR newval_type = newval_bottom->make_ptr()->ptr(); >> 241: uint8_t barrier_data = store->barrier_data(); > > Should you check barrier data for 0? > `is_ptr()` has wide set of types. It includes TypeRawPtr, TypeKlassPtr and TypeMetadataPtr. Where you filtering them? I added the check and excluded other pointers than OOPs, narrow OOPs, and null pointers (needed because null in uncompressed OOP mode is typed as `AnyPtr`) in commit 10bc0d2c. Note that these checks are not strictly required for correctness, because for all other pointers the corresponding barrier data would be 0, and the only potential operations over it would be bit clearing. But I still think they have value in that they communicate more clearly the intent and scope of the optimization.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1761494258 From coleenp at openjdk.org Mon Sep 16 16:57:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 16 Sep 2024 16:57:12 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Fri, 13 Sep 2024 13:19:26 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update one, after the review I have a few more comments and questions. Thank you! I was wondering if we could replace TryLock with try_lock, and other camel case renames in a new/future patch to further clean this up. 
src/hotspot/share/runtime/objectMonitor.cpp line 899: > 897: } > 898: > 899: if (try_set_owner_from(DEFLATER_MARKER, current) == DEFLATER_MARKER) { This is nice. iirc you don't need this because TryLock cancels deflation now. Did I get this right? src/hotspot/share/runtime/objectMonitor.cpp line 920: > 918: // To that end, the exit() operation must have at least STST|LDST > 919: // "release" barrier semantics. Specifically, there must be at least a > 920: // STST|LDST barrier in exit() before the ST of null into _owner that drops This sentence: Is the membar before or after the ST that drops the lock? src/hotspot/share/runtime/objectMonitor.cpp line 1224: > 1222: // falls to the new owner. > 1223: // > 1224: void* owner = try_set_owner_from(nullptr, current); Is this the same code as TryLock now? Except a little different... Could this call TryLock and return if the lock becomes owned by another thread, like in SharedRuntime::monitor_exit_helper() ? ------------- PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2303457835 PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2353438681 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1759098067 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1759085173 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1759103684 From coleenp at openjdk.org Mon Sep 16 16:57:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 16 Sep 2024 16:57:13 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:11:32 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 588: >> >>> 586: } else { >>> 587: // The lock had been free momentarily, but we lost the race to the lock. >>> 588: own = prev_own; >> >> So this retries now and doesn't break. Is it because it could be the DEFLATER_MARKER ? > > It could be the deflator (or someone else). 
Anyhow, we will retry. Now that I'm reading the code more slowly, I see that the reason for this loop is the deflator. Can you add this to this comment, so I can read this faster next time :) // The lock had been free momentarily, but we lost the race to the lock. Retry in case the lock was acquired by the deflator. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1761449963 From sviswanathan at openjdk.org Mon Sep 16 17:02:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 17:02:09 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <0QUwAu8wCrqU-BSbINCiBATZje4xib3rLEZKgG9mHhE=.fed2bc28-b4c3-417d-b4d6-3b5ce1e34c67@github.com> On Fri, 13 Sep 2024 19:45:11 GMT, Paul Sandoz wrote: >>> Given `rearrange` with 1 vector gets wrapping indices semantics. I think we should stop normalizing indices when converting a `Vector` into a `VectorShuffle` (currently we wrap all out-of-bound elements to `[-VLEN, 0)`). Then the rearrange with 2 vectors will also wrap similarly (all indices are `& (VLEN * 2 - 1)`, then indices `[0, VLEN)` maps to the first vector and indices `[VLEN, 2 * VLEN)` map to the second vector). We will normalize the indices when we invoke `VectorShuffle::toVector` which I think is much less used than `Vector::toShuffle`. What do you think? >> >> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > >> >> The guidance from Paul Sandoz and John Rose is to keep the partial wrapping at shuffle construction as is for now and only change the rearrange and selectFrom apis. > > Yes, we are trying to take smaller incremental steps.
Once we are done with this work we can step back and discuss/review what to do about shuffles. @PaulSandoz Thanks a lot for the review and the CSR. I will look forward to Hotspot review and CSR progress/approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2353449454 From psandoz at openjdk.org Mon Sep 16 17:08:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 17:08:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v12] In-Reply-To: References: Message-ID: On Sat, 14 Sep 2024 08:40:44 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], the patch adds support for the following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wrap around in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operators in the corresponding primitive box classes; the fallback implementation of the vector operators is based on them. >> - C2 compiler IR and inline expander changes.
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Update AARCH64 specific test using UNSIGNED_* comparison operators. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java line 573: > 571: * @see VectorMath#addSaturating(int, int) > 572: */ > 573: public static final Associative SADD = assoc("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP); Change from type `Associative` to `Binary` for `SADD` and `SUADD`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1761527956 From duke at openjdk.org Mon Sep 16 17:11:05 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 16 Sep 2024 17:11:05 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v6] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
> > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
> --------------------------------------------------------------------------------------------
> Version                                            Baseline              This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15 5.4...
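Archive note: the two-part structure described in the quoted PR text (a block loop followed by a scalar tail) can be modelled in plain Java. This is a rough sketch under stated assumptions, not the AArch64 stub: the class name `BlockedHashCodeSketch` is hypothetical, the block width of 4 is chosen for illustration, and the "vector" step is written as ordinary scalar arithmetic. It relies only on the fact that `Arrays.hashCode(int[])` computes `h = 31 * h + a[i]` starting from 1, so four steps can be folded into one using powers of 31.

```java
import java.util.Arrays;

// Hypothetical scalar model of a blocked polynomial hash with a tail loop.
final class BlockedHashCodeSketch {

    // Powers of 31 used to fold four elements into the hash in one step.
    // Java int arithmetic wraps modulo 2^32, exactly as the simple loop does.
    private static final int P1 = 31;
    private static final int P2 = P1 * P1;
    private static final int P3 = P2 * P1;
    private static final int P4 = P3 * P1;

    // Unlike Arrays.hashCode, this sketch does not accept a null array.
    static int hashCode(int[] a) {
        int h = 1; // same initial value as Arrays.hashCode(int[])
        int i = 0;
        // Block loop: four elements per iteration. The four products are
        // independent of each other, which is what makes the step vectorizable.
        for (; i + 4 <= a.length; i += 4) {
            h = h * P4 + a[i] * P3 + a[i + 1] * P2 + a[i + 2] * P1 + a[i + 3];
        }
        // Scalar tail: the remaining 0 to 3 elements.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        System.out.println(hashCode(data) == Arrays.hashCode(data));
    }
}
```

Because the block step is algebraically identical to four iterations of the simple loop, the result matches `Arrays.hashCode` for any array length, including lengths that are not a multiple of the block width.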
Mikhail Ablakatov has updated the pull request incrementally with three additional commits since the last revision: - Optimize both the stub and inlined parts of the implementation Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. Add a non-unrolled vectorized loop to the stub to handle vectorizable tail portions of arrays multiple to 4/8 elements (for ints / other types). Make the stub process array as a whole instead of relying on the inlined part to process an unvectorizable tail. - cleanup: add comments and simplify the orr ins - cleanup: remove redundant copyright notice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/eb9708c9..bfa93695 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=04-05 Stats: 325 lines in 4 files changed: 226 ins; 36 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From matsaave at openjdk.org Mon Sep 16 17:23:07 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 16 Sep 2024 17:23:07 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: <4Gj8ZTCGlUStSrs7YIGVKbPrHlCXEu2ujjHplNSNSEo=.65a5fac4-c7e1-4646-8ee4-a85834dd691b@github.com> References: <4Gj8ZTCGlUStSrs7YIGVKbPrHlCXEu2ujjHplNSNSEo=.65a5fac4-c7e1-4646-8ee4-a85834dd691b@github.com> Message-ID: On Sat, 14 Sep 2024 08:09:04 GMT, David Holmes wrote: > A refactoring should not change behaviour. Maybe this only looked like an opportunity for refactoring but in reality is not? The change of behavior here is there will no longer be any potential crashes resulting from `get_new_method()` returning a nullptr. 
The callers no longer have to worry about throwing NSME because it is taken care of inside the function. Before this patch, the callers either check for is_old() or expect the new method to never be null. The goal here is to take care of all that inside the method while maintaining functionality. Also Dean opened a new issue here with regards to ciMethod::equals [JDK-8340141](https://bugs.openjdk.org/browse/JDK-8340141) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2353489867 From mikael at openjdk.org Mon Sep 16 17:30:21 2024 From: mikael at openjdk.org (Mikael Vidstedt) Date: Mon, 16 Sep 2024 17:30:21 GMT Subject: RFR: 8329816: Add SLEEF version 3.6.1 [v6] In-Reply-To: References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Mon, 16 Sep 2024 10:30:46 GMT, Magnus Ihse Bursie wrote: >> [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. >> >> This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Update legal/sleef.md make/UpdateSleefSource.gmk line 48: > 46: > 47: ifeq ($(CMAKE), ) > 48: $(error CMake not found. 
Please install cmake and rerun confugure) typo -> configure make/UpdateSleefSource.gmk line 132: > 130: $(CROSS_COMPILATION_SRC_FILES): $(sleef_cross_build) > 131: > 132: # Finally, copy the generated files (and one needed static files) into our files -> file (or drop "one") src/jdk.incubator.vector/linux/native/libsleef/README.md line 28: > 26: To update the version of libsleef that is used in the JDK, clone > 27: `https://github.com/shibatch/sleef.git`, and copy all files, except the `docs` > 28: and `.github` directories, into And, for completeness, `.git` src/jdk.incubator.vector/linux/native/libsleef/README.md line 49: > 47: > 48: Now, you can repeat this for the next platform. For instance, you can > 49: create a separate profile using `configure --with-conf-name=riscv64` and then profile -> configuration? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1761517507 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1761522889 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1761563950 PR Review Comment: https://git.openjdk.org/jdk/pull/20781#discussion_r1761530302 From duke at openjdk.org Mon Sep 16 17:53:13 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 16 Sep 2024 17:53:13 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 27 Aug 2024 16:22:31 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: use a constexpr function for intpow instead of a templated class > > This is what I'm seeing now. Scorching fast with large blocks, poor with smaller ones.
>
> Benchmark             (size)  Mode  Cnt   Score   Error  Units
> ArraysHashCode.bytes       1  avgt    5   0.532 ± 0.036  ns/op
> ArraysHashCode.bytes       2  avgt    5   0.812 ± 0.011  ns/op
> ArraysHashCode.bytes       4  avgt    5   1.104 ± 0.020  ns/op
> ArraysHashCode.bytes       8  avgt    5   2.136 ± 0.032  ns/op
> ArraysHashCode.bytes      12  avgt    5   3.596 ± 0.061  ns/op
> ArraysHashCode.bytes      16  avgt    5   5.278 ± 0.240  ns/op
> ArraysHashCode.bytes      20  avgt    5   7.390 ± 0.043  ns/op
> ArraysHashCode.bytes      24  avgt    5   9.606 ± 0.059  ns/op
> ArraysHashCode.bytes      28  avgt    5  12.144 ± 0.064  ns/op
> ArraysHashCode.bytes      32  avgt    5   3.898 ± 0.096  ns/op
> ArraysHashCode.bytes      36  avgt    5   4.468 ± 0.113  ns/op
> ArraysHashCode.bytes      40  avgt    5   4.481 ± 0.082  ns/op
> ArraysHashCode.bytes      44  avgt    5   5.143 ± 0.060  ns/op
> ArraysHashCode.bytes      48  avgt    5   6.727 ± 0.103  ns/op
> ArraysHashCode.bytes      52  avgt    5   8.844 ± 0.029  ns/op
> ArraysHashCode.bytes      56  avgt    5  11.108 ± 0.108  ns/op
> ArraysHashCode.bytes      60  avgt    5  13.864 ± 0.071  ns/op
> ArraysHashCode.bytes      64  avgt    5   5.796 ± 0.146  ns/op

Hi @theRealAph , I've updated the implementation so that arrays with 8 or more elements are now handled by the Neon stub. You can find a performance comparison below. There are significant performance improvements for relatively short arrays, from 16 elements long and above. To keep the change concise, I chose not to introduce new stubs for handling special cases like arrays that are 8-15 elements long. Adding the code you referenced in the quote below to the inlined intrinsic would significantly increase code size of the inlined portion so it was kept as is. > - Maybe replace the serial tail-handling iteration with the 4-wide vectorized version which you presented earlier. While I was at it, I also noticed that we can handle `short`/`char` arrays using `T8H` arrangement instead of `T4H`. During development, I found that this further improves the performance for these types. Below are the benchmark results for different data types collected on a Neoverse-V2 CPU. The graphs use GB/s as a metric, so higher values indicate better performance. For detailed JMH outputs, please see the attached files.
bfa9369 represents the current state of this PR, and 31dc328 represents its previous state. Thank you for your suggestions! I look forward to your feedback on these updates. ![bytes](https://github.com/user-attachments/assets/1f58f6db-be82-4a7c-95fc-5c190381c9c2) ![shorts](https://github.com/user-attachments/assets/71f26f55-c9b1-4009-b1af-15db904b4f87) ![ints](https://github.com/user-attachments/assets/5e6651f9-0a0f-419d-ae10-9c7cdd2e3254) [ArraysHashCode-v2-31dc328.txt](https://github.com/user-attachments/files/17017053/ArraysHashCode-v2-31dc328.txt) [ArraysHashCode-v2-bfa9369.txt](https://github.com/user-attachments/files/17017054/ArraysHashCode-v2-bfa9369.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2353546358 From duke at openjdk.org Mon Sep 16 18:01:49 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 18:01:49 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v7] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with three additional commits since the last revision: - Update HyperbolicTests.java Remove the path to random library - update copyright year and remove unused random from HyperbolicTests - remove tanh tests in seprate file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/ca3314c5..e908eb44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=05-06 Stats: 1640 lines in 2 files changed: 735 ins; 892 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: 
https://git.openjdk.org/jdk/pull/20657 From bulasevich at openjdk.org Mon Sep 16 18:16:08 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 16 Sep 2024 18:16:08 GMT Subject: RFR: 8339573: Update CodeCacheSegmentSize and CodeEntryAlignment for ARM In-Reply-To: References: <2eVsVNQ1NsUA6GGcaztqwCs86hu4mh1XTbJUEQH9Its=.41837726-4bbf-44c2-9f7c-724ef656419a@github.com> Message-ID: On Sat, 14 Sep 2024 16:22:05 GMT, Andrew Haley wrote: >>> Do you reproduce the regression on a public benchmark that I can also try? >> >> It was our internal benchmark. > >> > @vnkozlov Many thanks! Do you reproduce the regression on a public benchmark that I can also try? Now I restrict CodeEntryAlignment=16 for V1 and V2 only. And I restart my performance tests. >> >> This may have as much to do with the smallish icache > > Sorry, I meant last level cache @theRealAph > It makes little sense to set the default CodeEntryAlignment to less than the icache line size. except in severely constrained environments. Why do we need CodeEntryAlignment? The instruction prefetcher has more time to load the next cache line if execution starts at the beginning of the current cache line. But this consideration makes more sense for OptoLoopAlignment. Ideally, the entire loop body fits into a limited number of instruction cache lines - this is unlikely to happen with the entire nmethod body. I have experimented with code entry alignment on native application (repeatedly calling a large number of aligned/unaligned short methods) and found that for Neoverse N2 CPU 64-byte alignment is preferable, while no difference was observed for Neoverse V2. I am not sure if this is a feature of the processor implementation or a feature of the Neoverse architecture. The Neoverse N2/V2 technical reference manuals are pretty much the same about L1 instruction memory system features. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20864#issuecomment-2353588638 From duke at openjdk.org Mon Sep 16 18:22:56 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 18:22:56 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v8] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - update tanh additional tests - Merge branch 'master' of https://git.openjdk.java.net/jdk into onetanh - Update HyperbolicTests.java Remove the path to random library - update copyright year and remove unused random from HyperbolicTests - remove tanh tests in seprate file - Update test/jdk/java/lang/Math/HyperbolicTests.java Co-authored-by: Andrey Turbanov - quad precision tanh tests - c1 and template generator fixes - update libm tanh reference test with code review suggestions - Add stub initialization and extra tanh tests - ... 
and 2 more: https://git.openjdk.org/jdk/compare/7110935c...3664be15 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/e908eb44..3664be15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=06-07 Stats: 111889 lines in 2917 files changed: 66503 ins; 28786 del; 16600 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Mon Sep 16 18:22:56 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 18:22:56 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v2] In-Reply-To: References: <0cm-1lCXQVJaCJbfp7evFyNvkLFxwUyfSukdy40aVxY=.7129df0b-d9a0-4a3b-b7f6-53f767b01ef6@github.com> Message-ID: On Fri, 13 Sep 2024 23:10:19 GMT, Sandhya Viswanathan wrote: >> Hi Joe (@jddarcy), >> >> As suggested by Sandhya (@sviswa7), I added ~750 fixed point tests for tanh in `TanhTests.java` using the quad precision tanh implementation in libquadmath library from gcc. >> >> Please let me know if this looks good. > > @vamsi-parasa In my thoughts the best way to do this is add the additional tests points to HyperbolicTests.java itself in the testcases array of testTanh() method. We should remove all the other changes from HyperbolicTests.java. Also no need for separate TanhTests.java file. Hi Sandhya(@sviswa7), please see the updated code in `HyperbolicTests.java` which removes the previous random based tests with the new fixed point tests. Also removed the `TanhTests.java`. 
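For readers unfamiliar with the fixed-point style of test mentioned above, here is a minimal, self-contained sketch of an `{input, expected}` table checked against `Math.tanh` with an ulp tolerance. It is not the code in `HyperbolicTests.java`: the class name, the `withinUlps` helper, and the choice of entries are assumptions for illustration. Only special values whose results are fixed by the `Math.tanh` specification are listed, and the 2.5 ulp bound is the accuracy stated in that specification.

```java
final class TanhFixedPointSketch {
    // Each row is {input, expected result}. Only spec-mandated special cases are used here.
    static final double[][] TESTCASES = {
        {0.0, 0.0},
        {-0.0, -0.0},
        {Double.POSITIVE_INFINITY, 1.0},
        {Double.NEGATIVE_INFINITY, -1.0},
        {Double.NaN, Double.NaN},
    };

    /** Hypothetical helper: true if actual is within the given number of ulps of expected. */
    static boolean withinUlps(double actual, double expected, double ulps) {
        if (Double.isNaN(expected)) {
            return Double.isNaN(actual);
        }
        if (Double.compare(actual, expected) == 0) {
            return true;
        }
        return Math.abs(actual - expected) <= ulps * Math.ulp(expected);
    }

    /** Runs every table entry and returns the number of mismatches. */
    static int failures() {
        int failures = 0;
        for (double[] testcase : TESTCASES) {
            double actual = Math.tanh(testcase[0]);
            if (!withinUlps(actual, testcase[1], 2.5)) {
                System.err.println("tanh(" + testcase[0] + ") = " + actual
                        + ", expected " + testcase[1]);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + failures());
    }
}
```

A table like this is deterministic across runs, which is the main advantage over random inputs when comparing an intrinsic against a reference implementation.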
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1761642227 From szaldana at openjdk.org Mon Sep 16 18:26:06 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 16 Sep 2024 18:26:06 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 15:35:50 GMT, Leonid Mesnik wrote: >> Hi all, >> >> This PR addresses [8336874](https://bugs.openjdk.org/browse/JDK-8336874) ensuring enqueuing an abstract method for compilation doesn't hit an assert with WhiteBox. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > test/hotspot/jtreg/compiler/whitebox/TestCompileAbstractMethod.java line 48: > >> 46: assert m1 != null; >> 47: WHITE_BOX.enqueueMethodForCompilation(m1, CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION); >> 48: if (WHITE_BOX.isMethodCompiled(m1)) { > > I think it is better to check that 'enqueueMethodForCompilation' returns false for abstract method additionally. > The isMethodCompiled should always returns true even if WB try to push abstract method but it is rejected on later steps. Hi @lmesnik, Could you help me understand a bit better why `isMethodCompiled` should always return true? I'm cross referencing with what's implemented in [whitebox.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L854). In this case, the method handle for `m1` doesn't have a `nmethod` so the function returns early on the `nullptr` check. Is this behaviour incorrect? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20973#discussion_r1761647907 From psandoz at openjdk.org Mon Sep 16 18:47:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 18:47:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: > 559: for (int i = 0; i < vlen; i++) { > 560: int index = ((int)vecPayload1[i]); > 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1); res[i] = index < vlen ? 
vecPayload2[index] : vecPayload3[index - vlen]; src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974: > 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, > 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) { > 2974: int twoVectorLen = length() * 2; We should assert that the length is a power of two. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761663646 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761667602 From lmesnik at openjdk.org Mon Sep 16 18:52:11 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 16 Sep 2024 18:52:11 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 18:23:38 GMT, Sonia Zaldana Calles wrote: >> test/hotspot/jtreg/compiler/whitebox/TestCompileAbstractMethod.java line 48: >> >>> 46: assert m1 != null; >>> 47: WHITE_BOX.enqueueMethodForCompilation(m1, CompilerWhiteBoxTest.COMP_LEVEL_FULL_OPTIMIZATION); >>> 48: if (WHITE_BOX.isMethodCompiled(m1)) { >> >> I think it is better to check that 'enqueueMethodForCompilation' returns false for abstract method additionally. >> The isMethodCompiled should always returns true even if WB try to push abstract method but it is rejected on later steps. > > Hi @lmesnik, Could you help me understand a bit better why `isMethodCompiled` should always return true? I'm cross referencing with what's implemented in [whitebox.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L854). > > In this case, the method handle for `m1` doesn't have a `nmethod` so the function returns early on the `nullptr` check. Is this behaviour incorrect? Sorry, I mistaken. I meant that `isMethodCompiled` should always return false for abstract method. The compiler can't compile it. 
So this check is fine. But for your fix it is better to check return value of `enqueueMethodForCompilation` as I mentioned. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20973#discussion_r1761687397 From szaldana at openjdk.org Mon Sep 16 18:58:45 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 16 Sep 2024 18:58:45 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods [v2] In-Reply-To: References: Message-ID: > Hi all, > > This PR addresses [8336874](https://bugs.openjdk.org/browse/JDK-8336874) ensuring enqueuing an abstract method for compilation doesn't hit an assert with WhiteBox. > > Testing: > - [x] Added test case passes. > > Thanks, > Sonia Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: Adding enqueue check to be false ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20973/files - new: https://git.openjdk.org/jdk/pull/20973/files/3069fdb4..3cded521 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20973&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20973&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20973/head:pull/20973 PR: https://git.openjdk.org/jdk/pull/20973 From szaldana at openjdk.org Mon Sep 16 18:58:45 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Mon, 16 Sep 2024 18:58:45 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 18:49:48 GMT, Leonid Mesnik wrote: >> Hi @lmesnik, Could you help me understand a bit better why `isMethodCompiled` should always return true? 
I'm cross referencing with what's implemented in [whitebox.cpp](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/whitebox.cpp#L854). >> >> In this case, the method handle for `m1` doesn't have a `nmethod` so the function returns early on the `nullptr` check. Is this behaviour incorrect? > > Sorry, I mistaken. I meant that `isMethodCompiled` should always return false for abstract method. The compiler can't compile it. So this check is fine. But for your fix it is better to check return value of `enqueueMethodForCompilation` as I mentioned. Understood, thanks for checking! I pushed an update checking the return value of `enqueueMethodForCompilation` instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20973#discussion_r1761691512 From lmesnik at openjdk.org Mon Sep 16 19:18:06 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 16 Sep 2024 19:18:06 GMT Subject: RFR: 8336874: WhiteBoxAPI: assert(!method->is_abstract() && (osr_bci == InvocationEntryBci || !method->is_native())) failed: cannot compile abstract/native methods [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 18:58:45 GMT, Sonia Zaldana Calles wrote: >> Hi all, >> >> This PR addresses [8336874](https://bugs.openjdk.org/browse/JDK-8336874) ensuring enqueuing an abstract method for compilation doesn't hit an assert with WhiteBox. >> >> Testing: >> - [x] Added test case passes. >> >> Thanks, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Adding enqueue check to be false Thanks for updating test! ------------- Marked as reviewed by lmesnik (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20973#pullrequestreview-2307551181 From sviswanathan at openjdk.org Mon Sep 16 20:53:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 20:53:06 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 20:37:27 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. src/hotspot/cpu/x86/assembler_x86.cpp line 16052: > 16050: > 16051: // Encoding Format : eevex_prefix | opcode_cc | modrm > 16052: int encode = vex_prefix_and_encode(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); We could replace this with: int encode = evex_prefix_and_encode_ndd(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10426: > 10424: > 10425: void MacroAssembler::setcc(Assembler::Condition comparison, Register dst) { > 10426: if (VM_Version::supports_apx_f()) { We could check UseAPX here instead of VM_Version::supports_apx_f(). 
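The zero-upper behaviour described in the quoted PR summary can be modelled on a 64-bit value. This is a sketch of the register semantics as stated in that summary (plain SETcc writes only the low byte, the APX form with ND = 1 also clears the upper 56 bits), not of the assembler encoding; the class and method names are made up for the illustration.

```java
final class SetccSketch {
    /** Legacy SETcc on a GPR: only the low byte is written, the upper 56 bits keep their old value. */
    static long setcc(long reg, boolean condition) {
        return (reg & ~0xFFL) | (condition ? 1L : 0L);
    }

    /** MOVZX from the low byte: the extra instruction needed after a legacy SETcc. */
    static long movzxByte(long reg) {
        return reg & 0xFFL;
    }

    /** Zero-upper variant: the whole register becomes 0 or 1, so the old contents are discarded. */
    static long setzucc(long reg, boolean condition) {
        return condition ? 1L : 0L;
    }

    public static void main(String[] args) {
        long stale = 0xDEADBEEFCAFEBABEL; // arbitrary previous register contents
        System.out.println(Long.toHexString(setcc(stale, true)));
        System.out.println(Long.toHexString(movzxByte(setcc(stale, true))));
        System.out.println(Long.toHexString(setzucc(stale, true)));
    }
}
```

In this model the two-step sequence `movzxByte(setcc(...))` and the single `setzucc(...)` always agree, which is the saving the PR description refers to.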
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1761638922 PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1761942325 From kvn at openjdk.org Mon Sep 16 21:08:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Sep 2024 21:08:13 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com> On Mon, 16 Sep 2024 20:45:45 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution. > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10426: > >> 10424: >> 10425: void MacroAssembler::setcc(Assembler::Condition comparison, Register dst) { >> 10426: if (VM_Version::supports_apx_f()) { > > We could check UseAPX here instead of VM_Version::supports_apx_f(). I think switching off a feature in `vm_version` file based on flags setting is correct. So that in the rest of code we can simple check `VM_Version::supports_*()`. Currently not all code follow this but it is preferable way. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1761963930 From psandoz at openjdk.org Mon Sep 16 21:21:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Sep 2024 21:21:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 02:58:41 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. 
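As a reading aid for the semantics quoted above, here is a scalar sketch of a two-vector select over plain `int` arrays. It follows the index-wrapping variant suggested earlier in this review thread (mask the index into `[0, 2 * vlen - 1]` before choosing a source) rather than throwing on out-of-range indexes, and it assumes the vector length is a power of two. The class name and array-based signature are hypothetical and are not the Vector API.

```java
final class SelectFromSketch {
    /**
     * For each lane i, picks v1[index] when the wrapped index is below vlen,
     * otherwise v2[index - vlen]. The length must be a power of two so that
     * masking with (2 * vlen - 1) is a valid wrap.
     */
    static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (Integer.bitCount(vlen) != 1 || v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all inputs need the same power-of-two length");
        }
        int mask = (vlen << 1) - 1;
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int index = indexes[i] & mask; // also maps negative indexes into range
            res[i] = index < vlen ? v1[index] : v2[index - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, -1, 3};
        System.out.println(java.util.Arrays.toString(selectFrom(indexes, v1, v2)));
    }
}
```

Masking first means a negative index such as -1 wraps to the last lane of the second vector instead of indexing out of bounds, which is the failure mode pointed out in the review.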
src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2970: > 2968: > 2969: > 2970: /*package-private*/ I think we can simplify with: /*package-private*/ @ForceInline final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, $abstractvectortype$ v1, $abstractvectortype$ v2) { int twoVectorLenMask = (length() << 1) - 1; #if[FP] Vector<$Boxbitstype$> wrapped_indexes = this.convert(VectorOperators.{#if[intOrFloat]?F2I:D2L}, 0) .lanewise(VectorOperators.AND, twoVectorLenMask); return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass , $type$.class, $bitstype$.class, length(), wrapped_indexes, v1, v2, (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3) ); #else[FP] $abstractvectortype$ wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask); return VectorSupport.selectFromTwoVectorOp(getClass(), indexVecClass, $type$.class, $type$.class, length(), wrapped_indexes, v1, v2, (vec1, vec2, vec3) -> selectFromTwoVectorHelper(vec1, vec2, vec3) ); #end[FP] } (Note that's without the assert - see separate comment). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1761977004 From iklam at openjdk.org Mon Sep 16 21:54:49 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 16 Sep 2024 21:54:49 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v6] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. 
See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. 
> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/66a4ff41..bcddf963 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=04-05 Stats: 18 lines in 3 files changed: 10 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Mon Sep 16 21:54:49 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 16 Sep 2024 21:54:49 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v5] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Fri, 13 Sep 2024 16:09:25 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @ashu-mehra comments > > src/hotspot/share/cds/aotClassLinker.cpp line 227: > >> 225: } >> 226: >> 227: int AOTClassLinker::num_initiated_classes(oop loader1, oop loader2) { > > The two loader arguments here are quite confusing making it hard to understand the code. Can it be refactored as this: > > > int AOTClassLinker::num_platform_initiated_classes() { > // AOTLinkedClassBulkLoader will initiate loading of all public boot classes in the platform loader. 
> return num_initiated_classes(nullptr); > } > > int AOTClassLinker::num_app_initiated_classes() { > // AOTLinkedClassBulkLoader will initiate loading of all public boot/platform classes in the app loader. > return num_platform_initiated_classes() + num_initiated_classes(SystemDictionary::java_platform_loader()); > } > > int AOTClassLinker::num_initiated_classes(oop loader) { > int n = 0; > for (int i = 0; i < _sorted_candidates->length(); i++) { > InstanceKlass* ik = _sorted_candidates->at(i); > if (ik->is_public() && !ik->is_hidden() && > (ik->class_loader() == loader)) { > n++; > } > } > > return n; > } I renamed the function to `AOTClassLinker::count_public_classes()` and made it handle only a single loader each time. This function is not speed critical so I think this way it's much easier to read. > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 199: > >> 197: InstanceKlass* ik = classes->at(i); >> 198: assert(ik->is_loaded(), "must have already been loaded by a parent loader"); >> 199: assert(ik->class_loader() != initiating_loader(), "must be a parent loader"); > > Can we also add an assert that ik->class_loader() must be either boot or platform loader. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1762007825 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1762007881 From asmehra at openjdk.org Mon Sep 16 22:37:07 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 16 Sep 2024 22:37:07 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v6] In-Reply-To: <6rIWXK2IjOxEPUvdbFchC8d191QHOC2RhrRdl3K7wxo=.8a5f1a8b-0fb6-4d3a-8a5e-c224f17408fc@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <6rIWXK2IjOxEPUvdbFchC8d191QHOC2RhrRdl3K7wxo=.8a5f1a8b-0fb6-4d3a-8a5e-c224f17408fc@github.com> Message-ID: On Sat, 7 Sep 2024 00:30:34 GMT, Ioi Lam wrote: >> I've taken an initial look through but there is an awful lot to try and digest here. I've flagged numerous typos and minor nits. >> >> One general query: does this stuff work if the user defines their own initial application classloader? > >> I've taken an initial look through but there is an awful lot to try and digest here. I've flagged numerous typos and minor nits. >> >> One general query: does this stuff work if the user defines their own initial application classloader? > > Hi David thanks for the review. I've pushed a new version that has most of your suggestions. > > I also added code to avoid loading the CDS archive if it has aot-linked classes, and the user has specified `-Djava.system.class.loader` @iklam thank you for addressing the comments. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2354146490 From asmehra at openjdk.org Mon Sep 16 22:37:06 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 16 Sep 2024 22:37:06 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v6] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Mon, 16 Sep 2024 21:54:49 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
>> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @ashu-mehra reviews Marked as reviewed by asmehra (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2307949022 From duke at openjdk.org Mon Sep 16 23:01:08 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 16 Sep 2024 23:01:08 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v9] In-Reply-To: References: Message-ID: <1TAqO7DOjjkXpbdTmsDbByq9kxPnaX1Ev57KnKWakjQ=.e0c25c34-fc49-445a-8cd2-7dd0fae64e80@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 13 additional commits since the last revision: - Merge branch 'master' of https://git.openjdk.java.net/jdk into onetanh - update tanh additional tests - Merge branch 'master' of https://git.openjdk.java.net/jdk into onetanh - Update HyperbolicTests.java Remove the path to random library - update copyright year and remove unused random from HyperbolicTests - remove tanh tests in seprate file - Update test/jdk/java/lang/Math/HyperbolicTests.java Co-authored-by: Andrey Turbanov - quad precision tanh tests - c1 and template generator fixes - update libm tanh reference test with code review suggestions - ... 
and 3 more: https://git.openjdk.org/jdk/compare/e48dd57e...b438555e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/3664be15..b438555e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=07-08 Stats: 344 lines in 9 files changed: 329 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From sviswanathan at openjdk.org Mon Sep 16 23:02:12 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 16 Sep 2024 23:02:12 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com> References: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com> Message-ID: On Mon, 16 Sep 2024 21:05:01 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 10426: >> >>> 10424: >>> 10425: void MacroAssembler::setcc(Assembler::Condition comparison, Register dst) { >>> 10426: if (VM_Version::supports_apx_f()) { >> >> We could check UseAPX here instead of VM_Version::supports_apx_f(). > > I think switching off a feature in `vm_version` file based on flags setting is correct. > So that in the rest of code we can simple check `VM_Version::supports_*()`. > Currently not all code follow this but it is preferable way. Sounds good, let us keep it this way (VM_Version::supports_*()). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762059128 From ysr at openjdk.org Mon Sep 16 23:50:16 2024 From: ysr at openjdk.org (Y. 
Srinivas Ramakrishna) Date: Mon, 16 Sep 2024 23:50:16 GMT Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4] In-Reply-To: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> References: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> Message-ID: On Fri, 17 Nov 2023 13:09:00 GMT, Daniel Lundén wrote: >> This changeset obsoletes the leftover (i.e., no longer used for anything) product compiler flag `UseCounterDecay` (requires CSR) and removes the leftover develop flag `CounterDecayMinIntervalLength`. >> >> Changes: >> - Obsolete `UseCounterDecay` in JDK 22 and expire it in JDK 23. >> - Completely remove `CounterDecayMinIntervalLength`. >> >> ### Testing >> Platforms: windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64. >> - `tier1` >> - HotSpot parts of `tier2` and `tier3` > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Obsolete UseCounterDecay src/hotspot/share/runtime/globals.hpp line 1225: > 1223: develop(intx, CounterHalfLifeTime, 30, \ > 1224: "Half-life time of invocation counters (in seconds)") \ > 1225: \ Thanks for making these changes. However, it seems like the obsolete `CounterHalfLifeTime` was missed in this cleanup? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16673#discussion_r1762090336 From iklam at openjdk.org Mon Sep 16 23:55:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 16 Sep 2024 23:55:41 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v2] In-Reply-To: References: Message-ID: > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). 
> > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `` method). 
This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @vnkozlov comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20958/files - new: https://git.openjdk.org/jdk/pull/20958/files/e0508278..f630cd37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Mon Sep 16 23:55:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 16 Sep 2024 23:55:41 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v2] In-Reply-To: References: Message-ID: <5eDG2CH2elPAe2f_ikTWkSPj_1LM1rivHbQH_jiHKKE=.c4480e55-c054-4ea3-83b5-062886985978@github.com> On Thu, 12 Sep 2024 16:17:22 GMT, Vladimir Kozlov wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @vnkozlov comments > > src/hotspot/share/oops/instanceKlass.cpp line 828: > >> 826: link_class(CHECK); >> 827: >> 828: #ifdef AZZERT > > ? "ASSERT" Fixed. 
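
A side note on why the `AZZERT` typo flagged above is easy to miss: `#ifdef` on a name that is never defined is not an error, so the guarded block is dropped without any diagnostic. The following is a minimal sketch of that effect, not HotSpot code. `DEBUG_CHECKS` is a hypothetical stand-in for a debug-build macro such as `ASSERT`, and both function names are made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a debug-build macro such as HotSpot's ASSERT.
#define DEBUG_CHECKS 1

// Correctly spelled guard: the first branch is compiled in.
constexpr bool checks_enabled_correct() {
#ifdef DEBUG_CHECKS
  return true;
#else
  return false;
#endif
}

// Misspelled guard: DEBUG_CHEKS is never defined, so the preprocessor
// silently selects the #else branch and the "debug" code disappears.
constexpr bool checks_enabled_typo() {
#ifdef DEBUG_CHEKS
  return true;
#else
  return false;
#endif
}

// Both facts are checked at compile time.
static_assert(checks_enabled_correct(), "correctly spelled guard is active");
static_assert(!checks_enabled_typo(), "misspelled guard is compiled out");
```

Because nothing fails at build time, a typo like this typically surfaces only in review, as it did here.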
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1762093890 From iklam at openjdk.org Tue Sep 17 00:02:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Sep 2024 00:02:19 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. 
This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `` method). This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - @vnkozlov comments - Clean up; removed unrelated changes in classPrinter.cpp - more cleanup - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - More clean up for JDK-8293187 - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Simplified implemented by AOTClassInitializer. - ... 
and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd ------------- Changes: https://git.openjdk.org/jdk/pull/20958/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=02 Stats: 815 lines in 20 files changed: 742 ins; 16 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Tue Sep 17 00:06:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Sep 2024 00:06:19 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v2] In-Reply-To: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: > This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` > > These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. > > --- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - @vnkozlov comment - added NOT_CDS_RETURN - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - some clean up - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - 8293337: Archive method handle intrinsics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20959/files - new: https://git.openjdk.org/jdk/pull/20959/files/9385da0c..a57e9f00 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=00-01 Stats: 53 lines in 10 files changed: 22 ins; 2 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20959/head:pull/20959 PR: https://git.openjdk.org/jdk/pull/20959 From iklam at openjdk.org Tue Sep 17 00:06:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Sep 2024 00:06:19 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v2] In-Reply-To: References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: On Fri, 13 Sep 2024 21:45:45 GMT, Vladimir 
Kozlov wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: >> >> - @vnkozlov comment - added NOT_CDS_RETURN >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - some clean up >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - 8293337: Archive method handle intrinsics > > src/hotspot/share/classfile/systemDictionary.hpp line 349: > >> 347: // Second part of load_shared_class >> 348: static void load_shared_class_misc(InstanceKlass* ik, ClassLoaderData* loader_data) NOT_CDS_RETURN; >> 349: static void restore_archived_method_handle_intrinsics_impl(TRAPS); > > Missing `NOT_CDS_RETURN` ? Fixed. 
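
For readers unfamiliar with the pattern behind the `NOT_CDS_RETURN` remark above: the general idea is that a declaration receives an empty inline body when a feature is compiled out, so callers need no `#if` of their own. The sketch below illustrates that idea under stated assumptions. `INCLUDE_FEATURE`, `NOT_FEATURE_RETURN`, `Registry`, and `startup` are hypothetical names chosen for this example and are not HotSpot's actual definitions.

```cpp
#include <cassert>

// Hypothetical feature switch; 0 means the feature is compiled out.
#define INCLUDE_FEATURE 0

#if INCLUDE_FEATURE
// Feature present: the declaration ends with ';' and the real
// definition would live in a separate .cpp file.
#define NOT_FEATURE_RETURN /* next token must be ; */
#else
// Feature absent: supply an empty inline body right in the header.
#define NOT_FEATURE_RETURN {}
#endif

struct Registry {
  // With INCLUDE_FEATURE == 0 this expands to an empty inline definition,
  // so the program links even though no out-of-line definition exists.
  static void restore_intrinsics() NOT_FEATURE_RETURN;
};

// Caller code is identical in both configurations.
inline int startup() {
  Registry::restore_intrinsics();
  return 0;
}
```

If the macro were omitted on such a declaration, a build with the feature disabled would see a declared but undefined function, which would normally show up as a link error once something calls it.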
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20959#discussion_r1762100793 From duke at openjdk.org Tue Sep 17 00:41:20 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 17 Sep 2024 00:41:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v10] In-Reply-To: References: Message-ID: <2GF3hOTuSrN9ejN_aaNWV6g5zfBdVJ93kiPjIiPUKQE=.c61b7a6c-edd9-4edd-b866-6d4969591c8a@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add call to the additional tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/b438555e..1ee4c1fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=08-09 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Tue Sep 17 03:44:14 2024 From: duke at openjdk.org (duke) Date: Tue, 17 Sep 2024 03:44:14 GMT Subject: Withdrawn: 8332697: ubsan: shenandoahSimpleBitMap.inline.hpp:68:23: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long int' In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 20:53:04 GMT, Henry Lin wrote: > Cast the result of `nth_bit(n)` to `uintptr_t` to prevent signed integer overflow error reported by `ubsan`. Unsigned overflow is not undefined behavior and is not checked by `ubsan`. This pull request has been closed without being integrated. 
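
The distinction this description relies on can be shown in isolation. The following is a minimal sketch, assuming hypothetical helpers `nth_bit_u` and `mask_below`; it is not the Shenandoah bitmap code. Doing the shift and the subtraction in `uintptr_t` keeps the arithmetic well defined even for the top bit, whereas the same computation in a signed word would have to evaluate the most negative value minus one.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

using std::uintptr_t;

constexpr unsigned kBitsPerWord = sizeof(uintptr_t) * CHAR_BIT;

// Bit n as an unsigned word; 0 when n is out of range.
constexpr uintptr_t nth_bit_u(unsigned n) {
  return n >= kBitsPerWord ? uintptr_t{0} : (uintptr_t{1} << n);
}

// Mask of all bits strictly below bit n.
// Unsigned subtraction wraps modulo 2^N, so this is defined for every n,
// including n == kBitsPerWord - 1 and the out-of-range case (all bits set).
constexpr uintptr_t mask_below(unsigned n) {
  return nth_bit_u(n) - 1;
}
```

For example, `mask_below(3)` yields `7`, and `mask_below(kBitsPerWord - 1)` yields every bit except the top one, without any signed overflow for a sanitizer to report.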
------------- PR: https://git.openjdk.org/jdk/pull/20164 From iwalulya at openjdk.org Tue Sep 17 03:47:06 2024 From: iwalulya at openjdk.org (Ivan Walulya) Date: Tue, 17 Sep 2024 03:47:06 GMT Subject: RFR: 8340119: Remove oopDesc::size_might_change() In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 14:12:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes `oopDesc::size_might_change()` because since JDK-8337709 and JDK-8311163 no collector uses the objArray's length field during garbage collection any more. > > Testing: tier1-3 > > Thanks, > Thomas Marked as reviewed by iwalulya (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20999#pullrequestreview-2308194786 From jbhateja at openjdk.org Tue Sep 17 04:38:07 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 04:38:07 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 18:17:44 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution. > > src/hotspot/cpu/x86/assembler_x86.cpp line 16052: > >> 16050: >> 16051: // Encoding Format : eevex_prefix | opcode_cc | modrm >> 16052: int encode = vex_prefix_and_encode(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); > > We could replace this with: > int encode = evex_prefix_and_encode_ndd(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); Zero upper setCC repurpose the NDD bit which is always set by default. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762273655 From jbhateja at openjdk.org Tue Sep 17 04:38:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 04:38:08 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: <4iyuq0MSWBGxjPqJWbtQP8Y8KL6gmIkHXDbK_pmDgqA=.28b2de70-3506-40ec-aa8c-96349e7c8de4@github.com> Message-ID: On Mon, 16 Sep 2024 22:59:01 GMT, Sandhya Viswanathan wrote: >> I think switching off a feature in `vm_version` file based on flags setting is correct. >> So that in the rest of code we can simple check `VM_Version::supports_*()`. >> Currently not all code follow this but it is preferable way. > > Sounds good, let us keep it this way (VM_Version::supports_*()). Yes, CPU [feature is disabled](https://github.com/openjdk/jdk/commit/a4cf1918c963cbe0b0eee6db580f0769c0cbdbcc#diff-6ed856c57ddbe33e49883adb7c52ec51ed377e5f697dfd6d8bea505a97bfc5a5R1049) if UseAVX is set to false ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762273687 From duke at openjdk.org Tue Sep 17 04:41:20 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 17 Sep 2024 04:41:20 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v11] In-Reply-To: References: Message-ID: <4eB1-LVi9xoS-lksSPbpyf39ebjZvsaqKQrP0XSMOTE=.48731919-0181-4aeb-97f6-ea2f22ac3410@github.com> > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove -ve tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/1ee4c1fe..aa163896 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=09-10 Stats: 727 lines in 1 file changed: 0 ins; 364 del; 363 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From rcastanedalo at openjdk.org Tue Sep 17 05:20:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 17 Sep 2024 05:20:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. 
> > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Discard memory accesses with barrier data as implicit null check candidates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/653f9acf..71a51bfc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=20-21 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Tue Sep 17 05:20:30 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 17 Sep 2024 05:20:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v20] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Mon, 16 Sep 2024 15:48:32 GMT, Vladimir Kozlov wrote: >> I did not make any changes because the current logic in `lcm.cpp` already prevents this, albeit in a rather accidental way: `PhaseCFG::implicit_null_check` requires that all inputs of a candidate memory operation dominate the null check ([here](https://github.com/robcasloz/jdk/blob/d21104ca8ff1eef88a9d87fb78dda3009414b5b8/src/hotspot/share/opto/lcm.cpp#L310-L328)) so that it can be hoisted. This fails if the candidate memory operation has barriers because these always require `MachTemp` nodes, which are placed in the same block as the candidate and break the dominance condition. See a longer explanation [here](https://github.com/openjdk/jdk/pull/19746/files#r1715387255). >> >> Should I add a check to `PhaseCFG::implicit_null_check` to discard these memory accesses more explicitly? 
> >> Should I add a check to PhaseCFG::implicit_null_check to discard these memory accesses more explicitly? > > Yes, please. Done (commit 71a51bfc). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1762318179 From stuefe at openjdk.org Tue Sep 17 05:26:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 05:26:13 GMT Subject: Integrated: 8340184: Bug in CompressedKlassPointers::is_in_encodable_range In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 10:37:55 GMT, Thomas Stuefe wrote: > Since JDK-8338526, we keep only those Klass in the encoding range that need a narrowKlass id. > > To check whether a Klass has a narrowKlass id, we call `CompressedKlassPointers::is_in_encodable_range()`. There is a small bug that results from the confusion around "encoding range" vs "klass range". > > The **"Encoding Range"** is the range that can be encoded with the current encoding base, encoding shift, and (implicitly) the bit size of the narrowKlass, which is 32. The encoding range reaches from `[ ... + 1 << (32 + ) ).`. Its size is either 4G (shift=0) or 32G (shift=3). > > The **"Klass Range"** is the range that actually holds Klass structures. It is part of the Encoding Range, but usually much smaller: > - with zero-based encoding, the encoding base is zero, so it precedes the start of the Klass range > - The encoding range can reach far beyond the end of the Klass range. > > For a more detailed explanation, including pleasing ASCII art, please refer to `compressedKlass.hpp` in this patch. > > ---- > > The error in this case, introduced with 8338526, was that we use the range `[ ... )` for `is_in_encodable_range()`. That can lead to false positives since `` can be zero. In a highly contrived theoretical case, we could mis-classify a Klass as being encodable if it lives in metaspace outside class space, but its metaspace region happens to be located below the class space start. 
> > The error is extremely unlikely because: > - non-class Metaspace regions - freely allocated by mmap - typically live in high-address ranges > - we only allow zero-based encoding with Xshare:off, itself a very unusual setting nowadays > > But the error highlights misuse of the term "encoding range", so it should be fixed. > > ---- > > Note that most of the patch has been selectively copied from Lilliput. Lilliput, with its non-22-bit narrowKlass, had needed to straighten out this code a while ago, and it is better for it. > > The fix: > - we now use the actual *Klass Range* for the "is encodable" check. We don't use the encoding base. That is the real fix. > - To do that, we need to keep track of the Klass Range inside `CompressedKlassPointers`. That is simple, since we are given this range during initialization. > > Minor cleanups: > - I also removed the confusingly named `CompressedKlassPointers::range()`. This was a strange animal ( the distance between *klass range end* and *encoding range start* ) and was only used in a sing... This pull request has now been integrated. Changeset: 7849f252 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/7849f252937dc774a1935cc4c68f2a46649f180b Stats: 240 lines in 10 files changed: 213 ins; 10 del; 17 mod 8340184: Bug in CompressedKlassPointers::is_in_encodable_range Reviewed-by: coleenp, rkennke, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/21015 From fyang at openjdk.org Tue Sep 17 06:06:05 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 17 Sep 2024 06:06:05 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: <2RI2J8bFwNJHqV6Zc2Hhk3izYqEdqf-U6N3Dm21R97c=.ac3a009a-2aec-4203-992c-b8c19588a98c@github.com> On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) do not need global icache flushes. >> As we can instead in slow path locally (thread and hart) emit fence.i. 
>> But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. >> To handle this case new now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, moved init after feature enabling I am trying to compare this with https://github.com/openjdk/jdk/pull/9770 which removes unnecessary fence.i used in user space. I was thinking that this PR might add those fence.i back under `UseCtxFencei`. But seems there are differences there. Previously, we emitted fence.i in `patch_callers_callsite()` and `MacroAssembler::emit_static_call_stub()`, which looks similar like what aarch64 does. But this PR adds fence.i in different places like in `generate_method_entry_barrier()` and `ZBarrierSetAssembler::patch_barrier_relocation()`. Do you have more details to help understand? Thanks. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2354602969

From fyang at openjdk.org Tue Sep 17 06:13:04 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 17 Sep 2024 06:13:04 GMT
Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v8]
In-Reply-To: <_4uhEXm6J6Ioe8kOpsavJsJb_jxR83kMN3JcD9RbsN8=.dd158833-edb6-464d-8c06-e1bf70414cb1@github.com>
References: <_4uhEXm6J6Ioe8kOpsavJsJb_jxR83kMN3JcD9RbsN8=.dd158833-edb6-464d-8c06-e1bf70414cb1@github.com>
Message-ID:

On Mon, 16 Sep 2024 08:34:46 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this patch?
>> Thanks.
>>
>> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code).
>>
>> ## Test
>> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java,
>> test/jdk/java/util/zip/TestCRC32.java
>>
>> ## Performance
>>
>> ### on bananapi
>>
>> with patch
>>
>> Benchmark | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op
>>
>>
>>
>> without patch
>>
>> Benchmark | (count) | Mode | Cnt | Score | Error | Units
>> -- | -- | -- | -- | -- | -- | --
>> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op
>> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op
>> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op
>> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op
>> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op
>> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op
>> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op
>>
>>
> ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   minors

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1476:

> 1474:
> 1475: add(tableN16, table3, 1*single_table_size*sizeof(juint), tmp1);
> 1476: mv(t0, 0xff);

Thanks for the quick update. But I am a bit worried about keeping a long-lived value in `t0`. As a scratch register, `t0` is implicitly used by various assembler routines, so I think it's error-prone to do this. Can you choose another one? It seems `tmp5` or `tmp6` in `kernel_crc32` would be usable for our purpose.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1762408173

From rehn at openjdk.org Tue Sep 17 06:44:05 2024
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 17 Sep 2024 06:44:05 GMT
Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote:

>> Hey, please consider,
>>
>> All code which is offline (behind a barrier) do not need global icache flushes.
>> As we can instead in slow path locally (thread and hart) emit fence.i.
>> But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions.
>> To handle this case new now have kernel support:
>> https://docs.kernel.org/arch/riscv/cmodx.html
>>
>> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap.
>> But this is in many cases much faster as the icache flush global IPI is very intrusive.
>> Particular cases are running a concurrent gc with small head room.
>> In such scenario I measured 15% increased throughput on VF2.
>> A large CPU or less head room (faster GC cycles) will yield even more performance boost.
>>
>> Note that this requires 6.10 kernel.
>>
>> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on)
>>
>> Later we probably want this default on, but as it's hard to test I'll leave default off.
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Comment, moved init after feature enabling

default        -XX:-UseCtxFencei  -XX:+UseCtxFencei
fop                         2830               2731    3.63%
h2                         25361              25403   -0.17%
jython                     32482              32006    1.49%
luindex                     4231               4369   -3.16%
lusearch                    6142               5867    4.69%
lusearch-fix                6337               6243    1.51%
pmd                         9171               8970    2.24%
                                                       1.46%

zgc xmx512m
fop                         2583               2376    8.71%
h2                         34520              33518    2.99%
jython                     36494              35243    3.55%
luindex                     4615               4497    2.62%
lusearch                    6705               6732   -0.40%
lusearch-fix                6827               6485    5.27%
pmd                         8385               8128    3.16%
                                                       3.70%

zgc xmx128m
fop                         3336               2718   22.74%
jython                     44385              35957   23.44%
luindex                     4552               4460    2.06%
lusearch                    9604               7272   32.07%
lusearch-fix               10095               7489   34.80%
pmd                        11238               9169   22.57%
                                                      22.95%

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2354669265

From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 07:10:54 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v11]
In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

> Hi All,
>
> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
> > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Jcheck clearance - Review comments resolution. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/7c80bfce..29530047 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=09-10 Stats: 402 lines in 41 files changed: 98 ins; 98 del; 206 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:54 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: On Mon, 16 Sep 2024 07:45:51 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/hotspot/share/opto/vectornode.cpp line 2122: > >> 2120: // index format by subsequent VectorLoadShuffle. >> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); >> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); > > This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for intel. But as far as I know ARM has in principle longer vectors - up to 2048 bytes. 
Should we maybe add some assert here to make sure we never badly truncate the index?

Shuffle overall is on our todo list. It's a known limitation which we tried lifting once. And yes, you read it correctly: it becomes a limitation for AARCH64 SVE once systems with 2048-bit vectors are available. IIRC, the current max vector size on any available AARCH64 system is 256 bits, and with Neoverse V2 they shrank the vector size back to 16 bytes.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504446

From jbhateja at openjdk.org Tue Sep 17 07:10:54 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 07:10:54 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v7]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID: <8QUaed-UNR5ura5MXAeccEXQgaSOUaM_JCHvrUUeCVE=.d895b3db-6e3c-4351-9147-81eb303536f9@github.com>

On Mon, 16 Sep 2024 07:27:44 GMT, Emanuel Peter wrote:

>> Please at least add a comment why you are not following my suggestion. I feel like the work I put in to review is not being respected when comments are just silently resolved without any action or comment.
>
> I really do think that `as_ConI()` would be the right thing here. In product it is just a cast, and in debug at least we have an assert.
DONE **It just got overlooked @eme64, we respect reviewer suggestions and value the time you invest in polishing our patches, thanks again :-)** ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504618 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504671 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Mon, 16 Sep 2024 18:35:42 GMT, Paul Sandoz wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: > >> 559: for (int i = 0; i < vlen; i++) { >> 560: int index = ((int)vecPayload1[i]); >> 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; > > This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. > > int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1)); > res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen]; Hi @PaulSandoz , we already pass wrapped indexes to this helper routine called from fallback implementation. 
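
For reference, the wrapping lookup discussed in this exchange can be sketched in scalar Java. This is only a minimal illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, plain `int[]` arrays stand in for vector lanes, and the lane count is assumed to be a power of two so that masking with `2 * vlen - 1` wraps any index (negative ones included) into `[0, 2 * vlen)`.

```java
import java.util.Arrays;

public class TwoVectorSelectSketch {
    // Hypothetical scalar model of a two-vector select:
    // each index lane picks from the 2*vlen-entry table formed by v1 followed by v2.
    static int[] selectFrom(int[] idx, int[] v1, int[] v2) {
        int vlen = idx.length;
        if (v1.length != vlen || v2.length != vlen || Integer.bitCount(vlen) != 1) {
            throw new IllegalArgumentException("all inputs need the same power-of-two length");
        }
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap first, then compare, so negative indices cannot reach the array access.
            int wrapped = idx[i] & ((vlen << 1) - 1);
            res[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] idx = {0, 5, -1, 8};   // -1 wraps to 7, 8 wraps to 0
        System.out.println(Arrays.toString(selectFrom(idx, v1, v2))); // [10, 21, 23, 10]
    }
}
```

Whether the wrap happens in the caller or in the helper is exactly the point under discussion above; the sketch wraps inside the helper only to keep the example self-contained.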
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974:
>
>> 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass,
>> 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) {
>> 2974: int twoVectorLen = length() * 2;
>
> We should assert that the length is a power of two.

The API only accepts vector parameters, and there is no means through the public-facing API to create a vector of non-power-of-two (NPOT) size.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504366
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504318

From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 17 Sep 2024 07:10:55 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v7]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Fri, 13 Sep 2024 14:43:42 GMT, Emanuel Peter wrote:

>> Original API did throw IndexOutOfBoundsException, but later on we have moved away from exception throwing semantics to wrapping semantics.
>> Please find details at following comment
>> https://github.com/openjdk/jdk/pull/20508#issuecomment-2306344606
>
> And do we test that the wrapping works correctly?

The VectorAPI Jtreg framework is based on testNG. Our custom data providers, associated with the various test methods, generate ranges of values that go beyond the valid index range, so this should exercise the wrapping scenarios.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504894 From jbhateja at openjdk.org Tue Sep 17 07:10:55 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:10:55 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <_U2DgK6DAW3ZJhozsMhwHzggUFpj5fnHdLJOoYFcNJA=.1875811f-458f-4834-bb94-339a8ff7360d@github.com> Message-ID: On Mon, 16 Sep 2024 07:40:33 GMT, Emanuel Peter wrote: >> Patch includes tests for all the species (combination of vector type and sizes), each vector kernel is validated against equivalent scalar implementation, scenario which you are referring is implicitly handled though tests. > > Ok, just so that I can relax, can you please point me to this test that would implicitly verify that the backend has chosen the correct vector size? Each test method validates the intrinsic code against equivalent scalar implementation, it should catch if backend emits instruction with incorrect vector size. https://github.com/openjdk/jdk/pull/20508/files#diff-95c582657bf90bef3530e67cb143865d070fd2e8e4538849e3dce6061b0d5f2dR4863 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1762504831 From jbhateja at openjdk.org Tue Sep 17 07:14:57 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 07:14:57 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . 
UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Missed code fragment from last review comment resolution. 
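
To make the saturating and unsigned semantics described above concrete, here is a minimal scalar sketch for `int`. The method names are hypothetical and are not the scalar APIs added by the patch (the description does not name them); the sketch only shows results being clamped to the type bounds instead of wrapping around.

```java
public class SaturatingOpsSketch {
    // Signed saturating add: clamp the exact (64-bit) sum to the int range.
    static int saturatingAdd(int a, int b) {
        long r = (long) a + b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Signed saturating subtract: clamp the exact difference to the int range.
    static int saturatingSub(int a, int b) {
        long r = (long) a - b;
        return (int) Math.max(Integer.MIN_VALUE, Math.min(Integer.MAX_VALUE, r));
    }

    // Unsigned saturating add: on unsigned overflow the wrapped sum is
    // (unsigned) smaller than an operand, so cap at 0xFFFFFFFF.
    static int saturatingUnsignedAdd(int a, int b) {
        int r = a + b;
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // Unsigned saturating subtract: never goes below zero.
    static int saturatingUnsignedSub(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Unsigned max/min: compare the operands as unsigned 32-bit values.
    static int unsignedMax(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int unsignedMin(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1) == Integer.MAX_VALUE); // true
        System.out.println(Integer.toUnsignedString(saturatingUnsignedAdd(-1, 1)));   // 4294967295
        System.out.println(saturatingUnsignedSub(1, 2));                               // 0
        System.out.println(Integer.toUnsignedString(unsignedMax(-1, 1)));              // 4294967295
    }
}
```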
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/ec7c7553..a6f8ee8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=11-12 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From rehn at openjdk.org Tue Sep 17 07:15:06 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 17 Sep 2024 07:15:06 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) do not need global icache flushes. >> As we can instead in slow path locally (thread and hart) emit fence.i. >> But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. >> To handle this case new now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off. 
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Comment, moved init after feature enabling

We have two major categories of cmodx:

1. Executor is not aware of any changes.
1.1 Writer can icache_flush_all *or* executor can *always* emit fence.i AND UseCtxFencei.
    If the code is not changed often, the inline fence.i is just costly and icache_flush_all is better.
1.2 We can enhance this by signaling a thread handshake, which will force all threads to emit a fence.i.
    Unclear if this is worth it, because in some situations it can take a while to flush that out.

2. Executor is *aware* of any changes made:
2.1 After safepoint. No need to do *icache_flush_all* in a safepoint. Just emit fence.i when leaving + UseCtxFencei.
2.2 Nmethod entry barrier, same here.

The patch you are referring to was dealing with 1, which we shouldn't IMHO.
I'm dealing with 2: in a safepoint, code changes do not require *icache_flush_all*, and all entries into an nmethod are handled through the barrier.
Changes to an nmethod do not need *icache_flush_all* if the barrier is used.

1: When we update oops in the code stream during a safepoint, or with the nmethod barrier locked in Relocation::pd_set_data_value, we do not need to flush.
2: When ZGC updates the color in the code stream with the nmethod barrier locked, we do not need to flush.
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ###?on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. More over, it's unclear from the assert which of the four variables contain the same register. 
>
> I'd like to propose a few changes:
> 1) That we report the indices of the conflicting registers
> 2) That we report the correct file and line number
> 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer.
>
> After these suggestions we'll get error messages that look like this:
>
> # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963
> # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0
>
> Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables.
>
> There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement.

Nice improvement!

-------------

Marked as reviewed by mli (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20965#pullrequestreview-2308648340

From dlunden at openjdk.org Tue Sep 17 08:13:15 2024
From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=)
Date: Tue, 17 Sep 2024 08:13:15 GMT
Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4]
In-Reply-To:
References: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com>
Message-ID:

On Mon, 16 Sep 2024 23:47:16 GMT, Y. Srinivas Ramakrishna wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Obsolete UseCounterDecay
>
> src/hotspot/share/runtime/globals.hpp line 1225:
>
>> 1223: develop(intx, CounterHalfLifeTime, 30, \
>> 1224: "Half-life time of invocation counters (in seconds)") \
>> 1225: \
>
> Thanks for making these changes. However, it seems like the obsolete `CounterHalfLifeTime` was missed in this cleanup?

@ysramakrishna Thanks and yes, it should also be removed. I'll make a new issue and PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16673#discussion_r1762605720 From tschatzl at openjdk.org Tue Sep 17 08:14:10 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 17 Sep 2024 08:14:10 GMT Subject: RFR: 8340119: Remove oopDesc::size_might_change() In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 12:44:47 GMT, Stefan Karlsson wrote: >> Hi all, >> >> please review this change that removes `oopDesc::size_might_change()` because since JDK-8337709 and JDK-8311163 no collector uses the objArray's length field during garbage collection any more. >> >> Testing: tier1-3 >> >> Thanks, >> Thomas > > Marked as reviewed by stefank (Reviewer). Thanks @stefank @walulyai for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/20999#issuecomment-2354838860 From tschatzl at openjdk.org Tue Sep 17 08:14:11 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 17 Sep 2024 08:14:11 GMT Subject: Integrated: 8340119: Remove oopDesc::size_might_change() In-Reply-To: References: Message-ID: <_36ECtpdSwVJvoCEy0ePdVgf3zS1AHpWfRWiV03P9Dw=.97815a56-b684-46bb-8602-b77fffe6b29c@github.com> On Fri, 13 Sep 2024 14:12:58 GMT, Thomas Schatzl wrote: > Hi all, > > please review this change that removes `oopDesc::size_might_change()` because since JDK-8337709 and JDK-8311163 no collector uses the objArray's length field during garbage collection any more. > > Testing: tier1-3 > > Thanks, > Thomas This pull request has now been integrated. 
Changeset: 7834662c
Author: Thomas Schatzl
URL: https://git.openjdk.org/jdk/commit/7834662ca35aeb202d177fde1044add611240ecd
Stats: 14 lines in 3 files changed: 0 ins; 13 del; 1 mod

8340119: Remove oopDesc::size_might_change()

Reviewed-by: stefank, iwalulya

-------------

PR: https://git.openjdk.org/jdk/pull/20999

From dholmes at openjdk.org Tue Sep 17 08:48:07 2024
From: dholmes at openjdk.org (David Holmes)
Date: Tue, 17 Sep 2024 08:48:07 GMT
Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4]
In-Reply-To:
References:
Message-ID: <2Kt-rS4x-EV54mfNP2HxKulPfhEceohp_YzY4H23RfM=.1a6867cc-038b-4212-ae17-8fc0a84e5ce1@github.com>

On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote:

>> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
>   Coleen suggestion

Note I was responding to Coleen's suggested change to add a fatal(), which would be a change in behaviour, though it is unclear to me whether the fatal() would replace a crash.

It just seems unclear when a caller can never encounter a null or deleted method and so is guaranteed not to get the NSME-throwing "method". IIUC, with the current code the caller decides what is possible and employs the NSME as needed. But the new code internalises the NSME, and then you have to reason that some callers (who appear not to be prepared for it) will in fact never get it. That doesn't seem to be an overall improvement to me - sorry.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2354913061 From jbhateja at openjdk.org Tue Sep 17 08:49:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 08:49:23 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v5] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. 
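
As a rough illustration of the zero-upper behaviour described above, the following sketch models a 64-bit register as a Java `long`. It is only a model of the semantics stated in the description (legacy SETcc writes the low byte, the new variant clears the upper 56 bits as well); the class and method names are hypothetical and nothing here reflects the actual assembler code in the patch.

```java
public class SetccModelSketch {
    // Legacy SETcc model: only the low byte of the destination is written,
    // the upper 56 bits keep whatever value the register held before.
    static long setcc(long oldDst, boolean cond) {
        return (oldDst & ~0xFFL) | (cond ? 1L : 0L);
    }

    // Zero-upper model: the whole 64-bit register becomes 0 or 1.
    static long setzucc(long oldDst, boolean cond) {
        return cond ? 1L : 0L;
    }

    // Model of the explicit byte zero-extension (MOVZX) needed after legacy SETcc.
    static long movzxByte(long value) {
        return value & 0xFFL;
    }

    public static void main(String[] args) {
        long stale = 0xDEADBEEF00000042L; // arbitrary stale register contents
        System.out.println(Long.toHexString(setcc(stale, true)));            // deadbeef00000001
        System.out.println(Long.toHexString(movzxByte(setcc(stale, true)))); // 1
        System.out.println(Long.toHexString(setzucc(stale, true)));          // 1
    }
}
```

In this model, `setzucc` alone gives the same result as `setcc` followed by `movzxByte`, which is the saved instruction the description refers to.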
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/c1c42d38..dc37dea6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=03-04 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From jbhateja at openjdk.org Tue Sep 17 08:49:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 08:49:24 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v4] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 20:37:27 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. 
src/hotspot/cpu/x86/assembler_x86.cpp line 16052: > 16050: > 16051: // Encoding Format : eevex_prefix | opcode_cc | modrm > 16052: int encode = vex_prefix_and_encode(dst->encoding(), 0, 0, VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); Suggestion: int encode = vex_prefix_and_encode(0, 0, dst->encoding(), VEX_SIMD_F2, /* MAP4 */VEX_OPCODE_0F_3C, &attributes); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20920#discussion_r1762668007 From dlunden at openjdk.org Tue Sep 17 09:07:15 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 17 Sep 2024 09:07:15 GMT Subject: RFR: 8318480: Obsolete UseCounterDecay and remove CounterDecayMinIntervalLength [v4] In-Reply-To: References: <50YDKFPHpqCEnhBk5eBeKWpbTJIHfFpQCfOcdVE8OhE=.75b95951-2c37-4f48-9a0d-fd52251f5771@github.com> Message-ID: On Tue, 17 Sep 2024 08:10:43 GMT, Daniel Lundén wrote: >> src/hotspot/share/runtime/globals.hpp line 1225: >> >>> 1223: develop(intx, CounterHalfLifeTime, 30, \ >>> 1224: "Half-life time of invocation counters (in seconds)") \ >>> 1225: \ >> >> Thanks for making these changes. However, it seems like the obsolete `CounterHalfLifeTime` was missed in this cleanup? > > @ysramakrishna Thanks and yes, it should also be removed. I'll make a new issue and PR.
For reference: https://github.com/openjdk/jdk/pull/21034 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16673#discussion_r1762713911 From stefank at openjdk.org Tue Sep 17 09:22:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 17 Sep 2024 09:22:10 GMT Subject: RFR: 8340009: Improve the output from assert_different_registers In-Reply-To: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> Message-ID: On Thu, 12 Sep 2024 12:56:13 GMT, Stefan Karlsson wrote: > `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed. > > For example, if I introduce an intentional violation: > > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index fde868a64b3..551878ac09d 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, > Register scan_temp, > Label& L_no_such_interface, > bool return_method) { > - assert_different_registers(recv_klass, intf_klass, scan_temp); > + Register joker = intf_klass; > + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); > assert_different_registers(method_result, intf_klass, scan_temp); > assert(recv_klass != method_result || !return_method, > "recv_klass can be destroyed when method isn't needed"); > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the 
`assert_different_registers` implementation and not the offending call site. Moreover, it's unclear from the assert which of the four variables contain the same register. > > I'd like to propose a few changes: > 1) That we report the indices of the conflicting registers > 2) That we report the correct file and line number > 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer. > > After these suggestions we'll get error messages that look like this: > > # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963 > # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0 > > Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables. > > There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement. Thanks for all the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20965#issuecomment-2354994225 From stefank at openjdk.org Tue Sep 17 09:22:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 17 Sep 2024 09:22:10 GMT Subject: Integrated: 8340009: Improve the output from assert_different_registers In-Reply-To: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> References: <_lgzZBTM5wRKfgsMXdgbdR45SxD6WJZprnJBw6Gk1SU=.78dd43ee-4984-4037-b828-5fd8fb1ce791@github.com> Message-ID: <_ZBntVyzK8YRrUYE6H55C89aNsyeBeIXwzx_8u_ggv4=.19f18e36-e9a9-4083-ba68-13bd08616da1@github.com> On Thu, 12 Sep 2024 12:56:13 GMT, Stefan Karlsson wrote: > `assert_different_registers` is a mechanism we use to ensure that we don't use the same register in different variables. When the assert triggers it is not immediately clear where and why the assert failed.
> > For example, if I introduce an intentional violation: > > diff --git a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > index fde868a64b3..551878ac09d 100644 > --- a/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > @@ -1188,7 +1188,8 @@ void MacroAssembler::lookup_interface_method(Register recv_klass, > Register scan_temp, > Label& L_no_such_interface, > bool return_method) { > - assert_different_registers(recv_klass, intf_klass, scan_temp); > + Register joker = intf_klass; > + assert_different_registers(recv_klass, intf_klass, scan_temp, joker); > assert_different_registers(method_result, intf_klass, scan_temp); > assert(recv_klass != method_result || !return_method, > "recv_klass can be destroyed when method isn't needed"); > > I get this error message: > > # Internal Error (src/hotspot/share/asm/register.hpp:287), pid=42568, tid=9731 > # assert(!regs[i]->is_valid() || regs[i] != regs[j]) failed: Multiple uses of register: c_rarg0 > > The indicated file and line number refers to the `assert_different_registers` implementation and not the offending call site. Moreover, it's unclear from the assert which of the four variables contain the same register. > > I'd like to propose a few changes: > 1) That we report the indices of the conflicting registers > 2) That we report the correct file and line number > 3) That we hide the is_valid() check to lower the noise in the output. Not strictly necessary, but I think it looks nicer.
> > After these suggestions we'll get error messages that look like this: > > # Internal Error (src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp:1187), pid=59065, tid=8963 > # assert(regs[i] != regs[j]) failed: regs[1] and regs[3] are both: c_rarg0 > > Which makes it easy to see that variables 1 and 3 are conflicting and by looking at the indicated file and line, it is clear that it is `intf_klass` and `joker` that are the offending variables. > > There might be a way to use more macros to propagate the variable names, but I propose that we start with this incremental improvement. This pull request has now been integrated. Changeset: c6721a0f Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/c6721a0fa2582c3ddf1ef0a6e16a09234432939c Stats: 16 lines in 2 files changed: 7 ins; 0 del; 9 mod 8340009: Improve the output from assert_different_registers Reviewed-by: aboldtch, dholmes, shade, mli ------------- PR: https://git.openjdk.org/jdk/pull/20965 From rkennke at openjdk.org Tue Sep 17 09:35:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 17 Sep 2024 09:35:02 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 57 commits: - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 - Fixes post-8340184 - Merge upstream up to and including 8340184 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - Rework compressedklass encoding - remove stray debug output - Fixes post 8338526 - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=17 Stats: 4518 lines in 190 files changed: 3180 ins; 718 del; 620 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From thartmann at openjdk.org Tue Sep 17 09:55:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Sep 2024 09:55:12 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 15:43:20 GMT, Martin Doerr wrote: >> After JDK-8338526, the transformation can only be done if the Klass* is in the encoding range (also see JBS issue for more details). I've also enhanced the assertion message. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add sanity check. 
This caused a regression: [JDK-8340230](https://bugs.openjdk.org/browse/JDK-8340230) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2355109537 From mdoerr at openjdk.org Tue Sep 17 10:01:10 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Sep 2024 10:01:10 GMT Subject: RFR: 8340012: [C2] assert(KlassEncodingMetaspaceMax > pd) failed: change encoding max if new encoding after 8338526 [v2] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:52:13 GMT, Tobias Hartmann wrote: > This caused a regression: [JDK-8340230](https://bugs.openjdk.org/browse/JDK-8340230) Thanks! Seems like this uncovered a pre-existing bug. I'll take a look. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20971#issuecomment-2355129447 From stuefe at openjdk.org Tue Sep 17 10:02:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:02:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:25:37 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: > >> 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } >> 80: >> 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } > > This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` I'd prefer not to. This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR.
I am not sure what will break if I change this but don't want to risk test errors at this point (one example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed). This can be done in a follow-up RFE if necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762917467 From stuefe at openjdk.org Tue Sep 17 10:05:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:05:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:05:10 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 165: > >> 163: MetaBlock bl(ptr, word_size); >> 164: // If the block would be reusable for a Klass, add to class arena, otherwise to >> 165: // then non-class arena. > > Nit: spelling, "the" Okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762928041 From stuefe at openjdk.org Tue Sep 17 10:16:24 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:16:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:50:59 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace.cpp line 656: > >> 654: // Adjust size of the compressed class space. >> 655: >> 656: const size_t res_align = reserve_alignment(); > > Can you change the name to `root_chunk_size`?
It feels wrong, since this is a deeply hidden implementation detail. I will remove this temporary variable, which will also make the diff smaller. > src/hotspot/share/memory/metaspace.hpp line 112: > >> 110: static size_t max_allocation_word_size(); >> 111: >> 112: // Minimum allocation alignment, in bytes. All MetaData shall be aligned correclty > > Nit: Spelling, "correctly" Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762968742 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762972938 From stuefe at openjdk.org Tue Sep 17 10:23:19 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:23:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:25:56 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 48: > >> 46: >> 47: MetaWord* base() const { return _base; } >> 48: const MetaWord* end() const { return _base + _word_size; } > > `assert(is_nonempty())` Raises the question of why here and not in other accessors? Note that the only path via which end() is called already asserts for non-empty-ness (MetaspaceArena::contains).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762985723 From jsjolen at openjdk.org Tue Sep 17 10:31:19 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 17 Sep 2024 10:31:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:59:49 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/classLoaderMetaspace.hpp line 81: >> >>> 79: metaspace::MetaspaceArena* class_space_arena() const { return _class_space_arena; } >>> 80: >>> 81: bool have_class_space_arena() const { return _class_space_arena != nullptr; } >> >> This is unnecessary. Instead of having this and having to remember to check for nullness each time, just change the `_class_space_arena` to point to the same arena as the `_non_class_space_arena` does when we run with `-XX:-UseCompressedClassPointers` > > I'd prefer not to. > > This logic (when -UCCP class space arena is NULL, with the implicit assumption that both are different entities) has been in there forever, and changing that is out of scope for and unrelated to this PR. I am not sure what will break if I change this but don't want to risk test errors at this point (one example, reporting would have to be adapted to recognize that both arenas are the same, and there are plenty of tests that would also need to be fixed). > > This can be done in a follow-up RFE if necessary. OK, that's fine. >> src/hotspot/share/memory/metaspace.cpp line 656: >> >>> 654: // Adjust size of the compressed class space. >>> 655: >>> 656: const size_t res_align = reserve_alignment(); >> >> Can you change the name to `root_chunk_size`? > > It feels wrong, since this is a deeply hidden implementation detail. > > I will remove this temporary variable, which will also make the diff smaller. Sounds OK, I wanted the name change to indicate that "hey, deep impl detail where we use this to mean something else".
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993568 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994772 From stuefe at openjdk.org Tue Sep 17 10:31:20 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:31:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <32_SIVHDWyZyYSvbV1jUHc631MTKUP2Thh_M9Q71jrc=.351aed23-599d-4a53-9cc0-0e9c85ecdf03@github.com> On Wed, 11 Sep 2024 11:29:38 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/metaspace/metablock.hpp line 52: > >> 50: bool is_empty() const { return _base == nullptr; } >> 51: bool is_nonempty() const { return _base != nullptr; } >> 52: void reset() { _base = nullptr; _word_size = 0; } > > Is this function really necessary? According to my IDE it's only used in tests and even then the `MetaBlock` isn't used afterwards (so it has no effect). see test_clms.cpp, test_random function, used in two places there. > src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 84: > >> 82: // between threads and needs to be synchronized in CLMS. >> 83: >> 84: const size_t _allocation_alignment_words; > > Nit: Document this? All other members are documented.
ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762993378 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762995731 From stuefe at openjdk.org Tue Sep 17 10:31:23 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 17 Sep 2024 10:31:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 11:40:24 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 >> - Fixes post-8340184 >> - Merge upstream up to and including 8340184 >> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 >> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java >> - Fix loop on aarch64 >> - clarify obscure assert in metasapce setup >> - Rework compressedklass encoding >> - remove stray debug output >> - Fixes post 8338526 >> - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed > > src/hotspot/share/memory/metaspace/metaspaceArena.hpp line 44: > >> 42: class FreeBlocks; >> 43: >> 44: struct ArenaStats; > > Nit: Sort? ok ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1762994972 From jsjolen at openjdk.org Tue Sep 17 10:47:25 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 17 Sep 2024 10:47:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 09:35:02 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: > > - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 > - Fixes post-8340184 > - Merge upstream up to and including 8340184 > - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 > - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java > - Fix loop on aarch64 > - clarify obscure assert in metasapce setup > - Rework compressedklass encoding > - remove stray debug output > - Fixes post 8338526 > - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed Hi, We've gone through the rest of the Metaspace code and looked at the tests. It looks OK to us. Would like to see some style cleanups in the tests, but that can wait as a follow up. test/hotspot/gtest/metaspace/test_clms.cpp line 193: > 191: > 192: { > 193: // Nonclass arena allocation. The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow up RFE. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2309360771 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1763005291 From stooke at openjdk.org Tue Sep 17 10:57:43 2024 From: stooke at openjdk.org (Simon Tooke) Date: Tue, 17 Sep 2024 10:57:43 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v6] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. 
adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: added gtest for realpath ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/33c4b402..20f697a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=04-05 Stats: 44 lines in 1 file changed: 44 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From amitkumar at openjdk.org Tue Sep 17 11:01:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 17 Sep 2024 11:01:40 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v3] In-Reply-To: References: Message-ID: > This PR provides "resolve_global_jobject" method implementation for s390x-port. > > Testing: > * Tier1 test with Fastdebug; > * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; > * 1. Ran tier1 test with a call to "resolve_jobject" > * 2. Ran tier1 test with a call to "resolve_global_jobject" > > I didn't see any new failure appearing there.
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: implements ModRefBarrierSetAssembler::resolve_jobject ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20986/files - new: https://git.openjdk.org/jdk/pull/20986/files/95a832e4..25d9e9bc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20986&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20986&range=01-02 Stats: 21 lines in 3 files changed: 16 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20986.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20986/head:pull/20986 PR: https://git.openjdk.org/jdk/pull/20986 From mdoerr at openjdk.org Tue Sep 17 11:01:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Sep 2024 11:01:40 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:58:23 GMT, Amit Kumar wrote: >> This PR provides "resolve_global_jobject" method implementation for s390x-port. >> >> Testing: >> * Tier1 test with Fastdebug; >> * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; >> * 1. Ran tier1 test with a call to "resolve_jobject" >> * 2. Ran tier1 test with a call to "resolve_global_jobject" >> >> I didn't see any new failure appearing there. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > implements ModRefBarrierSetAssembler::resolve_jobject LGTM. Note that the generic implementation of `resolve_jobject` is now unused, but will be used if ZGC or Shenandoah get implemented. ------------- Marked as reviewed by mdoerr (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20986#pullrequestreview-2309110989 From lucy at openjdk.org Tue Sep 17 11:01:41 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 17 Sep 2024 11:01:41 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:58:23 GMT, Amit Kumar wrote: >> This PR provides "resolve_global_jobject" method implementation for s390x-port. >> >> Testing: >> * Tier1 test with Fastdebug; >> * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; >> * 1. Ran tier1 test with a call to "resolve_jobect" >> * 2. Ran tier1 test with a call to "resolve_global_jobject" >> >> I didn't see any new failure appearing there. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > implements ModRefBarrierSetAssembler::resolve_jobject LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20986#pullrequestreview-2309391444 From amitkumar at openjdk.org Tue Sep 17 11:01:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 17 Sep 2024 11:01:41 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v2] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 16:41:27 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> removes extra line > > src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 108: > >> 106: } >> 107: >> 108: // Generic implementation. GCs can provide an optimized one. > > You may want to implement an optimized `G1BarrierSetAssembler::resolve_jobject` and `ModRefBarrierSetAssembler::resolve_jobject`. Otherwise, those GCs may get a regression. I have added implementation for `ModRefBarrierSetAssembler::resolve_jobject`. 
It seems `G1BarrierSetAssembler::resolve_jobject` is already present, do I need to do anything extra than that ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20986#discussion_r1762686675 From mdoerr at openjdk.org Tue Sep 17 11:01:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 17 Sep 2024 11:01:41 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v2] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 08:57:59 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/gc/shared/barrierSetAssembler_s390.cpp line 108: >> >>> 106: } >>> 107: >>> 108: // Generic implementation. GCs can provide an optimized one. >> >> You may want to implement an optimized `G1BarrierSetAssembler::resolve_jobject` and `ModRefBarrierSetAssembler::resolve_jobject`. Otherwise, those GCs may get a regression. > > I have added implementation for `ModRefBarrierSetAssembler::resolve_jobject`. It seems `G1BarrierSetAssembler::resolve_jobject` is already present, do I need to do anything extra than that ? I think it's fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20986#discussion_r1762839103 From thartmann at openjdk.org Tue Sep 17 11:07:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Sep 2024 11:07:15 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps Message-ID: Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. 
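To illustrate the idea for readers skimming the thread, the counter-based check described above could be sketched roughly as follows. This is only a sketch under assumptions: `stress_counter`, `Outcome`, `stressed_unstable_if` and the `period` parameter are hypothetical names that do not come from the patch, and the real change places the extra check in compiled code rather than in a plain C++ helper.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the "simple shared counter" mentioned above.
static std::atomic<uint32_t> stress_counter{0};

// The two paths the extra check can choose between.
enum class Outcome { TrapTaken, OriginalBranch };

// Assumed policy: every `period`-th execution takes the trap, all other
// executions fall through to the original unstable if.
Outcome stressed_unstable_if(uint32_t period) {
    assert(period > 0 && "period must be positive");
    // Relaxed ordering is enough here: the counter only needs to vary,
    // it does not publish any other data.
    uint32_t ticket = stress_counter.fetch_add(1, std::memory_order_relaxed);
    return (ticket % period == 0) ? Outcome::TrapTaken
                                  : Outcome::OriginalBranch;
}
```

A shared counter is not truly random, but it is cheap and is enough to make rarely taken traps fire regularly, which is the point of a stress mode.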
This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. Thanks, Tobias ------------- Commit messages: - Added stress test - Test fix - Randomly skip emitting an uncommon trap - Small fix - Merge branch 'master' into JDK-8335334 - Test fix - Test fix - Small fix - More randomization - Merge branch 'master' into JDK-8335334 - ... and 9 more: https://git.openjdk.org/jdk/compare/e1ebeef0...8af8f423 Changes: https://git.openjdk.org/jdk/pull/21037/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335334 Stats: 114 lines in 13 files changed: 107 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21037/head:pull/21037 PR: https://git.openjdk.org/jdk/pull/21037 From thartmann at openjdk.org Tue Sep 17 11:14:04 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Sep 2024 11:14:04 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps In-Reply-To: References: Message-ID: <_LwEIK5hJsW7GLifRuJW7PXoUup18z4YcCUYZCcri70=.4743b0a5-8992-4605-98ed-2efa9611bfd1@github.com> On Tue, 17 Sep 2024 11:01:27 GMT, Tobias Hartmann wrote: > Unstable if traps are supposed to be taken rarely. 
This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. > > Thanks, > Tobias src/hotspot/share/opto/compile.cpp line 722: > 720: } > 721: > 722: if (StressLCM || StressGCM || StressIGVN || StressCCP || We need to initialize the seed earlier because `StressUnstableIfTraps` is already used during parsing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1763047875 From fbredberg at openjdk.org Tue Sep 17 12:06:09 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 17 Sep 2024 12:06:09 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Mon, 16 Sep 2024 15:54:52 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/share/runtime/objectMonitor.hpp line 360: > >> 358: >> 359: enum class TryLockResult { Interference = -1, HasOwner = 0, Success = 1 }; >> 360: TryLockResult TryLock(JavaThread* current); > > This CamelCase syntax is used for private methods. We should change it to try_lock now that we are calling it from SharedRuntime code. Another alternative is to keep it private and just use the already available try_enter(). That has the benefit of not having to make TryLockResult public either. If we want to skip the checks after TryLock in try_enter we could add a check_owner_already boolean. Good suggestion! I'll go with the "Another alternative...". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1763120938 From mli at openjdk.org Tue Sep 17 12:22:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 17 Sep 2024 12:22:42 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v10] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> 
TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > add assert Generally looks good to me. Thanks. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1505: > 1503: vxor_vv(vword, vword, vcrc); > 1504: > 1505: addi(buf, buf, N*W); The `N*W` here seems a bit strange to me. I don't think the update of `buf` here should depend on `W`, right? So maybe `N * 4` instead? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1535: > 1533: mv(crc, zr); > 1534: for (int i = 0; i < N; i++) { > 1535: lwu(t1, Address(buf, i*W)); Similar here. The address offset calculation here shouldn't depend on `W`, right? Maybe `i * 4` instead? BTW: Could a vectorized load help here? Say `vle32_v(vtmp, buf)`. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1550: > 1548: } > 1549: } > 1550: addi(buf, buf, N*W); Similar here. Maybe `N*4`? ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20910#pullrequestreview-2308718752 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1763167871 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1763170651 PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1763195471 From ihse at openjdk.org Tue Sep 17 13:01:15 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 17 Sep 2024 13:01:15 GMT Subject: Integrated: 8329816: Add SLEEF version 3.6.1 In-Reply-To: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> References: <0jiUrj5NGvjH0SFZpGfNVI-6IiQSIb_wmlRKdcTv5i8=.cf61b636-e36c-4672-aeeb-227bf509923a@github.com> Message-ID: On Thu, 29 Aug 2024 23:07:16 GMT, Magnus Ihse Bursie wrote: > [JDK-8312425](https://bugs.openjdk.org/browse/JDK-8312425) is looking to optimize vector math operations by leveraging the SLEEF library. For legal reasons the actual contribution of the SLEEF files needs to be handled separately. > > This is a new attempt at solving [JDK-8329816](https://bugs.openjdk.org/browse/JDK-8329816); the original attempt is here: https://github.com/openjdk/jdk/pull/19185. This PR is based on the discussions on how to move forward that was held in that original PR. This pull request has now been integrated. 
Changeset: b39e6a84 Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/b39e6a84ef947661b5c878d02213da3a79bc026c Stats: 120709 lines in 175 files changed: 120709 ins; 0 del; 0 mod 8329816: Add SLEEF version 3.6.1 Reviewed-by: erikj, mli, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/20781 From coleenp at openjdk.org Tue Sep 17 13:15:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 17 Sep 2024 13:15:08 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: <0ZfEU7MwCLHUhJz_t3G2OiC3_2JjC0c7PUM5z2rOSUw=.99cea3aa-46a0-402e-ba0f-d68542041ed0@github.com> On Tue, 17 Sep 2024 12:03:57 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 360: >> >>> 358: >>> 359: enum class TryLockResult { Interference = -1, HasOwner = 0, Success = 1 }; >>> 360: TryLockResult TryLock(JavaThread* current); >> >> This CamelCase syntax is used for private methods. We should change it to try_lock now that we are calling it from SharedRuntime code. Another alternative is to keep it private and just use the already available try_enter(). That has the benefit of not having to make TryLockResult public either. If we want to skip the checks after TryLock in try_enter we could add a check_owner_already boolean. > > Good suggestion! I'll go with the "Another alternative...". Yes, that's a good suggestion. try_enter() is already exported. Also change the comment in try_enter(): // TryLock avoids the CAS and handles deflation. because it used to be a try_set_owner_from(), and now there's two reasons it can't go back. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1763225745 From stooke at openjdk.org Tue Sep 17 13:16:50 2024 From: stooke at openjdk.org (Simon Tooke) Date: Tue, 17 Sep 2024 13:16:50 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v7] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: use MAX_PATH only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/20f697a6..7daad7c7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From fbredberg at openjdk.org Tue Sep 17 13:18:13 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 17 Sep 2024 13:18:13 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Mon, 16 Sep 2024 16:05:25 GMT, Patricio Chilano Mateo 
wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 1267: > >> 1265: return; >> 1266: } >> 1267: } > > Can't we replace all this code for a call to TryLock? I think we can. Thanks for the suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1763230029 From fbredberg at openjdk.org Tue Sep 17 13:21:10 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 17 Sep 2024 13:21:10 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Fri, 13 Sep 2024 15:50:13 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 1224: > >> 1222: // falls to the new owner. >> 1223: // >> 1224: void* owner = try_set_owner_from(nullptr, current); > > Is this the same code as TryLock now? Except a little different... Could this call TryLock and return if the lock becomes owned by another thread, like in SharedRuntime::monitor_exit_helper() ? It seems it can call TryLock, which was also pointed out by @pchilano. Thanks for also spotting this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1763235839 From mli at openjdk.org Tue Sep 17 13:28:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 17 Sep 2024 13:28:39 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v11] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks. 
> > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). > > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > > 920: // STST|LDST barrier in exit() before the ST of null into _owner that drops > > This sentence: Is the membar before or after the ST that drops the lock? It's the release barrier before the store that drops the lock.
It's described [here](https://github.com/openjdk/jdk/blob/d2c6db2b9673ab8381bc7c86ada90da552350cb8/src/hotspot/share/runtime/objectMonitor.cpp#L1104). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1763250484 From fyang at openjdk.org Tue Sep 17 13:35:13 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 17 Sep 2024 13:35:13 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v11] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 13:28:39 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. >> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). >> >> ## Test >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, >> test/jdk/java/util/zip/TestCRC32.java >> >> ## Performance >> >> ### on bananapi >> >> with patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> 
TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Updated change LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20910#pullrequestreview-2309780876 From mli at openjdk.org Tue Sep 17 13:50:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 17 Sep 2024 13:50:42 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v12] In-Reply-To: References: Message-ID: <6RgppiD0RTuKLmCQAqB1vDhFwvZbfkSHOzKw6GFfcPk=.008267b0-5e6d-427b-98e6-ac860c0f9ab3@github.com> > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> 
TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > vectorize xor Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20910#pullrequestreview-2309883207 From fyang at openjdk.org Tue Sep 17 14:12:11 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 17 Sep 2024 14:12:11 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v10] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 13:45:44 GMT, Hamlin Li wrote: >> It does not seem to help much, as we need to slide down vtmp in every loop round like vcrc, which means we cannot save any instructions; on the other hand, as the `lwu` in the outer loop is a contiguous load, we can expect most of the actual loads to come from the cache. >> >> Unless we can also vectorize most of the code of the outer loop (i < N), i.e. vectorize the subsequent `xorr` to `vxor_vv`, but it seems we cannot do that, because every loop round `i` depends on the `crc` result of the previous loop round. > > Sorry, I gave it another thought. > Although we cannot vectorize the whole outer loop, we can still put one `xor` outside of the outer loop. Yes. Looks better.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20910#discussion_r1763322224 From stooke at openjdk.org Tue Sep 17 14:14:31 2024 From: stooke at openjdk.org (Simon Tooke) Date: Tue, 17 Sep 2024 14:14:31 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v8] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: Define MAX_PATH if required ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/7daad7c7..7757e90e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=06-07 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From coleenp at openjdk.org Tue Sep 17 14:22:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 17 Sep 2024 14:22:10 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Tue, 17 Sep 2024 13:27:36 GMT, Fredrik 
Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 920: >> >>> 918: // To that end, the exit() operation must have at least STST|LDST >>> 919: // "release" barrier semantics. Specifically, there must be at least a >>> 920: // STST|LDST barrier in exit() before the ST of null into _owner that drops >> >> This sentence: Is the membar before or after the ST that drops the lock? > > It's the release barrier before the store that drops the lock. It's described [here](https://github.com/openjdk/jdk/blob/d2c6db2b9673ab8381bc7c86ada90da552350cb8/src/hotspot/share/runtime/objectMonitor.cpp#L1104). Ok, thanks. I get it mixed up with the membar(StoreLoad) fence after storing null to the owner field. Nothing to change. I was just checking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1763339017 From aph at openjdk.org Tue Sep 17 14:36:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 17 Sep 2024 14:36:13 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v6] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <2T40nvtVeQwgpYCpm4AZueRaehOjHtXAHQapZSZjHgc=.da18e1d3-7df1-4131-bfd7-2eabc0339964@github.com> On Mon, 16 Sep 2024 17:11:05 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with three additional commits since the last revision: > > - Optimize both the stub and inlined parts of the implementation > > Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. > Add a non-unrolled vectorized loop to the stub to handle vectorizable > tail portions of arrays multiple to 4/8 elements (for ints / other > types).
Make the stub process array as a whole instead of relying on > the inlined part to process an unvectorizable tail. > - cleanup: add comments and simplify the orr ins > - cleanup: remove redundant copyright notice src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5420: > 5418: > 5419: // Put 0-3'th powers of 31 into a single SIMD register together. The register will be used in > 5420: // the SMALL and LARGE LOOPS' EPILOQUES. The initialization is hoisted here and the register's Suggestion: // the SMALL and LARGE LOOPS' epilogues. The initialization is hoisted here and the register's ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1763363838 From duke at openjdk.org Tue Sep 17 14:43:46 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 17 Sep 2024 14:43:46 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v7] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
> > # Performance > > ## Neoverse N1 > >
> --------------------------------------------------------------------------------------------
> Version                                        Baseline              This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: fix comment formatting Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/bfa93695..03821dfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From adinn at openjdk.org Tue Sep 17 15:50:18 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 17 Sep 2024 15:50:18 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v7] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 17 Sep 2024 14:43:46 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
>> >> # Performance >> >> ## Neoverse N1 >> >>
>> --------------------------------------------------------------------------------------------
>> Version                                        Baseline              This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: fix comment formatting > > Co-authored-by: Andrew Haley src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 103: > 101: // offsets past |(r - lc) % uf| pairs of load + madd insns i.e. it only executes > 102: // r % uf load + madds. Iteration eats up the remainder, uf elements at a time. > 103: assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC"); The comment above needs adjusting in the light of your latest change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1763483032 From kvn at openjdk.org Tue Sep 17 16:12:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 16:12:24 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: References: Message-ID: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> On Tue, 17 Sep 2024 05:20:30 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Discard memory accesses with barrier data as implicit null check candidates Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2310210106 From coleenp at openjdk.org Tue Sep 17 16:19:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 17 Sep 2024 16:19:10 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests.
> > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion It is an improvement not to make every caller write `is_deleted() ? NSME : get_new_method(m)`. I added the case for LinkResolver for JDK-8327250 and forgot deleted methods. We really want something non-null to be returned for the callers of LinkResolver::resolved/selected_method(). This result is stored in various places like the CompiledIC, but I think the CompiledIC it's stored in will be for a deoptimized nmethod since that nmethod was a victim of redefinition. Maybe @fisk and @sspitsyn can comment on this. I was throwing out the idea of a 'fatal' error because we do want people to stop using -XX:+AllowRedefinitionToAddDeleteMethods. If we wanted to do this, it would be a different PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2356373137 From duke at openjdk.org Tue Sep 17 16:24:29 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 17 Sep 2024 16:24:29 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v8] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
> > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > >
> --------------------------------------------------------------------------------------------
> Version                                        Baseline              This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: adjust a comment in the light of the latest change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/03821dfd..6b8eb78c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=06-07 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From duke at openjdk.org Tue Sep 17 16:24:30 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 17 Sep 2024 16:24:30 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v7] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 17 Sep 2024 15:47:08 GMT, Andrew Dinn wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: fix comment formatting >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 103: > >> 101: // offsets past |(r - lc) % uf| pairs of load + madd insns i.e. it only executes >> 102: // r % uf load + madds. Iteration eats up the remainder, uf elements at a time. >> 103: assert(is_power_of_2(unroll_factor), "can't use this value to calculate the jump target PC"); > > The comment above needs adjusting in the light of your latest change. Done, thanks! 
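
For readers following the thread, the two-part structure described in the PR text (a vectorized main loop plus a scalar tail) can be illustrated with a minimal scalar sketch. This is not the AArch64 stub from the PR: it only emulates four lanes with four plain `int` accumulators, and the class and method names (`LaneHashSketch`, `hashCode`) are made up for illustration. It assumes the usual `Arrays.hashCode(int[])` polynomial, `h = 31 * h + a[i]` starting from 1.

```java
import java.util.Arrays;

public class LaneHashSketch {
    // 31^4 = 923521; every lane advances by this factor once per block of four elements.
    private static final int POW_31_4 = 31 * 31 * 31 * 31;

    // Hypothetical helper: computes the same value as Arrays.hashCode(int[])
    // for a non-null array, using four independent accumulators ("lanes").
    static int hashCode(int[] a) {
        // Lane 3 starts at 1 so that it carries the initial value of the polynomial.
        int acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 1;
        int i = 0;
        // Main loop: four elements per iteration, one per lane.
        for (; i + 4 <= a.length; i += 4) {
            acc0 = acc0 * POW_31_4 + a[i];
            acc1 = acc1 * POW_31_4 + a[i + 1];
            acc2 = acc2 * POW_31_4 + a[i + 2];
            acc3 = acc3 * POW_31_4 + a[i + 3];
        }
        // Epilogue: combine the lanes using the 3rd..0th powers of 31.
        int h = acc0 * (31 * 31 * 31) + acc1 * (31 * 31) + acc2 * 31 + acc3;
        // Scalar tail: at most three leftover elements in this sketch.
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {7, -3, 42, 0, 1_000_000, 5, 9};
        System.out.println(hashCode(data) == Arrays.hashCode(data));
    }
}
```

All arithmetic wraps modulo 2^32, so reordering the multiplications across lanes does not change the result. The real stub works on Neon registers and processes 4 or 8 elements per iteration depending on the element type; the sketch fixes the width at four only to keep the epilogue short.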
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1763533579 From jbhateja at openjdk.org Tue Sep 17 16:35:43 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 16:35:43 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v6] In-Reply-To: References: Message-ID: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. > - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Post NDD patch cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20920/files - new: https://git.openjdk.org/jdk/pull/20920/files/dc37dea6..8673c736 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20920&range=04-05 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20920.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20920/head:pull/20920 PR: https://git.openjdk.org/jdk/pull/20920 From psandoz at openjdk.org Tue Sep 17 17:03:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 17:03:14 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:15 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 561: >> >>> 559: for (int i = 0; i < vlen; i++) { >>> 560: int index = ((int)vecPayload1[i]); >>> 561: res[i] = index >= vlen ? vecPayload3[index & (vlen - 1)] : vecPayload2[index]; >> >> This is incorrect as the index could be negative. You need to wrap in the range `[0, 2 * vlen - 1]` before the comparison and selection. >> >> int index = ((int)vecPayload1[i]) & ((vlen << 1) - 1); >> res[i] = index < vlen ? vecPayload2[index] : vecPayload3[index - vlen]; > > Hi @PaulSandoz , we already pass wrapped indexes to this helper routine called from fallback implementation. Oops, yes, the masking was throwing me off. Can you please add a comment and/or rename the parameters e.g., so `v1` is renamed to `wrappedIndex`?
Also, I would recommend not doing the masking, as it is very misleading, and doing the subtraction instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1763582764 From psandoz at openjdk.org Tue Sep 17 17:07:16 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 17:07:16 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:12 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/X-Vector.java.template line 2974: >> >>> 2972: final $abstractvectortype$ selectFromTemplate(Class> indexVecClass, >>> 2973: $abstractvectortype$ v1, $abstractvectortype$ v2) { >>> 2974: int twoVectorLen = length() * 2; >> >> We should assert that the length is a power of two. > > API only accepts vector parameters and there is no means through the public-facing API to create a vector of NPOT sizes. https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Vector.java#L842C58-L843C27 You missed the first bit of the sentence linked to "With the possible exception of the {@linkplain VectorShape#S_Max_BIT maximum shape}". In general, the specification avoids assuming POT where it is not explicitly stated (i.e., the constant shapes). In this case we align with the specification of `VectorShuffle::wrapIndex`. We don't need to implement NPOT but we need a reminder in the implementation where we make that assumption.
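
The wrapping rule discussed in the two review comments above can be shown with a small scalar reference. This is only a sketch over plain `int[]` arrays, not the Vector API fallback from the PR; `TwoVectorSelectSketch` and `selectFromTwo` are hypothetical names. It assumes the lane count is a power of two, which is exactly the assumption the review asks to have recorded in the implementation, so the sketch checks it explicitly.

```java
public class TwoVectorSelectSketch {
    // Hypothetical scalar reference: for each lane, pick an element from the
    // concatenation [v1, v2] after wrapping the index into [0, 2 * vlen - 1].
    static int[] selectFromTwo(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException("all inputs must have the same length");
        }
        // Assumption made explicit: masking only wraps correctly for power-of-two lengths.
        if (Integer.bitCount(vlen) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        int[] res = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            // Wrap first, so that negative and too-large indexes are handled uniformly.
            int wrapped = indexes[i] & ((vlen << 1) - 1);
            // Select with a subtraction rather than a second mask.
            res[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return res;
    }

    public static void main(String[] args) {
        int[] r = selectFromTwo(new int[] {0, 5, -1, 8},
                                new int[] {10, 11, 12, 13},
                                new int[] {20, 21, 22, 23});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

Wrapping before the comparison is what makes a negative index such as -1 land on the last element of the second input instead of causing an out-of-bounds access.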
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1763587293 From sviswanathan at openjdk.org Tue Sep 17 17:14:09 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 17 Sep 2024 17:14:09 GMT Subject: RFR: 8339790: Support Intel APX setzucc instruction [v6] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:35:43 GMT, Jatin Bhateja wrote: >> - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The >> condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. >> - This saves emitting an explicit MOVZX instruction after setCC. >> - These new instructions are encoded using 4 byte Extended EVEX encoding. >> >> Validation performed over stand alone test point using Intel SDE. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Post NDD patch cleanups LGTM ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20920#pullrequestreview-2310353618 From kvn at openjdk.org Tue Sep 17 17:28:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 17:28:15 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 11:01:27 GMT, Tobias Hartmann wrote: > Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. 
It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. > > Thanks, > Tobias A few comments. src/hotspot/share/opto/parse2.cpp line 1385: > 1383: bool do_stress_trap = StressUnstableIfTraps && ((C->random() % 2) == 0); > 1384: if (do_stress_trap) { > 1385: Node* counter_addr = makecon(TypeRawPtr::make((address)&_trap_stress_counter)); Would it be easier to use a new Ideal macro node for this and expand it in the macro expansion phase? src/hotspot/share/opto/parse2.cpp line 1497: > 1495: incr_store = store_to_memory(control(), counter_addr, counter, T_INT, Compile::AliasIdxRaw, MemNode::unordered); > 1496: } > 1497: At a glance, this looks like the code above. Should you put it into a separate method and call it from both places? src/hotspot/share/opto/parse2.cpp line 1589: > 1587: // Search for an unstable if trap > 1588: CallStaticJavaNode* trap = nullptr; > 1589: for (int i = 0; i <= 1; ++i) { Should we check that it is `IfNode` and it has 2 output edges? Maybe assert?
src/hotspot/share/opto/parse2.cpp line 1590: > 1588: CallStaticJavaNode* trap = nullptr; > 1589: for (int i = 0; i <= 1; ++i) { > 1590: Node* out = orig_iff->raw_out(i)->find_out_with(Op_CallStaticJava); Why not cast (CallStaticJava*) here? src/hotspot/share/opto/parse2.cpp line 1591: > 1589: for (int i = 0; i <= 1; ++i) { > 1590: Node* out = orig_iff->raw_out(i)->find_out_with(Op_CallStaticJava); > 1591: if (out != nullptr && out->isa_CallStaticJava() && out->as_CallStaticJava()->is_uncommon_trap()) { You don't need `out->isa_CallStaticJava()` because `find_out_with()` will return `nullptr` in other cases. ------------- PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2310269564 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1763603794 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1763554614 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1763612378 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1763616657 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1763614982 From jbhateja at openjdk.org Tue Sep 17 17:49:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 17 Sep 2024 17:49:09 GMT Subject: Integrated: 8339790: Support Intel APX setzucc instruction In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:36:51 GMT, Jatin Bhateja wrote: > - Support APX variant of SETcc, which supports zero-upper semantics (full register writer). Sets the destination operand to 0 or 1 depending on the settings of the status flags (CF, SF, OF, ZF, and PF) in the EFLAGS register. The destination operand points to a byte register or a byte in memory. The > condition code suffix (cc) indicates the condition being tested for. Additionally, if ND = 1 and the destination is a GPR, then also set the upper 56 bits of the GPR to 0. > - This saves emitting an explicit MOVZX instruction after setCC. 
> - These new instructions are encoded using 4 byte Extended EVEX encoding. > > Validation performed over stand alone test point using Intel SDE. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 90e92f98 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/90e92f98a6685b196b979853436668cf2b9f2117 Stats: 73 lines in 7 files changed: 22 ins; 25 del; 26 mod 8339790: Support Intel APX setzucc instruction Reviewed-by: sviswanathan, jkarthikeyan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20920 From kvn at openjdk.org Tue Sep 17 17:59:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 17:59:05 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v2] In-Reply-To: References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: On Tue, 17 Sep 2024 00:06:19 GMT, Ioi Lam wrote: >> This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` >> >> These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. >> >> --- >> See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: > > - @vnkozlov comment - added NOT_CDS_RETURN > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - some clean up > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - 8293337: Archive method handle intrinsics Looks good to me. But I am not expert in this code. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20959#pullrequestreview-2310450484 From kvn at openjdk.org Tue Sep 17 18:02:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Sep 2024 18:02:06 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: <5D6O9eX2pGjnjE7sLxV2mN70gj-srOa1SIDfhgSPQH0=.d73305ea-9050-41c3-9085-3ebc94679318@github.com> On Tue, 17 Sep 2024 00:02:19 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). 
>> >> **Problem:** >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. >> >> **Solution:** >> >> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. >> >> In production run, we no longer execute `Wrapper::`, because all the static fields (contained in its mirror) are already initialized. >> >> **Review Notes:** >> >> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. >> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. >> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `` method). 
This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @vnkozlov comments > - Clean up; removed unrelated changes in classPrinter.cpp > - more cleanup > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - More clean up for JDK-8293187 > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Simplified implemented by AOTClassInitializer. > - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd Good for me. But you need review from experts in this code. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20958#pullrequestreview-2310456505 From psandoz at openjdk.org Tue Sep 17 18:24:05 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Sep 2024 18:24:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Thu, 22 Aug 2024 18:43:56 GMT, Paul Sandoz wrote: > Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2356611024 From sviswanathan at openjdk.org Tue Sep 17 18:42:05 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 17 Sep 2024 18:42:05 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <0LLon0mwzZnp8sR_306z0BoBUjXQAgLBn_KHP-37PC0=.802ff560-b97b-43ba-83b1-94d331d7e03e@github.com> Message-ID: On Tue, 17 Sep 2024 18:21:43 GMT, Paul Sandoz wrote: > > Adding link to UTF-8 decoding use case for convenience and reminder: https://github.com/AugustNagro/utf8.java/blob/master/src/main/java/com/augustnagro/utf8/Utf8.java. > > Another related link to base 64 decoding https://github.com/simdutf/SimdBase64/ Thanks Paul! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2356639277 From asmehra at openjdk.org Tue Sep 17 19:55:09 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Sep 2024 19:55:09 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v6] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Mon, 16 Sep 2024 21:54:49 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`.
>> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @ashu-mehra reviews src/hotspot/share/runtime/threads.cpp line 322: > 320: universe_post_module_init(); > 321: > 322: if (CDSConfig::is_using_aot_linked_classes()) { call_initPhase2 has a timer that computes the cost of initializing the module system. Before this patch, call_initPhase2 was only initializing the module system. But now it is doing work which is not part of the module system initialization. So in the future we may want to refactor this work out of call_initPhase2.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1763851899 From gziemski at openjdk.org Tue Sep 17 20:02:18 2024 From: gziemski at openjdk.org (Gerard Ziemski) Date: Tue, 17 Sep 2024 20:02:18 GMT Subject: Integrated: 8337563: NMT: rename MEMFLAGS to MemTag In-Reply-To: References: Message-ID: <6f1uGL8qdaFMXKcWOHRwQb7z_up-MGAHqjOo9WEP3vA=.fa267ab6-4754-4175-912b-c10ca62cf7ac@github.com> On Thu, 5 Sep 2024 16:10:05 GMT, Gerard Ziemski wrote: > Please review this cleanup, where we rename `MEMFLAGS` to `MemTag`. > > `MEMFLAGS` implies that we can use more than one at the same time, but those are exclusive values, so `MemTag` is a more suitable name. > > This fix also includes a cleanup of all the related function/template parameter names and local variable names. > > Testing is pending... > > Note: there is more history in old closed PRs [https://github.com/openjdk/jdk/pull/20497](https://github.com/openjdk/jdk/pull/20497) and [https://github.com/openjdk/jdk/pull/20472](https://github.com/openjdk/jdk/pull/20472) This pull request has now been integrated. Changeset: eabfc6e4 Author: Gerard Ziemski URL: https://git.openjdk.org/jdk/commit/eabfc6e4d901c53b93a78da740ca376607d9576d Stats: 1533 lines in 127 files changed: 144 ins; 138 del; 1251 mod 8337563: NMT: rename MEMFLAGS to MemTag Reviewed-by: dholmes, coleenp, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/20872 From asmehra at openjdk.org Tue Sep 17 20:36:08 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Sep 2024 20:36:08 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> On Tue, 17 Sep 2024 00:02:19 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). 
>> >> **Problem:** >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. >> >> **Solution:** >> >> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. >> >> In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. >> >> **Review Notes:** >> >> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. >> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. >> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method).
This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @vnkozlov comments > - Clean up; removed unrelated changes in classPrinter.cpp > - more cleanup > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - More clean up for JDK-8293187 > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Simplified implemented by AOTClassInitializer. > - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd src/hotspot/share/cds/aotClassInitializer.cpp line 40: > 38: return true; > 39: } else if (ik->is_initialized() && > 40: (ik->name()->equals("jdk/internal/constant/PrimitiveClassDescImpl") || Is it possible for one of these classes to be not initialized at this stage? IIUC `ik` must be initialized if it is one of these classes. 
If so, can `ik->is_initialized()` be turned into an assert? src/hotspot/share/cds/heapShared.cpp line 996: > 994: > 995: assert( _runtime_default_subgraph_info != nullptr, "must be"); > 996: Array<Klass*>* klasses = _runtime_default_subgraph_info->subgraph_object_klasses(); Couple of questions here: 1. Does `_runtime_default_subgraph_info` only hold archived mirrors? 2. If we are init-ing classes required for archived mirrors here, why do we need special aot-initialization for `PrimitiveClassDescImpl`, `ReferenceClassDescImpl` and `ConstantDescs` in AOTClassInitializer::can_archive_initialized_mirror? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1763982914 PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1763982827 From asmehra at openjdk.org Tue Sep 17 20:58:12 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Sep 2024 20:58:12 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 00:02:19 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Problem:** >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. >> >> **Solution:** >> >> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache.
>> >> In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. >> >> **Review Notes:** >> >> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. >> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. >> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method). This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 11 commits: > > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @vnkozlov comments > - Clean up; removed unrelated changes in classPrinter.cpp > - more cleanup > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - More clean up for JDK-8293187 > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Simplified implemented by AOTClassInitializer. > - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 242: > 240: // - classes that were AOT-initialized by AOTClassInitializer > 241: // - the classes of all objects that are reachable from the archived mirrors of > 242: // the AOT-linked classes for . It seems this function covers the first two categories, but not the AOT-linked classes. Is that correct? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1764005794 From asmehra at openjdk.org Tue Sep 17 21:44:06 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Sep 2024 21:44:06 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 00:02:19 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Problem:** >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. >> >> **Solution:** >> >> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. >> >> In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. >> >> **Review Notes:** >> >> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. >> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. >> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized.
This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method). This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 11 commits: > > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @vnkozlov comments > - Clean up; removed unrelated changes in classPrinter.cpp > - more cleanup > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - More clean up for JDK-8293187 > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Simplified implemented by AOTClassInitializer. > - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd src/hotspot/share/oops/instanceKlass.cpp line 846: > 844: // If we have a preinit mirror, we may come to here if a supertype is not > 845: // yet initialized. It will still be quicker than usual, as we will skip the > 846: // execution of <clinit> of this class. How do we ensure <clinit> is skipped in this case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1764053035 From asmehra at openjdk.org Tue Sep 17 21:50:10 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 17 Sep 2024 21:50:10 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 00:02:19 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
>> >> **Problem:** >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. >> >> **Solution:** >> >> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. >> >> In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. >> >> **Review Notes:** >> >> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. >> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. >> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method).
This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @vnkozlov comments > - Clean up; removed unrelated changes in classPrinter.cpp > - more cleanup > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - More clean up for JDK-8293187 > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Simplified implemented by AOTClassInitializer. > - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd @iklam I have left some questions to have better understanding of the code. 
------------- PR Review: https://git.openjdk.org/jdk/pull/20958#pullrequestreview-2311136071 From iklam at openjdk.org Tue Sep 17 23:25:43 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Sep 2024 23:25:43 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v7] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <9P_DE8tDNQAzQxtfD-eXEAS7Lq3CsVNfOWHLetBXJPQ=.44255a05-36d6-4233-8da5-5ff5ad94a729@github.com> > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`.
> > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra comment: move code outside of call_initPhase2(); also renamed BOOT/BOOT2 to BOOT1/BOOT2 and refactored code related to AOTLinkedClassCategory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/bcddf963..4e9668a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=05-06 Stats: 194 lines in 8 files changed: 80 ins; 67 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Tue Sep 17 23:29:46 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Sep 2024 23:29:46 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`.
> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: minor comment fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/4e9668a0..bedf9a26 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=06-07 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Tue Sep 17 23:34:06 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 17 Sep 2024 23:34:06 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v6] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <5MelVAUu_MCX4APA_lPq1fLpbCtfKDzTVqjoWOWkSKk=.8b5c8db0-d624-4ec0-be11-22c0e8ba41b2@github.com> On Tue, 17 Sep 2024 19:52:29 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @ashu-mehra reviews > > src/hotspot/share/runtime/threads.cpp line 322: > >> 320: universe_post_module_init(); >> 321: >> 322: if (CDSConfig::is_using_aot_linked_classes()) { > > call_initPhase2 has a timer that computes cost for initializing module system. Before this patch call_initPhase2 was only initializing the module system. But now it is doing work which is not part of the module system initialization. So probably in future we may want to refactor this work out of call_initPhase2. Hi @ashu-mehra , I moved the code outside of `call_initPhase2()`, and consolidated it into `AOTLinkedClassBulkLoader::load_non_javabase_classes()`. 
While doing that, I noticed that the enums `BOOT` and `BOOT2` are misleading -- it seems the former would be a superset of the latter, but in fact they are disjoint. I think this is one reason that @dholmes-ora wanted different names. So I renamed them to `BOOT1` vs `BOOT2` (boot classes loaded in 1st vs 2nd phase). I also renamed the enum to AOTLinkedClassCategory and refactored the related code. There's a lot of renaming but the logic is unchanged.

Ashu and David, please re-review https://github.com/openjdk/jdk/pull/20843/commits/4e9668a0b85fd5aa5839528e30c2955b424ac8ca

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764211572

From ccheung at openjdk.org Tue Sep 17 23:50:19 2024
From: ccheung at openjdk.org (Calvin Cheung)
Date: Tue, 17 Sep 2024 23:50:19 GMT
Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time
Message-ID:

Prior to this patch, if `--module-path` is specified in the command line:
during CDS dump time, full module graph will not be included in the CDS archive;
during run time, full module graph will not be used.

With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used.

The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored.
E.g. the following is considered a match:

    dump time    runtime
    m1,m2        m2,m1
    m1,m2        m1,m2,m2

I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes.
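
The lenient matching rule described above (order does not matter, duplicates are ignored) amounts to comparing the two module lists as sets. Below is a minimal sketch of that idea only, assuming the module names arrive as comma-separated strings as in the example; `split_modules` and `module_paths_match` are hypothetical helper names for illustration and are not functions from the patch or from HotSpot.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: split "m1,m2,m2" into the set {"m1", "m2"}.
// std::set drops duplicate names and keeps no record of the input order.
static std::set<std::string> split_modules(const std::string& list) {
  std::set<std::string> result;
  std::istringstream in(list);
  std::string name;
  while (std::getline(in, name, ',')) {
    if (!name.empty()) {
      result.insert(name);
    }
  }
  return result;
}

// Hypothetical helper: true if both lists name the same set of modules.
static bool module_paths_match(const std::string& dump_time,
                               const std::string& run_time) {
  return split_modules(dump_time) == split_modules(run_time);
}
```

With these helpers, `module_paths_match("m1,m2", "m2,m1")` and `module_paths_match("m1,m2", "m1,m2,m2")` both hold, which corresponds to the two rows in the example. The actual check in the patch may well be implemented differently, for instance on resolved paths rather than names.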
------------- Commit messages: - 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time Changes: https://git.openjdk.org/jdk/pull/21048/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8328313 Stats: 460 lines in 17 files changed: 420 ins; 2 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From iklam at openjdk.org Wed Sep 18 01:03:42 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 01:03:42 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v4] In-Reply-To: References: Message-ID: <61t4dE-JoDWfSklauGWliiwfSiYeBpxiTcSkFI43Npc=.d184114e-d9cb-4887-b0bc-6e08cbf67112@github.com> > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. 
We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled.
> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp.
> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336)
> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`.
> - During the early stage of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that:
>   - all aot-initialized classes are moved into the `initialized` state (without executing their `<clinit>` methods). This is done in `InstanceKlass::initialize_from_cds()`
>   - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()`
>
> **Caveats:**
>
> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment:
>
>
> enum Foo {
>     [....]
>     static fin...

Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Improved in-line comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20958/files - new: https://git.openjdk.org/jdk/pull/20958/files/e15e76cd..21b8f6f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=02-03 Stats: 14 lines in 2 files changed: 11 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Wed Sep 18 01:03:43 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 01:03:43 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> Message-ID: On Tue, 17 Sep 2024 20:33:23 GMT, Ashutosh Mehra wrote: > Does _runtime_default_subgraph_info only hold information about archived mirrors? The default graph records classes of objects created in this step: void HeapShared::copy_objects() { assert(HeapShared::can_write(), "must be"); copy_interned_strings(); // << here copy_special_objects(); // << and here So apart from mirrors, it also records the class of String and 6 exception instances archived by `HeapShared::archive_exception_instance` > If we are init-ing classes required for archived mirrors here, why do we need special aot-initialization for PrimitiveClassDescImpl, ReferenceClassDescImpl and ConstantDescs in AOTClassInitializer::can_archive_initialized_mirror? 
I've added the following comments into `AOTClassInitializer::can_archive_initialized_mirror()`

    } else if (ik->is_initialized() &&
               (ik->name()->equals("jdk/internal/constant/PrimitiveClassDescImpl") ||
                ik->name()->equals("jdk/internal/constant/ReferenceClassDescImpl") ||
                ik->name()->equals("java/lang/constant/ConstantDescs"))) {
      // The above 3 classes are special cases needed to support the aot-caching of
      // java.lang.invoke.MethodType instances:
      // - MethodType points to sun.invoke.util.Wrapper enums
      // - The Wrapper enums point to static final fields in the above 3 classes.
      //   E.g., ConstantDescs.CD_Boolean.
      // - If we re-run the <clinit> of these 3 classes again during the production
      //   run, ConstantDescs.CD_Boolean will get a new value that has a different
      //   object identity than the value referenced by the Wrapper enums.
      // - However, Wrapper requires object identity (it allows the use of == to
      //   test the equality of ClassDesc, etc).
      // Therefore, we must preserve the static fields of these 3 classes from
      // the assembly phase.
      return true;

> The comment seems to indicate this function should initialize the classes that are referenced by archived mirrors of aot-initialized classes. But there is no check for aot-initialized class in the body.

I amended the comments above `HeapShared::init_classes_reachable_from_archived_mirrors()` to say:

    //    enum Fruit {
    //       APPLE, ORANGE, BANANA;
    //       static final Set<Fruit> HAVE_SEEDS = new HashSet<>(Arrays.asList(APPLE, ORANGE));
    //    }
    //
    // the pre-inited mirror of Fruit references HashSet, which should be initialized
    // before any Java code can access the Fruit class. Note that HashSet itself doesn't
    // necessarily need to be an aot-initialized class.

That's why there's no check for `k` to be aot-initialized. This is necessary because the `<clinit>` of `k` could have environment dependencies so it cannot be aot-initialized.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1764254366 From iklam at openjdk.org Wed Sep 18 01:09:09 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 01:09:09 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 20:55:19 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - @vnkozlov comments >> - Clean up; removed unrelated changes in classPrinter.cpp >> - more cleanup >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - More clean up for JDK-8293187 >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - Simplified implemented by AOTClassInitializer. >> - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd > > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 242: > >> 240: // - classes that were AOT-initialized by AOTClassInitializer >> 241: // - the classes of all objects that are reachable from the archived mirrors of >> 242: // the AOT-linked classes for . > > It seems this function covers the first two categories, but not the AOT-linked classes. Is that correct? 
Correct. Not all aot-linked classes are initialized by this function. Most of them are initialized on first use.

> src/hotspot/share/oops/instanceKlass.cpp line 846:
>
>> 844: // If we have a preinit mirror, we may come to here if a supertype is not
>> 845: // yet initialized. It will still be quicker than usual, as we will skip the
>> 846: // execution of <clinit> of this class.
>
> How do we ensure clinit is skipped in this case?

There's a test like this inside `InstanceKlass::call_class_initializer()`

    if (has_aot_initialized_mirror() && CDSConfig::is_loading_heap()) {
      return;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1764256468
PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1764257341

From iklam at openjdk.org Wed Sep 18 01:16:49 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Wed, 18 Sep 2024 01:16:49 GMT
Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v5]
In-Reply-To:
References:
Message-ID:

> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
>
> **Problem:**
>
> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details.
>
> **Solution:**
>
> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache.
>
> In the production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized.
>
> **Review Notes:**
>
> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled.
> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp.
> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336)
> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`.
> - During the early stage of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that:
>   - all aot-initialized classes are moved into the `initialized` state (without executing their `<clinit>` methods). This is done in `InstanceKlass::initialize_from_cds()`
>   - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()`
>
> **Caveats:**
>
> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment:
>
>
> enum Foo {
>     [....]
>     static fin...

Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @ashu-mehra comment: assert that ConstantDescs, etc, must be initialized ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20958/files - new: https://git.openjdk.org/jdk/pull/20958/files/21b8f6f9..aa9629df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Wed Sep 18 01:16:50 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 01:16:50 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> Message-ID: On Tue, 17 Sep 2024 20:33:28 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits:
>>
>> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap
>> - @vnkozlov comments
>> - Clean up; removed unrelated changes in classPrinter.cpp
>> - more cleanup
>> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap
>> - More clean up for JDK-8293187
>> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap
>> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap
>> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap
>> - Simplified implemented by AOTClassInitializer.
>> - ... and 1 more: https://git.openjdk.org/jdk/compare/bcddf963...e15e76cd
>
> src/hotspot/share/cds/aotClassInitializer.cpp line 40:
>
>> 38: return true;
>> 39: } else if (ik->is_initialized() &&
>> 40: (ik->name()->equals("jdk/internal/constant/PrimitiveClassDescImpl") ||
>
> Is it possible for one of these classes to be not initialized at this stage? IIUC `ik` must be initialized if it is one of these classes. If so, can `ik->is_initialized()` be turned into an assert?

I added the assert as you suggested.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1764260436 From dholmes at openjdk.org Wed Sep 18 01:37:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Sep 2024 01:37:04 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time In-Reply-To: References: Message-ID: <8QVb7SLhVWE1Q0U7_2oOpcltcNbKEYw4PJ1zi01U-tc=.207e30c8-4c9b-40da-a3a5-9389c4fcc44b@github.com> On Tue, 17 Sep 2024 23:44:40 GMT, Calvin Cheung wrote: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. I've taken an initial pass through and this seems reasonable (a bit more complicated than the description suggested :) ). Thanks src/hotspot/share/cds/filemap.cpp line 931: > 929: bool FileMapInfo::is_jar_suffix(const char* filename) { > 930: const char* dot = strrchr(filename, '.'); > 931: if (strcmp(dot + 1, "jar") == 0) { What if there is no dot? We need a null check. 
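
The missing-dot concern raised above comes from `strrchr` returning a null pointer when the character does not occur, so `dot + 1` must not be evaluated before checking. Below is a minimal standalone sketch of one way to guard it; this is a free function written for illustration, not the actual `FileMapInfo` member, and the name `has_jar_suffix` is only an assumption here.

```cpp
#include <cstring>

// Standalone sketch (not the FileMapInfo member from the patch): returns true
// if the text after the last '.' in filename is exactly "jar".
static bool has_jar_suffix(const char* filename) {
  const char* dot = std::strrchr(filename, '.');
  if (dot == nullptr) {
    // No '.' anywhere in the name, e.g. an exploded module directory.
    return false;
  }
  return std::strcmp(dot + 1, "jar") == 0;
}
```

Under this sketch a name such as `foo` is rejected without dereferencing a null pointer, while `foo.jar` is accepted. Whether the real fix should also treat case variants such as `.JAR` as a match is a separate question that the review does not address.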
src/hotspot/share/cds/filemap.hpp line 558: > 556: unsigned int dumptime_prefix_len, > 557: unsigned int runtime_prefix_len) NOT_CDS_RETURN_(false); > 558: bool is_jar_suffix(const char* filename); Suggestion: has_jar_suffix src/hotspot/share/cds/heapShared.cpp line 884: > 882: ClassLoaderExt::num_module_paths() > 0) { > 883: log_info(cds, heap)(" is_using_optimized_module_handling %d num_module_paths %d jdk.module.main %s", > 884: CDSConfig::is_using_optimized_module_handling(), ClassLoaderExt::num_module_paths(), Arguments::get_property("jdk.module.main")); Why are you printing a bool value as an int? I'm surprised one of the format checkers doesn't complain about it. ------------- PR Review: https://git.openjdk.org/jdk/pull/21048#pullrequestreview-2311418590 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1764261758 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1764267214 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1764268462 From iklam at openjdk.org Wed Sep 18 01:53:51 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 01:53:51 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v3] In-Reply-To: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: > This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` > > These `Method` objects are created within the JVM. They do not belong to any actual Java classes. 
We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. > > --- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - @vnkozlov comment - added NOT_CDS_RETURN - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - some clean up - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - 8293337: Archive method handle intrinsics ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20959/files - new: https://git.openjdk.org/jdk/pull/20959/files/a57e9f00..16b51d55 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=01-02 Stats: 224 lines in 10 
files changed: 97 ins; 74 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/20959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20959/head:pull/20959 PR: https://git.openjdk.org/jdk/pull/20959 From iklam at openjdk.org Wed Sep 18 01:53:56 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 01:53:56 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v6] In-Reply-To: References: Message-ID: <-4LeCYQ1YsDwY9O5gJyBjh7M3e7Cy2bgdaABkTZlIHU=.a554c02b-53ac-462c-bd46-d183bad75fe6@github.com> > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. 
> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336)
> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`.
> - During the early stage of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that:
>   - all aot-initialized classes are moved into the `initialized` state (without executing their `<clinit>` methods). This is done in `InstanceKlass::initialize_from_cds()`
>   - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()`
>
> **Caveats:**
>
> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment:
>
>
> enum Foo {
>     [....]
>     static fin...

Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 14 commits: - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - @ashu-mehra comment: assert that ConstantDescs, etc, must be initialized - Improved in-line comments - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - @vnkozlov comments - Clean up; removed unrelated changes in classPrinter.cpp - more cleanup - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - More clean up for JDK-8293187 - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - ... and 4 more: https://git.openjdk.org/jdk/compare/bedf9a26...36baf574 ------------- Changes: https://git.openjdk.org/jdk/pull/20958/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=05 Stats: 826 lines in 20 files changed: 753 ins; 16 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Wed Sep 18 02:31:43 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 02:31:43 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v7] In-Reply-To: References: Message-ID: > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). 
> > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `` method). 
This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20958/files - new: https://git.openjdk.org/jdk/pull/20958/files/36baf574..cfe2cc8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=05-06 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Wed Sep 18 02:57:47 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 02:57:47 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. 
> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()`. > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in an `AOTLinkedClassTable` are loaded before any user code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`.
> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed ZERO build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/bedf9a26..be1d0ef1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=07-08 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Wed Sep 18 02:59:37 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 02:59:37 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v8] In-Reply-To: References: Message-ID: <1XujOZAE9Zl3KSlZAtUSPssVetp_bXJ58iWhgY0PYZE=.65bf692f-ad61-4252-b23f-2acca72ce1cf@github.com> > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a.
the *mirror* of this class) into the AOT cache. > > In the production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early stage of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method). This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits: - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - fixed merge - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - @ashu-mehra comment: assert that ConstantDescs, etc, must be initialized - Improved in-line comments - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - @vnkozlov comments - Clean up; removed unrelated changes in classPrinter.cpp - more cleanup - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - ... and 6 more: https://git.openjdk.org/jdk/compare/be1d0ef1...0970a0e2 ------------- Changes: https://git.openjdk.org/jdk/pull/20958/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=07 Stats: 830 lines in 20 files changed: 757 ins; 16 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Wed Sep 18 03:02:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 18 Sep 2024 03:02:19 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v4] In-Reply-To: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: > This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
> > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` > > These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. > > --- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - @vnkozlov comment - added NOT_CDS_RETURN - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - some clean up - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 
'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - ... and 1 more: https://git.openjdk.org/jdk/compare/cf26d3a4...988f101c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20959/files - new: https://git.openjdk.org/jdk/pull/20959/files/16b51d55..988f101c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=02-03 Stats: 10 lines in 1 file changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20959/head:pull/20959 PR: https://git.openjdk.org/jdk/pull/20959 From dholmes at openjdk.org Wed Sep 18 04:02:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Sep 2024 04:02:10 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v8] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 14:14:31 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). >> >> This PR depends on #20597 in that it removes the need for one #ifdef in that PR. 
Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: > > Define MAX_PATH if required test/hotspot/gtest/runtime/test_os.cpp line 386: > 384: const char* returnedBuffer = os::realpath(path, buffer, 10); > 385: EXPECT_TRUE(errno == ENAMETOOLONG); > 386: EXPECT_TRUE(returnedBuffer == nullptr); Indentation is off through this code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1764358363 From dholmes at openjdk.org Wed Sep 18 05:40:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Sep 2024 05:40:11 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Wed, 18 Sep 2024 02:57:47 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. 
>> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplfication, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed ZERO build I have taken another pass through. A few queries and small items. 
Thanks src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 66: > 64: > 65: void AOTLinkedClassBulkLoader::load_classes_in_loader(JavaThread* current, AOTLinkedClassCategory class_category, oop class_loader_oop) { > 66: ExceptionMark em(current); Why do you need the EM when you are explicitly checking for exceptions? src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 67: > 65: void AOTLinkedClassBulkLoader::load_classes_in_loader(JavaThread* current, AOTLinkedClassCategory class_category, oop class_loader_oop) { > 66: ExceptionMark em(current); > 67: ResourceMark rm(current); The RM should go where it is actually needed for the logging. src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 68: > 66: ExceptionMark em(current); > 67: ResourceMark rm(current); > 68: HandleMark hm(current); Why do you need a HM here? src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 95: > 93: > 94: if (Universe::is_fully_initialized() && VerifyDuringStartup) { > 95: // Make sure we're still in a clean slate. Suggestion: // Make sure we're still in a clean state. src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 132: > 130: break; > 131: case AOTLinkedClassCategory::UNREGISTERED: > 132: ShouldNotReachHere(); // Currently aot-linked classes are not supported for this category. Suggestion: case AOTLinkedClassCategory::UNREGISTERED: default: ShouldNotReachHere(); // Currently aot-linked classes are not supported for this category. src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 170: > 168: log_error(cds)("Unable to resolve %s class from CDS archive: %s", category_name, ik->external_name()); > 169: log_error(cds)("Expected: " INTPTR_FORMAT ", actual: " INTPTR_FORMAT, p2i(ik), p2i(actual)); > 170: log_error(cds)("JVMTI class retransformation is not supported when archive was generated with -XX:+AOTClassLinking."); Nit: use a `logStream` instead of the three separate calls. 
src/hotspot/share/cds/aotLinkedClassTable.hpp line 34: > 32: class SerializeClosure; > 33: > 34: // Classes to be buik-loaded, in the "linked" state, at VM bootstrap. Suggestion: // Classes to be bulk-loaded, in the "linked" state, at VM bootstrap. src/hotspot/share/cds/archiveBuilder.cpp line 316: > 314: > 315: if (CDSConfig::is_dumping_aot_linked_classes()) { > 316: _estimated_hashtable_bytes += _klasses->length() * 16 * sizeof(Klass*); Why 16? src/hotspot/share/cds/archiveBuilder.cpp line 877: > 875: if (ik->is_hidden()) { > 876: ADD_COUNT(num_hidden_klasses); > 877: hidden = " hidden"; Why not do this at the same time you do the other hidden class updates above? src/hotspot/share/cds/cds_globals.hpp line 99: > 97: \ > 98: /*========== New "AOT" flags =========================================*/ \ > 99: /* The following 3 flags are aliases of -Xshare:dump, */ \ Nit: align the `*/`. src/hotspot/share/classfile/systemDictionary.cpp line 139: > 137: if (_java_platform_loader.is_empty()) { > 138: oop platform_loader = get_platform_class_loader_impl(CHECK); > 139: _java_platform_loader = OopHandle(Universe::vm_global(), platform_loader); Why has the order been switched here? test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTClassLinkingVMOptions.java line 57: > 55: testCase("Archived full module graph must be enabled at runtime"); > 56: TestCommon.run("-cp", appJar, "-Djdk.module.validation=1", "Hello") > 57: .assertAbnormalExit("CDS archive has aot-linked classes." 
+ Nit: align the dots ------------- PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2311449272 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764390816 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764391068 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764391282 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764392237 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764393538 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764394331 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764395970 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764396906 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764402272 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764405309 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764412019 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764414682 From dholmes at openjdk.org Wed Sep 18 05:40:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Sep 2024 05:40:15 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: On Tue, 17 Sep 2024 23:29:46 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. 
See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. 
>> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > minor comment fix src/hotspot/share/cds/aotClassLinker.cpp line 121: > 119: assert(is_initialized(), "sanity"); > 120: > 121: if (!CDSConfig::is_dumping_aot_linked_classes() || !SystemDictionaryShared::is_builtin(ik)) { Shouldn't the CDSConfig check just be an assert - the caller is expected to check before trying to add candidates? src/hotspot/share/cds/aotClassLinker.cpp line 145: > 143: return false; > 144: } > 145: } Are we concerned with the possibility that we might be able to add some interfaces but not all, hence returning false, but with a subset of interfaces already added to the candidate list? I don't think it should be possible, but the code structure makes it look like it could be possible. src/hotspot/share/cds/aotClassLinker.cpp line 191: > 189: if (ik->class_loader() != class_loader) { > 190: continue; > 191: } This seems very inefficient. We call `write_classes` 4 times, potentially with different loaders. Because the candidates are sorted, the classes belonging to the same loader are likely to be grouped due to package names. So the app loader classes are likely to be right at the end, and we have to traverse all the boot/platform classes first before we get to them. Conversely, after we have encountered the last boot loader class (for example) we keep scanning the entire list. If the set were ordered based on loader then name, we would be able to stop once we see the loader change to not being the desired one. And a binary search would let you find the start of a section more quickly.
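One way to read the ordering suggestion above is sketched below in standalone C++. This is only an illustration under stated assumptions: `Loader`, `Candidate`, `by_loader_then_name` and `section_for` are hypothetical names invented for the sketch and are not HotSpot types or functions. The idea is that once candidates are sorted by (loader, name), the classes of one loader form a contiguous section that a binary search can locate without scanning the whole list.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for the loader categories; not a HotSpot type.
enum class Loader { Boot = 0, Platform = 1, App = 2 };

// Hypothetical stand-in for a candidate class entry.
struct Candidate {
  Loader loader;
  std::string name;
};

// Strict weak ordering: loader first, then class name.
inline bool by_loader_then_name(const Candidate& a, const Candidate& b) {
  return std::tie(a.loader, a.name) < std::tie(b.loader, b.name);
}

// Heterogeneous comparator that looks at the loader only, so that
// std::equal_range can search a (loader, name)-sorted list by loader.
struct LoaderOnly {
  bool operator()(const Candidate& c, Loader l) const { return c.loader < l; }
  bool operator()(Loader l, const Candidate& c) const { return l < c.loader; }
};

using Iter = std::vector<Candidate>::const_iterator;

// Returns the [first, last) section holding every candidate of `loader`.
// Precondition: `sorted` is ordered by by_loader_then_name.
// Cost is O(log n) comparisons instead of a full traversal per loader.
inline std::pair<Iter, Iter> section_for(const std::vector<Candidate>& sorted,
                                         Loader loader) {
  return std::equal_range(sorted.begin(), sorted.end(), loader, LoaderOnly{});
}
```

With this layout, a per-loader pass would iterate only over the returned section and stop at its end, rather than filtering every element with a `continue`.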
src/hotspot/share/cds/aotClassLinker.cpp line 194: > 192: if ((ik->module() == ModuleEntryTable::javabase_moduleEntry()) != is_javabase) { > 193: continue; > 194: } Why do we process system loader classes (i.e. application loader classes) if they need to be in java.base, as the application classes will never be in java.base. ??? src/hotspot/share/cds/aotClassLinker.cpp line 198: > 196: if (ik->is_shared() && CDSConfig::is_dumping_dynamic_archive()) { > 197: if (CDSConfig::is_using_aot_linked_classes()) { > 198: // This class was recorded as a AOT-linked for the base archive, Suggestion: // This class was recorded as AOT-linked for the base archive, src/hotspot/share/cds/aotClassLinker.cpp line 212: > 210: } else { > 211: const char* category = class_category_name(list.at(0)); > 212: log_info(cds, aot, link)("written %d class(es) for category %s", list.length(), category); Suggestion: log_info(cds, aot, link)("wrote %d class(es) for category %s", list.length(), category); src/hotspot/share/cds/aotClassLinker.hpp line 60: > 58: // - The visibility of C > 59: // > 60: // During an Production Run, the JVM can use an AOTCache with an AOTLinkedClassTable Suggestion: // During a Production Run, the JVM can use an AOTCache with an AOTLinkedClassTable src/hotspot/share/cds/aotClassLinker.hpp line 100: > 98: static bool is_vm_class(InstanceKlass* ik); > 99: > 100: // When CDS is enabled, is ik guatanteed to be linked at deployment time (and Suggestion: // When CDS is enabled, is ik guaranteed to be linked at deployment time (and ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764280936 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764279963 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764292759 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764293783 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764293980 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20843#discussion_r1764294382 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764295867 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764298973 From dholmes at openjdk.org Wed Sep 18 05:40:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 18 Sep 2024 05:40:15 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: On Wed, 18 Sep 2024 01:59:34 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> minor comment fix > > src/hotspot/share/cds/aotClassLinker.hpp line 60: > >> 58: // - The visibility of C >> 59: // >> 60: // During an Production Run, the JVM can use an AOTCache with an AOTLinkedClassTable > > Suggestion: > > // During a Production Run, the JVM can use an AOTCache with an AOTLinkedClassTable Why is "Production Run" capitalized? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1764296043 From fyang at openjdk.org Wed Sep 18 06:26:06 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Sep 2024 06:26:06 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) does not need global icache flushes. >> Instead, we can emit fence.i locally (per thread and hart) in the slow path. >> But if we were to be context-switched to a hart which has not had fence.i emitted, we can still fetch stale instructions.
>> To handle this case we now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread does not execute on the code heap (non-attached native thread), and even if there were no changes to the code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent GC with little headroom. >> In such a scenario I measured 15% increased throughput on VF2. >> A larger CPU or less headroom (faster GC cycles) will yield an even bigger performance boost. >> >> Note that this requires a 6.10 kernel. >> >> I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3. (With the setting on) >> >> Later we probably want this on by default, but as it's hard to test I'll leave the default off. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, moved init after feature enabling Thanks for sharing the performance numbers. It's a pity that I can't try this on my big RV machine for now which runs an older customized 6.1 kernel. BTW: Should we add back `verify_cross_modify_fence_not_required()` which was called in `MacroAssembler::read_polling_page()` & `MacroAssembler::build_frame()` and removed by https://github.com/openjdk/jdk/pull/9770? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3059: > 3057: void MacroAssembler::cmodx_fence() { > 3058: BLOCK_COMMENT("cmodx fence"); > 3059: if (VM_Version::supports_fencei_barrier()) { Seems more reasonable to turn this into an assertion?
`assert(VM_Version::supports_fencei_barrier(), "must be");` ------------- PR Review: https://git.openjdk.org/jdk/pull/20913#pullrequestreview-2311569204 PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1764363758 From rcastanedalo at openjdk.org Wed Sep 18 07:18:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Sep 2024 07:18:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v22] In-Reply-To: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> References: <9qt_iRfNSfdLuraZ18LqQx_xVt7xNPF9SBRZXwWkIig=.f86597de-6440-4235-9152-55b3a08c0d89@github.com> Message-ID: On Tue, 17 Sep 2024 16:09:30 GMT, Vladimir Kozlov wrote: > Looks good to me. Thanks for reviewing, Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357686525 From jbhateja at openjdk.org Wed Sep 18 07:21:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 07:21:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table, whose elements are selected based on the index value vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2).
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Incorporating review and documentation suggestions. 
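The two-vector selection semantics described in the message above can be modeled in scalar code. The sketch below is a hedged illustration only, written in C++ rather than against the Java Vector API: `select_from_two` is a hypothetical helper, `std::vector` stands in for a fixed-length vector of VLEN lanes, and `std::out_of_range` stands in for the index exception the description mentions. It treats `v1` followed by `v2` as one table of 2*VLEN entries and picks one entry per lane.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Scalar model of a two-vector "select from" (hypothetical helper, not the
// JDK implementation). Lane i of the result is entry index[i] of the table
// formed by v1 (entries 0 .. VLEN-1) followed by v2 (entries VLEN .. 2*VLEN-1).
template <typename T>
std::vector<T> select_from_two(const std::vector<int>& index,
                               const std::vector<T>& v1,
                               const std::vector<T>& v2) {
  const std::size_t vlen = index.size();
  if (v1.size() != vlen || v2.size() != vlen) {
    throw std::invalid_argument("all three vectors must have the same length");
  }
  std::vector<T> result(vlen);
  for (std::size_t i = 0; i < vlen; ++i) {
    const int idx = index[i];
    // Valid two-vector index range is [0, 2*VLEN), as stated in the description.
    if (idx < 0 || static_cast<std::size_t>(idx) >= 2 * vlen) {
      throw std::out_of_range("index outside [0, 2*VLEN)");
    }
    const std::size_t u = static_cast<std::size_t>(idx);
    result[i] = (u < vlen) ? v1[u] : v2[u - vlen];
  }
  return result;
}
```

For example, with VLEN = 4, indices {0, 5, 2, 7}, v1 = {10, 11, 12, 13} and v2 = {20, 21, 22, 23}, the model selects {10, 21, 12, 23}.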
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/29530047..31a58642 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=10-11 Stats: 96 lines in 8 files changed: 25 ins; 0 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Wed Sep 18 07:21:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 07:21:52 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v7] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7bghGF2-qbhP1hJA2ljtdA3xSUSqiV0RLaOYm4AcZSQ=.eb3e36b2-5461-4755-ae71-2de89660649f@github.com> Message-ID: On Fri, 13 Sep 2024 14:49:01 GMT, Emanuel Peter wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 544: >> >>> 542: byte[] vpayload1 = ((ByteVector)v1).vec(); >>> 543: byte[] vpayload2 = ((ByteVector)v2).vec(); >>> 544: byte[] vpayload3 = ((ByteVector)v3).vec(); >> >> Is there a reason you are not using more descriptive names here instead of `vpayload1`? >> I also wonder if the `selectFromHelper` should not be named more specifically: `selectFromTwoVector(s)Helper`? > > You only gave me a thumbs up and no change - but comment resolved. Is that intentional? Makes me feel like you are ignoring my comments, and that discourages me from reviewing in the future. The routine was renamed as per your suggestion, and the first vector argument was also appropriately renamed to wrappedIndex.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1764527888 From rcastanedalo at openjdk.org Wed Sep 18 07:49:52 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Sep 2024 07:49:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. 
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Restore some asserts - Default values for tmp regs of G1PostBarrierStubC2 - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 - 8330685: [arm32] share barrier spilling logic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/71a51bfc..13b93bd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=21-22 Stats: 614 lines in 12 files changed: 521 ins; 36 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 18 08:00:30 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 18 Sep 2024 08:00:30 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:49:52 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24.
The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. 
>> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: > > - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms > - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - Restore some asserts > - Default values for tmp regs of G1PostBarrierStubC2 > - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 > - 8330685: [arm32] share barrier spilling logic Thanks for the arm 32-bit port @snazarkin! Merged in commit 3957c03f. Besides the arm 32-bit port, @snazarkin's changeset adds the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`. This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2357765066 From stooke at openjdk.org Wed Sep 18 08:06:40 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 18 Sep 2024 08:06:40 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v9] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: fix realpath() test for POSIX ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/7757e90e..ee870f7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=07-08 Stats: 19 lines in 1 file changed: 9 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From rehn at openjdk.org Wed Sep 18 08:34:06 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 18 Sep 2024 08:34:06 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 04:10:43 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> 
>> Comment, moved init after feature enabling > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3059: > >> 3057: void MacroAssembler::cmodx_fence() { >> 3058: BLOCK_COMMENT("cmodx fence"); >> 3059: if (VM_Version::supports_fencei_barrier()) { > > Seems more reasonable to turn this into an assertion? `assert(VM_Version::supports_fencei_barrier(), "must be");` The cmodx_fence() is called after a safepoint or thread-local handshake. Even if it costs tens? of cycles, that is very little compared to hitting the poll, potentially coming from a signal handler, hence I kept it always 'on'. But we are only guaranteed to have fence.i when running on Linux, and this code is not specific to Linux, i.e. MASM should be OS independent. As we already check this during boot in Linux-specific code, there is no way that assert can fail on Linux. But maybe our JDK implementation should require fence.i? If we do, then doing the assert makes more sense, yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1764626510 From rehn at openjdk.org Wed Sep 18 08:58:10 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 18 Sep 2024 08:58:10 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) does not need global icache flushes. >> Instead, we can emit fence.i locally (thread and hart) in the slow path. >> But if we were to be context switched to a hart which has not had fence.i emitted, we could still fetch stale instructions. >> To handle this case we now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect, as with this patch we will be emitting fence.i on any context switch for any thread, even if that thread does not execute on the code heap (non-attached native thread), and even if there were no changes to the code heap.
>> But this is in many cases much faster, as the global icache flush IPI is very intrusive. >> Particular cases are running a concurrent GC with small headroom. >> In such a scenario I measured 15% increased throughput on VF2. >> A large CPU or less headroom (faster GC cycles) will yield an even bigger performance boost. >> >> Note that this requires a 6.10 kernel. >> >> I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3 (with the setting on). >> >> Later we probably want this on by default, but as it's hard to test I'll leave it off by default. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, moved init after feature enabling > Thanks for sharing the performance numbers. It's a pity that I can't try this on my big RV machine for now which runs an older customized 6.1 kernel. BTW: Should we add similar checking like `verify_cross_modify_fence_not_required()` which was called in `MacroAssembler::read_polling_page()` & `MacroAssembler::build_frame()` and removed by #9770? (-XX:+VerifyCrossModifyFence) We have two cases. In the first, the code stream changed during a safepoint and you must emit the fence before leaving the safepoint, which means e.g. a thread in native that returns to Java must emit it. Polls are only disarmed by the threads themselves, and they always do a cmodx_fence after a disarm (when they update the poll word). This case is already covered by VerifyCrossModifyFence. The second case, writing in the code stream in methods with the nmethod barrier locked, would essentially be a verification of the barrier and the _patching_epoch. If this needs verification, we need it regardless of this patch, since we do need both of them working; otherwise we may enter an nmethod with a bad oop, and we do need the loadload fence before entering, otherwise we may load stale data. So I don't believe this patch requires any additional verification, but we may need verification.
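For readers unfamiliar with the kernel support referenced in the quoted description (https://docs.kernel.org/arch/riscv/cmodx.html), the interface is a `prctl()` call. The following is a minimal sketch, not the code from this PR: it assumes Linux 6.10+ headers that define `PR_RISCV_SET_ICACHE_FLUSH_CTX`, the helper name `try_enable_fencei_on_migration` is hypothetical, and on any other platform it simply reports that the feature is unavailable.

```cpp
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Hypothetical helper: asks the kernel to let this process use fence.i
// in user space for cross-modifying code, as described in the kernel's
// cmodx documentation. Returns true only if the request was accepted.
bool try_enable_fencei_on_migration() {
#if defined(__linux__) && defined(__riscv) && defined(PR_RISCV_SET_ICACHE_FLUSH_CTX)
  const int rc = prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX,
                       PR_RISCV_CTX_SW_FENCEI_ON,
                       PR_RISCV_SCOPE_PER_PROCESS);
  return rc == 0;  // fails on kernels without this prctl
#else
  return false;    // not RISC-V Linux, or headers too old
#endif
}
```

A caller would be expected to fall back to global icache flushes when this returns false, which matches the PR's choice to leave the setting off by default.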
------------- PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2357881642 From amitkumar at openjdk.org Wed Sep 18 09:14:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 18 Sep 2024 09:14:14 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v3] In-Reply-To: References: Message-ID: <_c6-ABnkV-FFryyBJ00CZIR-LRLNwUZz6jkWBQJZUJo=.b4f15e79-2f3d-4df2-bd56-6df1ba47c168@github.com> On Tue, 17 Sep 2024 11:01:40 GMT, Amit Kumar wrote: >> This PR provides "resolve_global_jobject" method implementation for s390x-port. >> >> Testing: >> * Tier1 test with Fastdebug; >> * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; >> * 1. Ran tier1 test with a call to "resolve_jobject" >> * 2. Ran tier1 test with a call to "resolve_global_jobject" >> >> I didn't see any new failure appearing there. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > implements ModRefBarrierSetAssembler::resolve_jobject I did another round of testing similar to the one mentioned above in the description. The result came out clean again. So I guess we are good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20986#issuecomment-2357920871 From jsjolen at openjdk.org Wed Sep 18 09:48:37 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 18 Sep 2024 09:48:37 GMT Subject: RFR: 8340178: Make ArrayWithFreeList have Index type and move to utilities [v2] In-Reply-To: <60GYHj6KckbaHKY1mDgIyiEjzkqdAKpRyNchQXi37xE=.2b6b0cbb-4066-4c56-9ff6-af58ffd55b38@github.com> References: <60GYHj6KckbaHKY1mDgIyiEjzkqdAKpRyNchQXi37xE=.2b6b0cbb-4066-4c56-9ff6-af58ffd55b38@github.com> Message-ID: > Hi, > > This PR does multiple things: > > 1. Gives `AWFL` an index template `I` which specifies the type of the indices, this lets us have very small indices and that saves memory. > 2.
Gives `AWFL` the ability to store things in a static memory area of a specific length > 3. Finally, moves it to utilities for general consumption > > For some context: > > I tried to give `GrowableArray` the index type feature, but I hit a brick wall at changing the assert messages. It's also not a feature which has consensus, some people like it, and some people think it's too complex. I find putting a smaller and hidden `resizable_array` class in AWFL to be an acceptable compromise. I also believe that `GA` will not find too much competition with `AWFL`, as it has a less rich API and is really meant as an allocator interface rather than a general array type. > > **Hint for reviewers:** Do NOT go into "Files changed", look at the commits to see the actual changes and ignore the commits with "Move" in the title. Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Fixes after merge - Merge remote-tracking branch 'openjdk/master' into move-to-utils - Use int - No need for reinterpret cast - Style - Change test - Change AWFL - Move AWFL - Move test - Changes to NCSS ------------- Changes: https://git.openjdk.org/jdk/pull/20002/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20002&range=01 Stats: 567 lines in 5 files changed: 307 ins; 259 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20002/head:pull/20002 PR: https://git.openjdk.org/jdk/pull/20002 From aph at openjdk.org Wed Sep 18 09:54:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 09:54:11 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v8] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 17 Sep 2024 16:24:29 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2
VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 
0.001 1.037 ? 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: adjust a comment in the light of the latest change OK, I think we're now good enough, performance wise, with and without the vectorized intrinsic: Benchmark (size) Mode Cnt Score Error Score Error Units ArraysHashCode.bytes 1 avgt 5 0.591 ? 0.043 0.584 ? 0.006 ns/op ArraysHashCode.bytes 2 avgt 5 1.343 ? 0.003 0.838 ? 0.016 ns/op ArraysHashCode.bytes 4 avgt 5 2.262 ? 0.028 1.096 ? 0.032 ns/op ArraysHashCode.bytes 8 avgt 5 2.432 ? 0.038 2.215 ? 0.049 ns/op ArraysHashCode.bytes 12 avgt 5 3.605 ? 0.042 2.292 ? 0.068 ns/op ArraysHashCode.bytes 16 avgt 5 5.149 ? 0.220 2.245 ? 0.132 ns/op ArraysHashCode.bytes 20 avgt 5 6.819 ? 0.266 2.575 ? 0.046 ns/op ArraysHashCode.bytes 24 avgt 5 8.478 ? 0.430 2.965 ? 0.085 ns/op ArraysHashCode.bytes 28 avgt 5 10.308 ? 0.386 3.047 ? 0.377 ns/op ArraysHashCode.bytes 32 avgt 5 12.425 ? 0.453 4.045 ? 0.123 ns/op ArraysHashCode.bytes 48 avgt 35 21.086 ? 0.061 4.756 ? 0.053 ns/op ArraysHashCode.bytes 64 avgt 35 32.817 ? 0.078 5.934 ? 0.039 ns/op > This is what I'm seeing now. Scorching fast with large blocks, poor with smaller ones. > > ``` > Benchmark (size) Mode Cnt Score Error Units > ArraysHashCode.bytes 1 avgt 5 0.532 ? 0.036 ns/op > ArraysHashCode.bytes 2 avgt 5 0.812 ? 0.011 ns/op > ArraysHashCode.bytes 4 avgt 5 1.104 ? 0.020 ns/op > ArraysHashCode.bytes 8 avgt 5 2.136 ? 0.032 ns/op > ArraysHashCode.bytes 12 avgt 5 3.596 ? 0.061 ns/op > ArraysHashCode.bytes 16 avgt 5 5.278 ? 0.240 ns/op > ArraysHashCode.bytes 20 avgt 5 7.390 ? 0.043 ns/op > ArraysHashCode.bytes 24 avgt 5 9.606 ? 0.059 ns/op > ArraysHashCode.bytes 28 avgt 5 12.144 ? 0.064 ns/op > ArraysHashCode.bytes 32 avgt 5 3.898 ? 0.096 ns/op > ArraysHashCode.bytes 36 avgt 5 4.468 ? 0.113 ns/op > ArraysHashCode.bytes 40 avgt 5 4.481 ? 0.082 ns/op > ArraysHashCode.bytes 44 avgt 5 5.143 ? 
0.060 ns/op > ArraysHashCode.bytes 48 avgt 5 6.727 ? 0.103 ns/op > ArraysHashCode.bytes 52 avgt 5 8.844 ? 0.029 ns/op > ArraysHashCode.bytes 56 avgt 5 11.108 ? 0.108 ns/op > ArraysHashCode.bytes 60 avgt 5 13.864 ? 0.071 ns/op > ArraysHashCode.bytes 64 avgt 5 5.796 ? 0.146 ns/op > ``` ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2358012793 From rrich at openjdk.org Wed Sep 18 10:05:09 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 18 Sep 2024 10:05:09 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 18:32:41 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). > > Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: > > - Remove empty line. > - Improve register usage and readability. Hi Martin, thanks for doing the port. Have you done a little bit of performance testing with `test/micro/org/openjdk/bench/vm/lang/LockUnlock.java`? Thanks, Richard. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2825: > 2823: for (int i = 0; i < num_unrolled; i++) { > 2824: ld(tmp3, 0, cache_addr); > 2825: cmpd(CCR0, tmp3, obj); Please file a RFE to consistently use either `flag` or `CCR0`. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2878: > 2876: ld(tmp2, in_bytes(ObjectMonitor::recursions_offset()), monitor); > 2877: addi(tmp2, tmp2, 1); > 2878: std(tmp2, in_bytes(ObjectMonitor::recursions_offset()), monitor); Can you replace the if-statement with the following? Suggestion: assert_different_registers(tmp2, monitor); int offset = in_bytes(ObjectMonitor::recursions_offset()) - (UseObjectMonitorTable ? 
0 : monitor_tag); ld(tmp2, offset, monitor); addi(tmp2, tmp2, 1); std(tmp2, offset, monitor); src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 3013: > 3011: // null check with Flags == NE, no valid pointer below alignof(ObjectMonitor*) > 3012: cmpldi(CCR0, monitor, checked_cast(alignof(ObjectMonitor*))); > 3013: blt(CCR0, slow_path); Please file a RFE to consistently use either `flag` or `CCR0`. ------------- PR Review: https://git.openjdk.org/jdk/pull/20922#pullrequestreview-2310141350 PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1763476209 PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1764762441 PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1764719435 From stooke at openjdk.org Wed Sep 18 10:12:30 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 18 Sep 2024 10:12:30 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v10] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 15 additional commits since the last revision: - remove conditional compilation - Merge branch 'master' into pr_windows_realpath - fix realpath() test for POSIX - Define MAX_PATH if required - use MAX_PATH only - added gtest for realpath - remove empty line - fix indentation - fix missing return statement - properly test for buffer too small for path - ... and 5 more: https://git.openjdk.org/jdk/compare/2269f18b...15db6b75 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/ee870f7c..15db6b75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=08-09 Stats: 173592 lines in 1548 files changed: 156846 ins; 8724 del; 8022 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From stooke at openjdk.org Wed Sep 18 10:18:28 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 18 Sep 2024 10:18:28 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v11] In-Reply-To: References: Message-ID: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires the addition of a stub routine in os_posix.cpp and a Windows implementation of realpath(), using Windows _fullpath(). > > This PR depends on #20597 in that it removes the need for one #ifdef in that PR. 
Because of that, this PR will be modified when and if #20597 is integrated (or vice-versa) > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: remove tabs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/15db6b75..24bfde29 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From stooke at openjdk.org Wed Sep 18 10:18:29 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 18 Sep 2024 10:18:29 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v8] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 03:59:36 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: >> >> Define MAX_PATH if required > > test/hotspot/gtest/runtime/test_os.cpp line 386: > >> 384: const char* returnedBuffer = os::realpath(path, buffer, 10); >> 385: EXPECT_TRUE(errno == ENAMETOOLONG); >> 386: EXPECT_TRUE(returnedBuffer == nullptr); > > Indentation is off through this code. thanks! fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1764793111 From stooke at openjdk.org Wed Sep 18 10:24:08 2024 From: stooke at openjdk.org (Simon Tooke) Date: Wed, 18 Sep 2024 10:24:08 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v11] In-Reply-To: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> References: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> Message-ID: On Wed, 18 Sep 2024 10:18:28 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires a Windows implementation of realpath(), using Windows _fullpath(), and renaming os::Posix::realpath() to os::realpath(). >> >> The main difference between POSIX and Windows behaviour is that POSIX actually requires an existing accessible file, while Windows will happily work with made-up filenames. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: > > remove tabs I have added a gtest for some of the common error cases using realpath(). 
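The behaviour exercised by the quoted gtest (a too-small buffer yields `nullptr` with `errno == ENAMETOOLONG`) can be sketched outside HotSpot as follows. This is a minimal illustration under stated assumptions, not the `os::realpath()` implementation from the PR: `portable_realpath` is a hypothetical name, the POSIX branch relies on `realpath(path, nullptr)` allocating its result (POSIX.1-2008), and the Windows branch relies on `_fullpath` doing the same when passed a null buffer.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a platform-neutral realpath wrapper.
// Resolves 'path' into 'out' (capacity 'out_len', including the NUL).
// Returns 'out' on success, nullptr on failure.
char* portable_realpath(const char* path, char* out, std::size_t out_len) {
  if (path == nullptr || out == nullptr || out_len == 0) {
    errno = EINVAL;
    return nullptr;
  }
#ifdef _WIN32
  // Does not require the file to exist, unlike the POSIX call below.
  char* full = _fullpath(nullptr, path, 0);
#else
  char* full = ::realpath(path, nullptr);
#endif
  if (full == nullptr) {
    return nullptr;  // propagate the C library's failure
  }
  const bool fits = std::strlen(full) < out_len;
  if (fits) {
    std::strcpy(out, full);
  }
  std::free(full);
  if (!fits) {
    errno = ENAMETOOLONG;  // set after free() so it is not clobbered
    return nullptr;
  }
  return out;
}
```

Resolving into a library-allocated buffer first keeps the length check in one place for both platforms, at the cost of one extra allocation per call.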
------------- PR Comment: https://git.openjdk.org/jdk/pull/20683#issuecomment-2358087840 From duke at openjdk.org Wed Sep 18 10:30:49 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 18 Sep 2024 10:30:49 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1
>
> --------------------------------------------------------------------------------------------
> Version                                          Baseline                 This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error       Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060       1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028       4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051      26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352    2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062       1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002       5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048      23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374    2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005       1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002       5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016      24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336    2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001       1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
 - Merge branch 'master' into 8322770
 - cleanup: adjust a comment in the light of the latest change
 - cleanup: fix comment formatting Co-authored-by: Andrew Haley
 - Optimize both the stub and inlined parts of the implementation Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. Add a non-unrolled vectorized loop to the stub to handle vectorizable tail portions of arrays that are multiples of 4/8 elements (for ints / other types). Make the stub process the array as a whole instead of relying on the inlined part to process an unvectorizable tail.
 - cleanup: add comments and simplify the orr ins
 - cleanup: remove redundant copyright notice
 - cleanup: use a constexpr function for intpow instead of a templated class
 - cleanup: address review comments
 - cleanup: remove a redundant parameter
 - 8322770: AArch64: C2: Implement VectorizedHashCode The code to calculate a hash code consists of two parts: a stub method that implements a vectorized loop using Neon instructions, which processes 16 or 32 elements per iteration depending on the data type; and an unrolled inlined scalar loop that processes remaining tail elements. [Performance] [[Neoverse V2]]
```
                                                         | 328a053 (master)   | dc2909f (this)     |
----------------------------------------------------------------------------------------------------------
Benchmark                              (size)  Mode  Cnt |     Score    Error |     Score    Error | Units
----------------------------------------------------------------------------------------------------------
ArraysHashCode.bytes                        1  avgt   15 |     0.805 ±  0.206 |     0.815 ±  0.141 | ns/op
ArraysHashCode.bytes                       10  avgt   15 |     4.362 ±  0.013 |     3.522 ±  0.124 | ns/op
ArraysHashCode.bytes                      100  avgt   15 |    78.374 ±  0.136 |    12.935 ±  0.016 | ns/op
ArraysHashCode.bytes                    10000  avgt   15 |  9247.335 ± 13.691 |  1344.770 ±  1.898 | ns/op
ArraysHashCode.chars                        1  avgt   15 |     0.731 ±  0.035 |     0.723 ±  0.046 | ns/op
ArraysHashCode.chars                       10  avgt   15 |     4.359 ±  0.007 |     3.385 ±  0.004 | ns/op
ArraysHashCode.chars                      100  avgt   15 |    78.374 ±  0.117 |    11.903 ±  0.023 | ns/op
ArraysHashCode.chars                    10000  avgt   15 |  9248.328 ± 13.644 |  1344.007 ±  1.795 | ns/op
ArraysHashCode.ints                         1  avgt   15 |     0.746 ±  0.083 |     0.631 ±  0.020 | ns/op
ArraysHashCode.ints                        10  avgt   15 |     4.357 ±  0.009 |     3.387 ±  0.005 | ns/op
ArraysHashCode.ints                       100  avgt   15 |    78.391 ±  0.103 |    10.934 ±  0.015 | ns/op
ArraysHashCode.ints                     10000  avgt   15 |  9248.125 ± 12.583 |  1340.644 ±  1.869 | ns/op
ArraysHashCode.multibytes                   1  avgt   15 |     0.555 ±  0.020 |     0.559 ±  0.020 | ns/op
ArraysHashCode.multibytes                  10  avgt   15 |     2.681 ±  0.020 |     2.175 ±  0.045 | ns/op
ArraysHashCode.multibytes                 100  avgt   15 |    36.954 ±  0.051 |    12.870 ±  0.021 | ns/op
ArraysHashCode.multibytes               10000  avgt   15 |  4862.703 ±  6.909 |   720.774 ±  3.487 | ns/op
ArraysHashCode.multichars                   1  avgt   15 |     0.551 ±  0.017 |     0.552 ±  0.018 | ns/op
ArraysHashCode.multichars                  10  avgt   15 |     2.683 ±  0.018 |     2.182 ±  0.086 | ns/op
ArraysHashCode.multichars                 100  avgt   15 |    36.988 ±  0.054 |     8.830 ±  0.013 | ns/op
ArraysHashCode.multichars               10000  avgt   15 |  4862.279 ±  6.839 |   756.074 ±  6.754 | ns/op
ArraysHashCode.multiints                    1  avgt   15 |     0.555 ±  0.018 |     0.557 ±  0.019 | ns/op
ArraysHashCode.multiints                   10  avgt   15 |     2.689 ±  0.029 |     2.184 ±  0.074 | ns/op
ArraysHashCode.multiints                  100  avgt   15 |    36.992 ±  0.044 |     8.098 ±  0.012 | ns/op
ArraysHashCode.multiints                10000  avgt   15 |  4873.863 ±  6.689 |   783.540 ±  9.151 | ns/op
ArraysHashCode.multishorts                  1  avgt   15 |     0.563 ±  0.021 |     0.561 ±  0.021 | ns/op
ArraysHashCode.multishorts                 10  avgt   15 |     2.679 ±  0.020 |     2.164 ±  0.054 | ns/op
ArraysHashCode.multishorts                100  avgt   15 |    36.976 ±  0.053 |     8.828 ±  0.013 | ns/op
ArraysHashCode.multishorts              10000  avgt   15 |  4861.118 ±  7.057 |   748.952 ±  6.040 | ns/op
ArraysHashCode.shorts                       1  avgt   15 |     0.631 ±  0.020 |     0.643 ±  0.033 | ns/op
ArraysHashCode.shorts                      10  avgt   15 |     4.362 ±  0.005 |     3.400 ±  0.025 | ns/op
ArraysHashCode.shorts                     100  avgt   15 |    78.324 ±  0.151 |    11.892 ±  0.017 | ns/op
ArraysHashCode.shorts                   10000  avgt   15 |  9246.323 ± 13.126 |  1344.304 ±  1.906 | ns/op
StringHashCode.Algorithm.defaultLatin1      1  avgt   15 |     0.946 ±  0.061 |     0.924 ±  0.001 | ns/op
StringHashCode.Algorithm.defaultLatin1     10  avgt   15 |     4.334 ±  0.046 |     3.447 ±  0.051 | ns/op
StringHashCode.Algorithm.defaultLatin1    100  avgt   15 |    78.136 ±  0.105 |    12.950 ±  0.048 | ns/op
StringHashCode.Algorithm.defaultLatin1  10000  avgt   15 |  9266.117 ± 13.184 |  1345.097 ±  1.963 | ns/op
StringHashCode.Algorithm.defaultUTF16       1  avgt   15 |     0.692 ±  0.035 |     0.687 ±  0.034 | ns/op
StringHashCode.Algorithm.defaultUTF16      10  avgt   15 |     4.323 ±  0.023 |     3.394 ±  0.015 | ns/op
StringHashCode.Algorithm.defaultUTF16     100  avgt   15 |    78.317 ±  0.109 |    11.911 ±  0.017 | ns/op
StringHashCode.Algorithm.defaultUTF16   10000  avgt   15 |  9249.620 ± 14.594 |  1344.533 ±  1.908 | ns/op
StringHashCode.cached                     N/A  avgt   15 |     0.518 ±  0.017 |     0.530 ±  0.031 | ns/op
StringHashCode.empty                      N/A  avgt   15 |     0.733 ±  0.086 |     0.849 ±  0.168 | ns/op
StringHashCode.notCached                  N/A  avgt   15 |     0.687 ±  0.084 |     0.630 ±  0.018 | ns/op
```
[Test] jtreg::tier1 passed on AArch64 and x86. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/6b8eb78c..f5918cca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=07-08 Stats: 177814 lines in 1617 files changed: 159782 ins; 9374 del; 8658 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From mdoerr at openjdk.org Wed Sep 18 10:37:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 10:37:07 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 15:42:35 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove empty line. >> - Improve register usage and readability. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2825: > >> 2823: for (int i = 0; i < num_unrolled; i++) { >> 2824: ld(tmp3, 0, cache_addr); >> 2825: cmpd(CCR0, tmp3, obj); > > Please file an RFE to consistently use either `flag` or `CCR0`. I'm planning to remove `flag` and use `CCR0` consistently. But I'd like to postpone this and do it after the removal of the old locking modes. (Hoping this will happen soon.)
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1764819653 From mdoerr at openjdk.org Wed Sep 18 10:42:22 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 10:42:22 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 09:51:34 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove empty line. >> - Improve register usage and readability. > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2878: > >> 2876: ld(tmp2, in_bytes(ObjectMonitor::recursions_offset()), monitor); >> 2877: addi(tmp2, tmp2, 1); >> 2878: std(tmp2, in_bytes(ObjectMonitor::recursions_offset()), monitor); > > Can you replace the if-statement with the following? > > Suggestion: > > assert_different_registers(tmp2, monitor); > int offset = in_bytes(ObjectMonitor::recursions_offset()) - (UseObjectMonitorTable ? 0 : monitor_tag); > ld(tmp2, offset, monitor); > addi(tmp2, tmp2, 1); > std(tmp2, offset, monitor); Is `monitor` initialized as needed if `!UseObjectMonitorTable`? I can't see it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1764825189 From thartmann at openjdk.org Wed Sep 18 10:43:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Sep 2024 10:43:29 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v2] In-Reply-To: References: Message-ID: > Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. 
It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21037/files - new: https://git.openjdk.org/jdk/pull/21037/files/8af8f423..3dbc5849 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=00-01 Stats: 28 lines in 2 files changed: 12 ins; 10 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21037/head:pull/21037 PR: https://git.openjdk.org/jdk/pull/21037 From thartmann at openjdk.org Wed Sep 18 10:58:06 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Sep 2024 10:58:06 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v2] In-Reply-To: References: Message-ID: <5jVaID6SwKzjJHa4LrBIv-Ec7IQCg51hm9AN5Zw2-MY=.542a5b95-d135-49a6-a9db-f6e68fbe6a3f@github.com> On Wed, 18 Sep 2024 10:43:29 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. 
>> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks for your review, Vladimir! I uploaded a new version and addressed your comments. ------------- PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2312305701 From thartmann at openjdk.org Wed Sep 18 10:58:08 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Sep 2024 10:58:08 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v2] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 17:13:34 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/share/opto/parse2.cpp line 1385: > >> 1383: bool do_stress_trap = StressUnstableIfTraps && ((C->random() % 2) == 0); >> 1384: if (do_stress_trap) { >> 1385: Node* counter_addr = makecon(TypeRawPtr::make((address)&_trap_stress_counter)); > > Would it be easier if you used a new Ideal macro node for this and expanded it in the macro expansion phase? I think the problem with a macro node is that it might prevent optimizations that look through the memory graph, especially since the macro node would need to read and update (raw) memory. Also, such a simple load, increment and store does not pollute the graph too much to justify a macro node, in my opinion.
> src/hotspot/share/opto/parse2.cpp line 1497: > >> 1495: incr_store = store_to_memory(control(), counter_addr, counter, T_INT, Compile::AliasIdxRaw, MemNode::unordered); >> 1496: } >> 1497: > > At a glance, it looks like the code above. Should you put it into a separate method and call it in both places? Makes sense, I factored it out. > src/hotspot/share/opto/parse2.cpp line 1589: > >> 1587: // Search for an unstable if trap >> 1588: CallStaticJavaNode* trap = nullptr; >> 1589: for (int i = 0; i <= 1; ++i) { > > Should we check that it is `IfNode` and it has 2 output edges? Maybe assert? I added an assert. > src/hotspot/share/opto/parse2.cpp line 1590: > >> 1588: CallStaticJavaNode* trap = nullptr; >> 1589: for (int i = 0; i <= 1; ++i) { >> 1590: Node* out = orig_iff->raw_out(i)->find_out_with(Op_CallStaticJava); > > Why not cast (CallStaticJava*) here? Done. I simplified the code a bit further. > src/hotspot/share/opto/parse2.cpp line 1591: > >> 1589: for (int i = 0; i <= 1; ++i) { >> 1590: Node* out = orig_iff->raw_out(i)->find_out_with(Op_CallStaticJava); >> 1591: if (out != nullptr && out->isa_CallStaticJava() && out->as_CallStaticJava()->is_uncommon_trap()) { > > You don't need `out->isa_CallStaticJava()` because `find_out_with()` will return `nullptr` in other cases. Good catch. Fixed.
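The mechanism described in the PR text (an extra check on a simple shared counter, placed before the unstable if) can be pictured with the sketch below. It is plain C++ rather than C2 IR, and it is only an assumption about the general shape: the names `StressTrapCounter`, `should_take_trap` and `guarded_branch`, as well as the mask-based trap condition, are hypothetical and may differ from what parse2.cpp actually does.

```cpp
#include <cstdint>

// Hypothetical model of the shared counter consulted before an unstable if.
struct StressTrapCounter {
  uint32_t value = 0;
};

// Load, increment and store the counter, then decide pseudo-randomly whether
// to take the trap. With mask == 2^k - 1 the trap is taken once every 2^k
// calls. No synchronization is used in this sketch.
bool should_take_trap(StressTrapCounter& counter, uint32_t mask) {
  uint32_t next = counter.value + 1;  // load + increment
  counter.value = next;               // store
  return (next & mask) == 0;          // assumed trap condition
}

// The extra check sits in front of the original branch: 0 stands in for
// "take the uncommon trap", 1 and 2 for the two arms of the original if.
int guarded_branch(StressTrapCounter& counter, bool original_condition) {
  if (should_take_trap(counter, 0x3)) {
    return 0;
  }
  return original_condition ? 1 : 2;
}
```

In this model every fourth execution is diverted to the trap path regardless of the original condition, which is the kind of forced, repeated deoptimization the stress flag is meant to provoke.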
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1764843481 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1764823071 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1764823251 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1764823627 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1764823838 From thartmann at openjdk.org Wed Sep 18 11:10:47 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 18 Sep 2024 11:10:47 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v3] In-Reply-To: References: Message-ID: > Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Moved declaration ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21037/files - new: https://git.openjdk.org/jdk/pull/21037/files/3dbc5849..8257e6e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=01-02 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21037/head:pull/21037 PR: https://git.openjdk.org/jdk/pull/21037 From fyang at openjdk.org Wed Sep 18 11:50:09 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 18 Sep 2024 11:50:09 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 08:30:52 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 3059: >> >>> 3057: void MacroAssembler::cmodx_fence() { >>> 3058: BLOCK_COMMENT("cmodx fence"); >>> 3059: if (VM_Version::supports_fencei_barrier()) { >> >> Seems more reasonable to turn this into an assertion? `assert(VM_Version::supports_fencei_barrier(), "must be");` > > The cmodx_fence() is called after a safepoint or thread-local handshake. > Even if it costs tens? of cycles it is very little compared to hitting the poll, potentially coming from a signal handler, hence I kept it always 'on'. > > But we are only guaranteed to have fence.i when running on Linux, and this code is not specific to Linux. > I.e. MASM should be OS independent. As we already check this during boot in Linux-specific code there is no way that assert can happen on Linux. > > But maybe our JDK implementation should require fence.i? > If we do then doing the assert makes more sense, yes. Ah, I see. Makes sense.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1764909465 From aph at openjdk.org Wed Sep 18 11:54:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 11:54:13 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 18 Sep 2024 10:30:49 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1
>>
>> --------------------------------------------------------------------------------------------
>> Version                                          Baseline                 This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score     Error       Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060       1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028       4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051      26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352    2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062       1.246 ±   0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002       5.344 ±   0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048      23.023 ±   0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374    2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005       1.187 ±   0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002       5.676 ±   0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016      24.378 ±   0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336    2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001       1.037 ±   0.001 ...
>
> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>
> - Merge branch 'master' into 8322770
> - cleanup: adjust a comment in the light of the latest change
> - cleanup: fix comment formatting
>
> Co-authored-by: Andrew Haley
> - Optimize both the stub and inlined parts of the implementation
>
> Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H.
> Add a non-unrolled vectorized loop to the stub to handle vectorizable
> tail portions of arrays that are multiples of 4/8 elements (for ints / other
> types). Make the stub process the array as a whole instead of relying on
> the inlined part to process an unvectorizable tail.
> - cleanup: add comments and simplify the orr ins
> - cleanup: remove redundant copyright notice
> - cleanup: use a constexpr function for intpow instead of a templated class
> - cleanup: address review comments
> - cleanup: remove a redundant parameter
> - 8322770: AArch64: C2: Implement VectorizedHashCode
>
> The code to calculate a hash code consists of two parts: a stub method that
> implements a vectorized loop using Neon instructions, which processes 16 or 32
> elements per iteration depending on the data type; and an unrolled inlined
> scalar loop that processes remaining tail elements.
>
> [Performance]
>
> [[Neoverse V2]]
> ```
>                                                          | 328a053 (master)   | dc2909f (this)     |
> ----------------------------------------------------------------------------------------------------------
> Benchmark                              (size)  Mode  Cnt |     Score    Error |     Score    Error | Units
> ----------------------------------------------------------------------------------------------------------
> ArraysHashCode.bytes                        1  avgt   15 |     0.805 ±  0.206 |     0.815 ±  0.141 | ns/op
> ArraysHashCode.bytes                       10  avgt   15 |     4.362 ±  0.013 |     3.522 ±  0.124 | ns/op
> ArraysHashCode.bytes                      100  avgt   15 |    78.374 ±  0.136 |    12.935 ±  0.016 | ns/op
> ArraysHashCode.bytes                    10000  avgt   15 |  9247.335 ± 13.691 |  1344.770 ±  1.898 | ns/op
> ArraysHashCode.chars                        1  avgt   15 |     0.731 ±  0.035 |     0.723 ±  0.046...

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2877:
> 2875: f(0b01111, 28, 24); \
> 2876: if (T == T4H || T == T8H) { \
> 2877: f(0b01, 23, 22), f(index & 0b11, 21, 20), rf(Vm, 16), f(op2, 15, 12), f(index >> 2 & 1, 11); \
This isn't right. Please go to test/hotspot/gtest/aarch64/aarch64-asmtest.py and add `mulv` to the set of tested instructions. Please make sure you test all modes.
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5465: > 5463: __ addv(vmul0, load_arrangement, vmul0, vdata0); > 5464: } else if (load_arrangement == Assembler::T8B || load_arrangement == Assembler::T4H || > 5465: load_arrangement == Assembler::T8H) { Use a switch here, and everywhere else that a switch applies. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764912213 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764915313 From aph at openjdk.org Wed Sep 18 12:08:15 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 12:08:15 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v2] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <8e4e1ZzE5scPDGAZYMJP2jvd9L3tn2MYHF6QqjuLRC0=.50d76243-8d59-4aaf-abe4-1c0c80ff5988@github.com> Message-ID: On Wed, 21 Aug 2024 14:50:00 GMT, Mikhail Ablakatov wrote: >> src/hotspot/share/utilities/intpow.hpp line 29: >> >>> 27: #define SHARE_UTILITIES_INTPOW_HPP >>> 28: >>> 29: #include "metaprogramming/enableIf.hpp" >> >> There's no need for any of this metaprogramming. A constexpr function would be better. > > Replied in another thread: https://github.com/openjdk/jdk/pull/18487/files/4c6812f63bf9a6d5cf17c7899fe4a77e390c1645#r1725193686 OK. 
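For illustration, the suggestion quoted above (a `constexpr` function instead of template metaprogramming) could look like the sketch below. This is an assumption about the general shape of such a helper, not the code in intpow.hpp; the name `intpow_sketch` and the choice of `uint32_t` are made up for this example. It assumes C++14 or later, which permits loops inside `constexpr` functions.

```cpp
#include <cstdint>

// Hypothetical constexpr integer power using square-and-multiply.
// Arithmetic is unsigned 32-bit, so results wrap modulo 2^32, which matches
// the wraparound a Java-style hash code relies on.
constexpr uint32_t intpow_sketch(uint32_t base, unsigned exp) {
  uint32_t result = 1;
  while (exp != 0) {
    if (exp & 1u) {
      result *= base;
    }
    base *= base;
    exp >>= 1;
  }
  return result;
}

// Evaluated at compile time, so no template machinery is needed.
static_assert(intpow_sketch(31, 0) == 1, "31^0");
static_assert(intpow_sketch(31, 2) == 961, "31^2");
static_assert(intpow_sketch(31, 4) == 923521, "31^4");
```

Because the function is `constexpr`, the same definition serves both compile-time constants (as in the `static_assert` lines) and ordinary run-time calls.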
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764929761 From duke at openjdk.org Wed Sep 18 12:08:14 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 18 Sep 2024 12:08:14 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <0FzPz8JNa6T4frm9yoQQyLNpxF4AglB1b1ZVgqO2Z2M=.d65f68f0-1291-4fec-af83-88fc35f35398@github.com> On Wed, 18 Sep 2024 11:51:39 GMT, Andrew Haley wrote:
>> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:
>>
>> - Merge branch 'master' into 8322770
>> - cleanup: adjust a comment in the light of the latest change
>> - cleanup: fix comment formatting
>>
>> Co-authored-by: Andrew Haley
>> - Optimize both the stub and inlined parts of the implementation
>>
>> Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H.
>> Add a non-unrolled vectorized loop to the stub to handle vectorizable
>> tail portions of arrays that are multiples of 4/8 elements (for ints / other
>> types). Make the stub process the array as a whole instead of relying on
>> the inlined part to process an unvectorizable tail.
>> - cleanup: add comments and simplify the orr ins
>> - cleanup: remove redundant copyright notice
>> - cleanup: use a constexpr function for intpow instead of a templated class
>> - cleanup: address review comments
>> - cleanup: remove a redundant parameter
>> - 8322770: AArch64: C2: Implement VectorizedHashCode
>>
>> The code to calculate a hash code consists of two parts: a stub method that
>> implements a vectorized loop using Neon instructions, which processes 16 or 32
>> elements per iteration depending on the data type; and an unrolled inlined
>> scalar loop that processes remaining tail elements.
>>
>> [Performance]
>>
>> [[Neoverse V2]]
>> ```
>>                                                          | 328a053 (master)   | dc2909f (this)     |
>> ----------------------------------------------------------------------------------------------------------
>> Benchmark                              (size)  Mode  Cnt |     Score    Error |     Score    Error | Units
>> ----------------------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes                        1  avgt   15 |     0.805 ±  0.206 |     0.815 ±  0.141 | ns/op
>> ArraysHashCode.bytes                       10  avgt   15 |     4.362 ±  0.013 |     3.522 ±  0.124 | ns/op
>> ArraysHashCode.bytes                      100  avgt   15 |    78.374 ±  0.136 |    12.935 ±  0.016 | ns/op
>> ArraysHashCode.bytes                    10000  avgt   15 |  9247.335 ± 13.691 |  1344.770 ±  1.898 | ns/op
>> ArraysHashCode.cha...
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5465:
>
>> 5463: __ addv(vmul0, load_arrangement, vmul0, vdata0);
>> 5464: } else if (load_arrangement == Assembler::T8B || load_arrangement == Assembler::T4H ||
>> 5465: load_arrangement == Assembler::T8H) {
>
> Use a switch here, and everywhere else that a switch applies.

The only other piece of code similar to this one is the one at line [5591](https://github.com/openjdk/jdk/pull/18487/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R5591). Are there any others that I'm missing?
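The review request quoted above (prefer a `switch` over a chain of `==` comparisons on the arrangement) can be sketched as follows. The enum `Arrangement` and the function `elements_per_vector` are hypothetical stand-ins written for this example; they are not HotSpot's `Assembler::SIMD_Arrangement` or any code from the PR.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a SIMD arrangement type (illustration only).
enum class Arrangement { T8B, T4H, T8H, T4S };

// Number of lanes implied by each arrangement name: the digit is the lane
// count, the letter is the lane width (B = byte, H = halfword, S = word).
int elements_per_vector(Arrangement a) {
  switch (a) {
    case Arrangement::T8B: return 8;
    case Arrangement::T4H: return 4;
    case Arrangement::T8H: return 8;
    case Arrangement::T4S: return 4;
  }
  // Reached only if a value outside the enumerators is passed in.
  throw std::invalid_argument("unsupported arrangement");
}
```

One argument for this form is that a `switch` over an enum without a `default` label lets compilers warn (for example with `-Wswitch`) when an enumerator is added but not handled, which an `if`/`else if` chain cannot do.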
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764932886 From aph at openjdk.org Wed Sep 18 12:08:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 12:08:13 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> On Wed, 18 Sep 2024 10:30:49 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 
1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into 8322770 > - cleanup: adjust a comment in the light of the latest change > - cleanup: fix comment formatting > > Co-authored-by: Andrew Haley > - Optimize both the stub and inlined parts of the implementation > > Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. > Add a non-unrolled vectorized loop to the stub to handle vectorizable > tail portions of arrays multiple to 4/8 elements (for ints / other > types). Make the stub process array as a whole instead of relying on > the inlined part to process an unvectorizable tail. 
> - cleanup: add comments and simplify the orr ins > - cleanup: remove redundant copyright notice > - cleanup: use a constexpr function for intpow instead of a templated class > - cleanup: address review comments > - cleanup: remove a redundant parameter > - 8322770: AArch64: C2: Implement VectorizedHashCode > > The code to calculate a hash code consists of two parts: a stub method that > implements a vectorized loop using Neon instruction which processes 16 or 32 > elements per iteration depending on the data type; and an unrolled inlined > scalar loop that processes remaining tail elements. > > [Performance] > > [[Neoverse V2]] > ``` > | 328a053 (master) | dc2909f (this) | > ---------------------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt | Score Error | Score Error | Units > ---------------------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 | 0.805 ? 0.206 | 0.815 ? 0.141 | ns/op > ArraysHashCode.bytes 10 avgt 15 | 4.362 ? 0.013 | 3.522 ? 0.124 | ns/op > ArraysHashCode.bytes 100 avgt 15 | 78.374 ? 0.136 | 12.935 ? 0.016 | ns/op > ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ? 13.691 | 1344.770 ? 1.898 | ns/op > ArraysHashCode.chars 1 avgt 15 | 0.731 ? 0.035 | 0.723 ? 0.046... src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5416: > 5414: : load_arrangement == Assembler::T8H ? 36 // 9 insts > 5415: : load_arrangement == Assembler::T8B ? 40 // 10 insts > 5416: : -1; // invalid This is extremely fragile in the presence of code change. Can we not simply delete it? src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5563: > 5561: } > 5562: > 5563: start = __ offset(); What does this logic from 5552 onwards do? It at least deserves a comment. Can we not simply delete it? 
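
For readers following the review, the split described in the quoted PR text (a vectorized loop plus a scalar tail) can be modeled in plain Java. This is only a sketch under assumptions: it uses 4 lanes of `int`, ordinary arrays instead of Neon registers, and a made-up class name `LaneHashSketch`. It is not the stub under review and says nothing about its code size or alignment logic.

```java
import java.util.Arrays;

// Scalar model of a lane-wise Arrays.hashCode(int[]).
// LANES = 4 is an assumption for illustration, not the stub's real unroll factor.
final class LaneHashSketch {
    static final int LANES = 4;
    static final int POW_LANES = 31 * 31 * 31 * 31; // 31^4, must match LANES

    static int hash(int[] a) {
        int vecLen = a.length - (a.length % LANES); // part handled by the "vector" loop
        int[] acc = new int[LANES];                 // one partial sum per lane
        int scale = 1;                              // 31^(elements consumed so far)
        for (int i = 0; i < vecLen; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] = acc[j] * POW_LANES + a[i + j];
            }
            scale *= POW_LANES;
        }
        // Fold the lanes: lane j carries weight 31^(LANES - 1 - j).
        int h = scale; // initial hash value 1, multiplied by 31^vecLen
        int weight = 1;
        for (int j = LANES - 1; j >= 0; j--) {
            h += acc[j] * weight;
            weight *= 31;
        }
        // Scalar tail for the remaining (fewer than LANES) elements.
        for (int i = vecLen; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5};
        // Compare against the library implementation.
        System.out.println(hash(data) == Arrays.hashCode(data));
    }
}
```

The lane fold works because `int` arithmetic wraps modulo 2^32, so regrouping the polynomial does not change the result.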
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764928217 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764927093 From rkennke at openjdk.org Wed Sep 18 12:11:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:11:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Mon, 16 Sep 2024 06:53:42 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Various touch-ups > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: > >> 2574: } else { >> 2575: lea(dst, Address(obj, index, Address::lsl(scale))); >> 2576: ldr(dst, Address(dst, offset)); > > Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset, that's why we need to lea the r27[nklass] part first. Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing.
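
The two-step sequence can be pictured with plain address arithmetic. The following is a sketch only: the method names and the sample values are invented for illustration, and the shift of 3 simply mirrors the "scale=8" mentioned above.

```java
// Models base + (index << shift) + offset computed in one step (x86-style
// operand) versus two steps (lea, then a load with an immediate offset).
final class AddressingSketch {
    // One addressing operand: base + scaled index + displacement.
    static long oneStep(long base, long index, int shift, long offset) {
        return base + (index << shift) + offset;
    }

    // Two instructions: first materialize base + scaled index, then add the offset.
    static long twoStep(long base, long index, int shift, long offset) {
        long tmp = base + (index << shift); // corresponds to the lea
        return tmp + offset;                // corresponds to the ldr with an immediate offset
    }

    public static void main(String[] args) {
        long base = 0x800000000L; // hypothetical heap base held in a register
        long nklass = 3705;       // hypothetical index value
        System.out.println(oneStep(base, nklass, 3, 16) == twoStep(base, nklass, 3, 16));
    }
}
```

Both forms name the same address; the difference is only whether the hardware accepts all three components in a single operand.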
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764937842 From duke at openjdk.org Wed Sep 18 12:15:11 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 18 Sep 2024 12:15:11 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> Message-ID: On Wed, 18 Sep 2024 12:01:25 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into 8322770 >> - cleanup: adjust a comment in the light of the latest change >> - cleanup: fix comment formatting >> >> Co-authored-by: Andrew Haley >> - Optimize both the stub and inlined parts of the implementation >> >> Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. >> Add a non-unrolled vectorized loop to the stub to handle vectorizable >> tail portions of arrays multiple to 4/8 elements (for ints / other >> types). Make the stub process array as a whole instead of relying on >> the inlined part to process an unvectorizable tail. 
>> - cleanup: add comments and simplify the orr ins >> - cleanup: remove redundant copyright notice >> - cleanup: use a constexpr function for intpow instead of a templated class >> - cleanup: address review comments >> - cleanup: remove a redundant parameter >> - 8322770: AArch64: C2: Implement VectorizedHashCode >> >> The code to calculate a hash code consists of two parts: a stub method that >> implements a vectorized loop using Neon instruction which processes 16 or 32 >> elements per iteration depending on the data type; and an unrolled inlined >> scalar loop that processes remaining tail elements. >> >> [Performance] >> >> [[Neoverse V2]] >> ``` >> | 328a053 (master) | dc2909f (this) | >> ---------------------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt | Score Error | Score Error | Units >> ---------------------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 | 0.805 ? 0.206 | 0.815 ? 0.141 | ns/op >> ArraysHashCode.bytes 10 avgt 15 | 4.362 ? 0.013 | 3.522 ? 0.124 | ns/op >> ArraysHashCode.bytes 100 avgt 15 | 78.374 ? 0.136 | 12.935 ? 0.016 | ns/op >> ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ? 13.691 | 1344.770 ? 1.898 | ns/op >> ArraysHashCode.cha... > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5416: > >> 5414: : load_arrangement == Assembler::T8H ? 36 // 9 insts >> 5415: : load_arrangement == Assembler::T8B ? 40 // 10 insts >> 5416: : -1; // invalid > > This is extremely fragile in the presence of code change. Can we not simply delete it? There's a `guarantee()` at the end of the loop to verify the size so the code change shouldn't be left unnoticed. > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5563: > >> 5561: } >> 5562: >> 5563: start = __ offset(); > > What does this logic from 5552 onwards do? It at least deserves a comment. > Can we not simply delete it? 
It's here to make sure the loop takes the smallest number of aligned 32-byte instruction memory regions possible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764943608 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764941830 From epeter at openjdk.org Wed Sep 18 12:18:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:18:11 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Incorporating review and documentation suggestions. Generally, from a C2 point of view this looks good now. 
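
As a reading aid, the two-vector selectFrom semantics quoted above can be written as a scalar reference model. This is a sketch, assuming `int` lanes and plain arrays in place of Vector objects; the class and method names are made up for illustration and are not part of the Vector API. The bounds check follows the quoted description for this revision of the PR.

```java
import java.util.Arrays;

// Scalar reference model: index lanes pick from the 2*VLEN-entry table
// formed by v1 followed by v2.
final class SelectFromSketch {
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            if (idx < 0 || idx >= 2 * vlen) {
                // Valid two-vector index range is [0, 2*VLEN).
                throw new IndexOutOfBoundsException("lane " + i + ": " + idx);
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 4, 3, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(index, v1, v2))); // [10, 20, 13, 23]
    }
}
```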
------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2312501448 From epeter at openjdk.org Wed Sep 18 12:18:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:18:12 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v10] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <7kLX2-XUHL8Ej4GyQD8V7my-bqK2EZrQEyZkgZWYA6k=.cbbab922-db74-4173-a529-e4faf65db0e6@github.com> <0Ak63KM4JYdbOT33Q8XPcv096MKzjY6vyl4xZkapZwM=.6b92e423-9bd5-4109-a0b3-75dbeaf40315@github.com> Message-ID: On Tue, 17 Sep 2024 07:02:20 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.cpp line 2122: >> >>> 2120: // index format by subsequent VectorLoadShuffle. >>> 2121: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true); >>> 2122: Node* index_byte_vec = phase->transform(VectorCastNode::make(cast_vopc, index_vec, T_BYTE, num_elem)); >> >> This cast assumes that the indices cannot have more than 8 bits. This would allow vector lengths of up to 256. This is fine for Intel. But as far as I know ARM has in principle longer vectors - up to 2048 bits. Should we maybe add some assert here to make sure we never badly truncate the index? > > The shuffle overhaul is on our to-do list. It's a known limitation, which we tried lifting once. Yes, you read it correctly: it's a limitation for AArch64 SVE once 2048-bit vector systems are available. IIRC the current max vector size on any available AArch64 system is 256 bits; with Neoverse V2 they shrank the vector size back to 16 bytes. Are there any asserts that would catch this?
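
The truncation concern can be shown with ordinary Java casts. This is a sketch with invented names; it assumes a hypothetical byte-lane vector with 256 lanes, so that two-vector indices range over [0, 512), and it only demonstrates what a narrowing cast to `byte` does to such values.

```java
// Shows which index values survive a round trip through an 8-bit lane.
final class IndexCastSketch {
    // Narrow to byte, then read the byte back as an unsigned value.
    static int roundTripThroughByte(int index) {
        return ((byte) index) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(roundTripThroughByte(255)); // 255, still intact
        System.out.println(roundTripThroughByte(300)); // 44, the upper bits are lost
    }
}
```

An assert of the kind asked about would have to reject any configuration where the index range exceeds what the cast target can hold.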
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1764943566 From luhenry at openjdk.org Wed Sep 18 12:23:15 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Wed, 18 Sep 2024 12:23:15 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v12] In-Reply-To: <6RgppiD0RTuKLmCQAqB1vDhFwvZbfkSHOzKw6GFfcPk=.008267b0-5e6d-427b-98e6-ac860c0f9ab3@github.com> References: <6RgppiD0RTuKLmCQAqB1vDhFwvZbfkSHOzKw6GFfcPk=.008267b0-5e6d-427b-98e6-ac860c0f9ab3@github.com> Message-ID: <1hUhqi-UObt4g0RHq1asodz7Wa0CxYkgN5tN7DZnPLs=.0cad5acd-9aa0-40c5-b0e2-a01f8834ccc0@github.com> On Tue, 17 Sep 2024 13:50:42 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks. >> >> This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). >> >> ## Test >> test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, >> test/jdk/java/util/zip/TestCRC32.java >> >> ## Performance >> >> ###?on bananapi >> >> with patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op >> TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op >> >> >> >> without patch >> >> Benchmark | (count) | Mode | Cnt | Score | Error | Units >> -- | -- | -- | -- | -- | -- | -- >> TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op >> TestCRC32.testCRC32Update | 
128 | avgt | 10 | 364.173 | 0.032 | ns/op >> TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op >> TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op >> TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op >> TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op >> TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op >> >> > ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > vectorize xor Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20910#pullrequestreview-2312519559 From rkennke at openjdk.org Wed Sep 18 12:25:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:25:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v20] In-Reply-To: References: Message-ID: <1o2b4fxBhqrlRqkNwKqZD1mgRNfTM16_NHZweEbd9SI=.1f68868b-1b98-4f78-9d37-2a805ffc932b@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan. Specifically, we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 60 commits: - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - CompressedKlassPointers::is_encodable shall be callable with -UseCCP - Johan review feedback - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 - Fixes post-8340184 - Merge upstream up to and including 8340184 - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java - Fix loop on aarch64 - clarify obscure assert in metasapce setup - ... and 50 more: https://git.openjdk.org/jdk/compare/19b2cee4...bb641621 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19 Stats: 4525 lines in 190 files changed: 3194 ins; 718 del; 613 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From yzheng at openjdk.org Wed Sep 18 12:25:51 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 18 Sep 2024 12:25:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 12:52:03 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - CompressedKlassPointers::is_encodable shall be callable with -UseCCP > - Johan review feedback Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2358324621 From epeter at openjdk.org Wed Sep 18 12:26:12 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:26:12 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Fri, 13 Sep 2024 22:30:36 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? 
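
For context on what is being named, the difference between checking and wrapping a shuffle index can be sketched in scalar Java. The names below are hypothetical, and the wrapping rule shown (masking with VLEN - 1 for a power-of-two VLEN) is an assumption made for illustration rather than a statement of what the PR implements.

```java
// Contrasts a checking policy (throw on a bad index) with a wrapping
// policy (fold every index into [0, VLEN)).
final class WrapVersusCheckSketch {
    static int checkIndex(int idx, int vlen) {
        if (idx < 0 || idx >= vlen) {
            throw new IndexOutOfBoundsException("index " + idx + " not in [0, " + vlen + ")");
        }
        return idx;
    }

    // Assumes vlen is a power of two, so masking is equivalent to a modulo.
    static int wrapIndex(int idx, int vlen) {
        return idx & (vlen - 1);
    }

    public static void main(String[] args) {
        System.out.println(wrapIndex(17, 16)); // 1
        System.out.println(wrapIndex(-1, 16)); // 15
    }
}
```

The wrapping form has no exceptional path, which is what lets the loop body stay branch-free.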
------------- PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2312528484 From aph at openjdk.org Wed Sep 18 12:27:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 12:27:23 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 18 Sep 2024 10:30:49 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 
0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... > > Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' into 8322770 > - cleanup: adjust a comment in the light of the latest change > - cleanup: fix comment formatting > > Co-authored-by: Andrew Haley > - Optimize both the stub and inlined parts of the implementation > > Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. > Add a non-unrolled vectorized loop to the stub to handle vectorizable > tail portions of arrays multiple to 4/8 elements (for ints / other > types). Make the stub process array as a whole instead of relying on > the inlined part to process an unvectorizable tail. 
> - cleanup: add comments and simplify the orr ins > - cleanup: remove redundant copyright notice > - cleanup: use a constexpr function for intpow instead of a templated class > - cleanup: address review comments > - cleanup: remove a redundant parameter > - 8322770: AArch64: C2: Implement VectorizedHashCode > > The code to calculate a hash code consists of two parts: a stub method that > implements a vectorized loop using Neon instruction which processes 16 or 32 > elements per iteration depending on the data type; and an unrolled inlined > scalar loop that processes remaining tail elements. > > [Performance] > > [[Neoverse V2]] > ``` > | 328a053 (master) | dc2909f (this) | > ---------------------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt | Score Error | Score Error | Units > ---------------------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 | 0.805 ? 0.206 | 0.815 ? 0.141 | ns/op > ArraysHashCode.bytes 10 avgt 15 | 4.362 ? 0.013 | 3.522 ? 0.124 | ns/op > ArraysHashCode.bytes 100 avgt 15 | 78.374 ? 0.136 | 12.935 ? 0.016 | ns/op > ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ? 13.691 | 1344.770 ? 1.898 | ns/op > ArraysHashCode.chars 1 avgt 15 | 0.731 ? 0.035 | 0.723 ? 0.046... src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5411: > 5409: : -1; > 5410: guarantee(multiply_by_halves != -1, "unknown multiplication algorithm"); > 5411: It's hard to follow this logic. 
Try something like this: Suggestion: Assembler::SIMD_Arrangement load_arrangement; switch (eltype) { case T_BOOLEAN: case T_BYTE: load_arrangement = T8B; multiply_by_halves = true; break; case T_CHAR: case T_SHORT: load_arrangement = T8H; multiply_by_halves = true; break; case T_INT: load_arrangement = T4S; multiply_by_halves = false; break; default: ShouldNotReachHere(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764960787 From aph at openjdk.org Wed Sep 18 12:31:13 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 12:31:13 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> Message-ID: On Wed, 18 Sep 2024 12:10:43 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5563: >> >>> 5561: } >>> 5562: >>> 5563: start = __ offset(); >> >> What does this logic from 5552 onwards do? It at least deserves a comment. >> Can we not simply delete it? > > It's here to make sure the loop takes the smallest number of aligned 32-byte instruction memory regions possible. Does that make a significant measurable difference? Why not simply 32-align the region? Then we can get rid of this `large_loop_size` calculation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764966177 From rkennke at openjdk.org Wed Sep 18 12:38:21 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:38:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 19:04:13 GMT, Chris Plummer wrote: >> I pulled your changes and I see one slight difference in the output. 
The following line is missing: >> >> `_metadata._compressed_klass: InstanceKlass for java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject` >> >> I realize that there is no `_metadata._compressed_klass` when you have compact headers, and that the Klass* is encoded in the `_mark` word, which now looks something like this in the output: >> >> _mark: 16294762323640321 >> >> So you can say that the Klass* is embedded in the _mark word, but this isn't of much help to SA users. I think what is expected is that the visitor is passed a MetadataField object that, when getValue() is called on it, returns the Klass mirror. Maybe we need a new CompactKlassField type like we currently have a NarrowKlassField field type, and it will do the decoding of the _mark word into a Klass. The current getKlass() is related to this. > > Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark as two separate fields: one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after integration? Then please file a follow-up issue.
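
To make the point about `_mark` concrete: assuming the narrow Klass* occupies the upper 22 bits of the 64-bit mark word, as the PR description states, the value from the SA output quoted above can be split as follows. The shift constant and all names are assumptions for illustration; this is not SA or HotSpot code, and turning the narrow value into a real Klass* would additionally need the encoding base and shift, which are not modeled here.

```java
// Splits a compact-header mark word into its narrow-klass part and the rest.
final class MarkWordSketch {
    static final int KLASS_SHIFT = 64 - 22; // assumed: narrow Klass* in the upper 22 bits

    static int narrowKlass(long mark) {
        return (int) (mark >>> KLASS_SHIFT);
    }

    static long remainingBits(long mark) {
        return mark & ((1L << KLASS_SHIFT) - 1);
    }

    public static void main(String[] args) {
        long mark = 16294762323640321L; // value shown in the SA output
        System.out.println(narrowKlass(mark));   // 3705
        System.out.println(remainingBits(mark)); // 1
    }
}
```

A field type that performs this split would let a visitor see both the raw mark value and the class it refers to.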
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1764976086 From mli at openjdk.org Wed Sep 18 12:40:12 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 12:40:12 GMT Subject: RFR: 8339738: RISC-V: Vectorize crc32 intrinsic [v12] In-Reply-To: References: <6RgppiD0RTuKLmCQAqB1vDhFwvZbfkSHOzKw6GFfcPk=.008267b0-5e6d-427b-98e6-ac860c0f9ab3@github.com> Message-ID: On Tue, 17 Sep 2024 14:08:54 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> vectorize xor > > Marked as reviewed by fyang (Reviewer). Thanks for your reviewing @RealFYang @luhenry ------------- PR Comment: https://git.openjdk.org/jdk/pull/20910#issuecomment-2358353932 From mli at openjdk.org Wed Sep 18 12:40:13 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 12:40:13 GMT Subject: Integrated: 8339738: RISC-V: Vectorize crc32 intrinsic In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 10:24:20 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Thanks. > > This improvement is based on java.base/share/native/libzip/zlib/zcrc32.c, I made some modification to N (to 16) related code, then re-generate the tables needed, finally vectorize the code (original implementation in zcrc32.c is just scalar code). 
> > ## Test > test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java, > test/jdk/java/util/zip/TestCRC32.java > > ## Performance > > ### on bananapi > > with patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 222.297 | 0.106 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 365.144 | 0.196 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 687.14 | 0.235 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1043.833 | 0.083 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 3299.928 | 1.361 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 24384.502 | 25.298 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 103200.458 | 8.297 | ns/op > > > > without patch > > Benchmark | (count) | Mode | Cnt | Score | Error | Units > -- | -- | -- | -- | -- | -- | -- > TestCRC32.testCRC32Update | 64 | avgt | 10 | 220.878 | 0.02 | ns/op > TestCRC32.testCRC32Update | 128 | avgt | 10 | 364.173 | 0.032 | ns/op > TestCRC32.testCRC32Update | 256 | avgt | 10 | 685.815 | 0.055 | ns/op > TestCRC32.testCRC32Update | 512 | avgt | 10 | 1329.049 | 0.084 | ns/op > TestCRC32.testCRC32Update | 2048 | avgt | 10 | 5189.302 | 0.666 | ns/op > TestCRC32.testCRC32Update | 16384 | avgt | 10 | 41250.873 | 23.882 | ns/op > TestCRC32.testCRC32Update | 65536 | avgt | 10 | 171664.002 | 15.011 | ns/op > > > > ### on K230 > > with patch > > 5411: > > It's hard to follow this logic.
Try something like this: > Suggestion: > > > Assembler::SIMD_Arrangement load_arrangement; > switch (eltype) { > case T_BOOLEAN: > case T_BYTE: > load_arrangement = T8B; > multiply_by_halves = true; > break; > case T_CHAR: > case T_SHORT: > load_arrangement = T8H; > multiply_by_halves = true; > break; > case T_INT: > load_arrangement = T4S; > multiply_by_halves = false; > break; > default: > ShouldNotReachHere(); > } The current implementation reflects that the decision to process a register by halves depends on the arrangement used. In the previous version of this PR, we tested for `load_arrangement` in places where `multiply_by_halves` is tested now. This way, for example, changing the arrangement for `T_CHAR`/`T_SHORT` from `T8H` to `T4H` requires only changing the arrangement itself. Using the logic you suggest would require one to be aware of the connection between `load_arrangement` and `multiply_by_halves` that must be maintained. Therefore, I recommend leaving the code as it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1764995031 From epeter at openjdk.org Wed Sep 18 12:54:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:54:16 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 07:14:57 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. 
For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Missed code fragment from last review comment resolution. Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? src/hotspot/cpu/x86/x86.ad line 6578: > 6576: %} > 6577: ins_pipe( pipe_slow ); > 6578: %} Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. 
src/hotspot/cpu/x86/x86.ad line 10793: > 10791: match(Set dst (SaturatingAddV (Binary dst (LoadVector src)) mask)); > 10792: match(Set dst (SaturatingSubV (Binary dst (LoadVector src)) mask)); > 10793: format %{ "vector_saturating_unsigned_masked $dst, $mask, $src" %} Suggestion: format %{ "vector_saturating_unsigned_subword_masked $dst, $mask, $src" %} src/hotspot/share/opto/vectornode.hpp line 81: > 79: static VectorNode* shift_count(int opc, Node* cnt, uint vlen, BasicType bt); > 80: static VectorNode* make(int opc, Node* n1, Node* n2, uint vlen, BasicType bt, bool is_var_shift = false); > 81: static VectorNode* make(int vopc, Node* n1, Node* n2, const TypeVect* vt, bool is_mask = false, bool is_var_shift = false, bool is_unsigned = false); Feels like this just slowly grows and grows... eventually we will have too many arguments. Not sure what is a better alternative though... src/hotspot/share/opto/vectornode.hpp line 386: > 384: class SaturatingSubVNode : public SaturatingVectorNode { > 385: public: > 386: SaturatingSubVNode(Node* in1, Node* in2, const TypeVect* vt, bool is_unsigned) : SaturatingVectorNode(in1,in2,vt,is_unsigned) {} Suggestion: SaturatingSubVNode(Node* in1, Node* in2, const TypeVect* vt, bool is_unsigned) : SaturatingVectorNode(in1, in2, vt, is_unsigned) {} spaces required by style guide src/hotspot/share/opto/vectornode.hpp line 598: > 596: class UMinVNode : public VectorNode { > 597: public: > 598: UMinVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1,in2,vt) { Suggestion: UMinVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1, in2, vt) { spaces required by style guide src/hotspot/share/opto/vectornode.hpp line 614: > 612: class UMaxVNode : public VectorNode { > 613: public: > 614: UMaxVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1,in2,vt) { Suggestion: UMaxVNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1, in2, vt) { spaces required by style guide ------------- Changes requested 
by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20507#pullrequestreview-2312554183 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764975321 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764982201 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764985438 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764987143 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764987547 PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764988807 From epeter at openjdk.org Wed Sep 18 12:54:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 12:54:17 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:34:58 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Missed code fragment from last review comment resolution. > > src/hotspot/cpu/x86/x86.ad line 6578: > >> 6576: %} >> 6577: ins_pipe( pipe_slow ); >> 6578: %} > > Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. Below you also do it inconsistently. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1764976079 From rkennke at openjdk.org Wed Sep 18 12:59:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 12:59:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Mon, 9 Sep 2024 18:30:21 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with six additional commits since the last revision: >> >> - Print as warning when UCOH doesn't match in CDS archive >> - Improve initialization of mark-word in CDS ArchiveHeapWriter >> - Simplify getKlass() in SA >> - Simplify oopDesc::init_mark() >> - Get rid of forward_safe_* methods >> - GCForwarding touch-ups > > src/hotspot/share/oops/markWord.inline.hpp line 90: > >> 88: ShouldNotReachHere(); >> 89: return markWord(); >> 90: #endif > > Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? Kindof. The problem is that klass_shift is larger than 31, and shifting with it would thus be UB and generate a compiler warning. I opted to simply not compile any of that code in 32bit builds. We could also define klass_shift differently on 32bit. Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32 and 64 bit builds and not make any distinction anywhere. E.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though. > src/hotspot/share/oops/oop.inline.hpp line 90: > >> 88: } else { >> 89: return markWord::prototype(); >> 90: } > > Could this be unconditional since prototoype_header is initialized for all Klasses? 
yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765003983 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765006669 From kbarrett at openjdk.org Wed Sep 18 13:05:15 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 18 Sep 2024 13:05:15 GMT Subject: RFR: 8340353: Remove CompressedOops::ptrs_base Message-ID: Please review this change that (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call CompressedOops::base(). (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), updating all callers. Testing: mach5 tier1 GHA to test building on non-Oracle supported platforms ------------- Commit messages: - ptrs_base_addr => base_addr Changes: https://git.openjdk.org/jdk/pull/21060/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21060&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340353 Stats: 14 lines in 5 files changed: 1 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21060.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21060/head:pull/21060 PR: https://git.openjdk.org/jdk/pull/21060 From aph at openjdk.org Wed Sep 18 13:21:19 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 13:21:19 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <6ekcH31ryktqs1NAEtBp2QPOuMSgPs84y6GOrAyvHXE=.b96cba1c-8ec0-453a-a584-bc821ce8a05c@github.com> On Wed, 18 Sep 2024 12:48:37 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5411: >> >>> 5409: : -1; >>> 5410: guarantee(multiply_by_halves != -1, "unknown multiplication algorithm"); >>> 5411: >> >> It's hard to follow this logic. 
Try something like this: >> Suggestion: >> >> >> Assembler::SIMD_Arrangement load_arrangement; >> switch (eltype) { >> case T_BOOLEAN: >> case T_BYTE: >> load_arrangement = T8B; >> multiply_by_halves = true; >> break; >> case T_CHAR: >> case T_SHORT: >> load_arrangement = T8H; >> multiply_by_halves = true; >> break; >> case T_INT: >> load_arrangement = T4S; >> multiply_by_halves = false; >> break; >> default: >> ShouldNotReachHere(); >> } > > The current implementation reflects that the decision to process a register by halves depends on the arrangement used. In the previous version of this PR, we tested for `load_arrangement` in places where `multiply_by_halves` is tested now. This way, for example, changing the arrangement for `T_CHAR`/`T_SHORT` from `T8H` to `T4H` requires only changing the arrangement itself. Using the logic you suggest would require one to be aware of the connection between `load_arrangement` and `multiply_by_halves` that must be maintained. Therefore, I recommend leaving the code as it is. No, because the connection between load_arrangement and multiply_by_halves is inherent in the logic. Please keep things as simple as possible for this implementation. If a future engineer decides to extend this code they'll probably do something different. Speculating here about what might happen is over-engineering. 
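As a rough illustration of the "one switch decides both values" shape argued for above, the following sketch mirrors the suggested mapping in plain Java. The enum and class names are hypothetical stand-ins for the HotSpot types named in the thread (`T8B`, `T8H`, `T4S`, `multiply_by_halves`); this is not the stub generator code itself.

```java
// Sketch only: a single switch returns the load arrangement together with
// the multiply-by-halves decision, so the pairing is visible in one place.
final class ArrangementSketch {
    enum BasicType { T_BOOLEAN, T_BYTE, T_CHAR, T_SHORT, T_INT, T_LONG }
    enum Arrangement { T8B, T8H, T4S }

    // Immutable pair of the two decisions discussed in the review.
    static final class Plan {
        final Arrangement loadArrangement;
        final boolean multiplyByHalves;

        Plan(Arrangement loadArrangement, boolean multiplyByHalves) {
            this.loadArrangement = loadArrangement;
            this.multiplyByHalves = multiplyByHalves;
        }
    }

    static Plan planFor(BasicType eltype) {
        switch (eltype) {
            case T_BOOLEAN:
            case T_BYTE:
                return new Plan(Arrangement.T8B, true);
            case T_CHAR:
            case T_SHORT:
                return new Plan(Arrangement.T8H, true);
            case T_INT:
                return new Plan(Arrangement.T4S, false);
            default:
                // Plays the role of ShouldNotReachHere() in the suggestion.
                throw new IllegalArgumentException("unsupported element type: " + eltype);
        }
    }
}
```

The alternative defended in the reply would instead compute the boolean from the chosen arrangement; the sketch shows only the explicit-table variant.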
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1765043415 From rrich at openjdk.org Wed Sep 18 13:22:06 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 18 Sep 2024 13:22:06 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 10:34:31 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2825: >> >>> 2823: for (int i = 0; i < num_unrolled; i++) { >>> 2824: ld(tmp3, 0, cache_addr); >>> 2825: cmpd(CCR0, tmp3, obj); >> >> Please file a RFE to consistently use either `flag` or `CCR0`. > > I'm planning to remove `flag` and use `CCR0` consistently. But I'd like to postpone this and do it after the removal of the old locking modes. (Hoping this will happen soon.) Reviews like this would be easier without the inconsistency. The sooner this is fixed the better. But that's just my humble opinion ;) >> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2878: >> >>> 2876: ld(tmp2, in_bytes(ObjectMonitor::recursions_offset()), monitor); >>> 2877: addi(tmp2, tmp2, 1); >>> 2878: std(tmp2, in_bytes(ObjectMonitor::recursions_offset()), monitor); >> >> Can you replace the if-statement with the following? >> >> Suggestion: >> >> assert_different_registers(tmp2, monitor); >> int offset = in_bytes(ObjectMonitor::recursions_offset()) - (UseObjectMonitorTable ? 0 : monitor_tag); >> ld(tmp2, offset, monitor); >> addi(tmp2, tmp2, 1); >> std(tmp2, offset, monitor); > > Is `monitor` initialized as needed if `!UseObjectMonitorTable`? I can't see it. Yes, I think so. Note that `mark` and `monitor` are the same register. The difference is that the ObjectMonitor pointer in the mark word is tagged. This needs to be compensated for in the offset (L2876 in my suggestion). It might be better to use `monitor` also at L2814 and add a comment at the declaration of `monitor` that the pointer is tagged with `!UseObjectMonitorTable`.
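The offset compensation described above is plain address arithmetic, sketched below in Java. The tag value of 2 and the field offset of 96 are assumptions chosen for illustration, and all names are hypothetical. Nothing here is HotSpot code.

```java
// Sketch only: shows that a tagged base pointer plus a displacement reduced
// by the tag addresses the same field as the untagged pointer plus the
// original displacement.
final class TaggedPointerSketch {
    static final long MONITOR_TAG = 2; // assumption: tag bits set in the mark word

    // A monitor address as it would appear, tagged, in the mark word.
    static long tag(long monitorAddress) {
        return monitorAddress | MONITOR_TAG;
    }

    // Displacement to use with a base register that may still carry the tag.
    static long compensatedDisplacement(long fieldOffset, boolean baseIsTagged) {
        return baseIsTagged ? fieldOffset - MONITOR_TAG : fieldOffset;
    }

    static long effectiveAddress(long base, long displacement) {
        return base + displacement;
    }

    public static void main(String[] args) {
        long monitor = 0x1000;       // hypothetical, suitably aligned address
        long recursionsOffset = 96;  // hypothetical field offset
        long disp = compensatedDisplacement(recursionsOffset, true);
        System.out.println(effectiveAddress(tag(monitor), disp) == monitor + recursionsOffset);
        System.out.println("displacement modulo 4 = " + (disp % 4));
    }
}
```

With these numbers the compensated displacement is no longer a multiple of 4, so whether a given load or store encoding accepts it has to be checked separately.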
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765045501 PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765041298 From rkennke at openjdk.org Wed Sep 18 13:23:44 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 18 Sep 2024 13:23:44 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: JVMCI support ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/bb641621..9ad2e62f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=19-20 Stats: 22 lines in 6 files changed: 16 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From mdoerr at openjdk.org Wed Sep 18 13:29:06 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 13:29:06 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:19:54 GMT, Richard Reingruber wrote: >> I'm planning to remove `flag` and use `CCR0` consistently. But I'd like to postpone this and do it after the removal of the old locking modes. (Hoping this will happen soon.) > > Reviews like this would be easier without the inconsistency. The sooner this is fixed the better. But that's just my humble opinion ;) You mean only in the new code? I could do that. But I prefer not to touch more code before the Loom enhancements are integrated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765056903 From rrich at openjdk.org Wed Sep 18 13:35:08 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 18 Sep 2024 13:35:08 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:26:11 GMT, Martin Doerr wrote: >> Reviews like this would be easier without the inconsistency. The sooner this is fixed the better. But that's just my humble opinion ;) > > You mean only in the new code? I could do that. 
But I prefer not to touch more code before the Loom enhancements are integrated. I mean the whole method and also the unlock method. Merging with loom is done in the loom repo. It should be ok if the renaming is done with a little bit of delay (1 week?) after this PR is pushed. Besides, we don't know when the Loom enhancements will be integrated, do we? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765068605 From duke at openjdk.org Wed Sep 18 13:35:17 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 18 Sep 2024 13:35:17 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> Message-ID: On Wed, 18 Sep 2024 12:28:17 GMT, Andrew Haley wrote: > Does that make a significant measurable difference? I'll get back to you on this with performance numbers later. > Why not simply 32-align the region? Then we can get rid of this large_loop_size calculation. We aim to align the code only if it reduces the number of aligned 32-byte instruction memory regions the loop comprises. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1765068543 From stefank at openjdk.org Wed Sep 18 13:39:07 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 18 Sep 2024 13:39:07 GMT Subject: RFR: 8340353: Remove CompressedOops::ptrs_base In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:00:19 GMT, Kim Barrett wrote: > Please review this change that > > (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call > CompressedOops::base(). > > (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), > updating all callers. > > Testing: > mach5 tier1 > GHA to test building on non-Oracle supported platforms Looks good to me.
------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21060#pullrequestreview-2312729637 From jbhateja at openjdk.org Wed Sep 18 13:43:32 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 13:43:32 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v14] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. 
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/a6f8ee8b..5253706e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=12-13 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Wed Sep 18 13:47:08 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 13:47:08 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:51:00 GMT, Emanuel Peter wrote: > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? Nomenclature is suggested by Paul. We have sufficient test coverage of these APIs in JTREG tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358514597 From jbhateja at openjdk.org Wed Sep 18 13:47:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 13:47:10 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:35:28 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/x86.ad line 6578: >> >>> 6576: %} >>> 6577: ins_pipe( pipe_slow ); >>> 6578: %} >> >> Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. > > Below you also do it inconsistently. > Above, you name both the `format` and method name with `_reg` and `_mem`, but here you do not do it for the method name. Would be nice to keep it consistent. Memory operands are sufficient to implicitly infer memory flavor of opto assembly instruction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20507#discussion_r1765088839 From stuefe at openjdk.org Wed Sep 18 14:00:25 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 18 Sep 2024 14:00:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 12:27:14 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix FullGCForwarding initialization > > src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: > >> 85: klass_alignment_words, >> 86: "class arena"); >> 87: } > > As per my comment in the header file, change the code to this: > > ```c++ > if (class_context != nullptr) { > // ...
Same as in PR > } else { > _class_space_arena = _non_class_space_arena; > } Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432 > src/hotspot/share/memory/classLoaderMetaspace.cpp line 118: > >> 116: #ifdef ASSERT >> 117: if (result.is_nonempty()) { >> 118: const bool in_class_arena = class_space_arena() != nullptr ? class_space_arena()->contains(result) : false; > > Unnecessary nullptr check if you take my suggestion, or you should switch to `have_class_space_arena`. See reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754335269 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113297 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765113850 From mdoerr at openjdk.org Wed Sep 18 14:05:07 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 14:05:07 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:17:26 GMT, Richard Reingruber wrote: >> Is `monitor` initialized as needed if `!UseObjectMonitorTable`? I can't see it. > > Yes, I think so. Note that `mark` and `monitor` are the same register. The difference is that the ObjectMonitor pointer in the mark word is tagged. This needs to be compensate for in the offset (L2876 in my suggestion). It might be better to use `monitor` also at L2814 and add a comment at the declaration of `monitor` that the pointer is tagged with `!UseObjectMonitorTable`. The idea is good. 
Unfortunately, it doesn't work because ld/std instructions can't add/subtract small offsets: https://github.com/openjdk/jdk/blob/471a51a5a4395f0bc6818c3c1d30455ce75500d6/src/hotspot/cpu/ppc/assembler_ppc.hpp#L1144 assert((x & 0x3) == 0) failed: unaligned offset ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765122989 From mdoerr at openjdk.org Wed Sep 18 14:19:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 14:19:20 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v4] In-Reply-To: References: Message-ID: > PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Consistently use CCR0 in compiler_fast_lock_lightweight_object and compiler_fast_unlock_lightweight_object ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20922/files - new: https://git.openjdk.org/jdk/pull/20922/files/571601f7..6581e5ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20922&range=02-03 Stats: 25 lines in 1 file changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/20922.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20922/head:pull/20922 PR: https://git.openjdk.org/jdk/pull/20922 From mdoerr at openjdk.org Wed Sep 18 14:19:20 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 14:19:20 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:32:31 GMT, Richard Reingruber wrote: >> You mean only in the new code? I could do that. But I prefer not to touch more code before the Loom enhancements are integrated. > > I mean the whole method and also the unlock method. Merging with loom is done in the loom repo. 
It should be ok if the renaming is done with a little bit of delay (1 week?) after this PR is pushed. > Besides we don't know when the loom enhancements are integrated, do we? Ok, done with https://github.com/openjdk/jdk/pull/20922/commits/6581e5abd5628ad21ed8f2c01c1bee94a3ed286c. I hope this will not cause too much resolution work for others (also https://github.com/openjdk/jdk/pull/19454). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765146722 From epeter at openjdk.org Wed Sep 18 14:25:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 18 Sep 2024 14:25:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:44:11 GMT, Jatin Bhateja wrote: > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > Nomenclature is suggested by Paul. @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > We have sufficient test coverage of these APIs in JTREG tests. @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358617657 From aph at openjdk.org Wed Sep 18 14:30:12 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 18 Sep 2024 14:30:12 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> Message-ID: On Wed, 18 Sep 2024 13:32:29 GMT, Mikhail Ablakatov wrote: >> Does that make a significant measurable difference? >> Why not simply 32-align the region? Then we can get rid of this `large_loop_size` calculation. > >> Does that make a significant measurable difference? > > I'll revert on this with performance numbers later. > >> Why not simply 32-align the region? Then we can get rid of this large_loop_size calculation. > > We aim to align the code only if it reduces the number of aligned 32-byte instruction memory regions the loop compromises. If doing this really does help performance for this function, then we can find a good way to do it which doesn't require any lengths to be hard coded, and we can fully document the solution so others can use it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1765167556 From rrich at openjdk.org Wed Sep 18 14:31:06 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 18 Sep 2024 14:31:06 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:02:27 GMT, Martin Doerr wrote: >> Yes, I think so. Note that `mark` and `monitor` are the same register. The difference is that the ObjectMonitor pointer in the mark word is tagged. This needs to be compensate for in the offset (L2876 in my suggestion). 
It might be better to use `monitor` also at L2814 and add a comment at the declaration of `monitor` that the pointer is tagged with `!UseObjectMonitorTable`. > > The idea is good. Unfortunately, it doesn't work because ld/std instructions can't add/subtract small offsets: https://github.com/openjdk/jdk/blob/471a51a5a4395f0bc6818c3c1d30455ce75500d6/src/hotspot/cpu/ppc/assembler_ppc.hpp#L1144 > assert((x & 0x3) == 0) failed: unaligned offset Ah, I see. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20922#discussion_r1765167852 From asmehra at openjdk.org Wed Sep 18 14:33:08 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 18 Sep 2024 14:33:08 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> Message-ID: On Wed, 18 Sep 2024 00:59:22 GMT, Ioi Lam wrote: > That's why there's no check for k to be aot-initialized. I was actually referring to the missing aot-initialized check for the Fruit class. As it stands, this method initializes the classes required by the archive mirrors as the _runtime_default_subgraph_info has all the archived mirrors. But not all classes that have archived mirror are aot-initialized. And from the Fruit class example in the comment it seems this method should only be initializing the classes that are required by archived mirrors of _aot-initialized classes_: // For example, if this enum class is initialized at AOT cache assembly time: // // enum Fruit { // APPLE, ORANGE, BANANA; // static final Set HAVE_SEEDS = new HashSet<>(Arrays.asList(APPLE, ORANGE)); // } // // the pre-inited mirror of Fruit references HashSet, which should be initialized // before any Java code can access the Fruit class. So based on the comment there should be a way to identify the subgraph_object_klasses of only the aot-initialized classes and initialize only those classes. 
Am I reading this wrong? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1765170921 From coleenp at openjdk.org Wed Sep 18 15:10:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 18 Sep 2024 15:10:09 GMT Subject: RFR: 8340353: Remove CompressedOops::ptrs_base In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:00:19 GMT, Kim Barrett wrote: > Please review this change that > > (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call > CompressedOops::base(). > > (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), > updating all callers. > > Testing: > mach5 tier1 > GHA to test building on non-Oracle supported platforms This looks like a good rename and trivial as such. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21060#pullrequestreview-2313007324 From sviswanathan at openjdk.org Wed Sep 18 15:30:10 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 15:30:10 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote: > I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. > > Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? Agree, wrapShuffleIndexes makes more sense. I will make the change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2358784233 From asmehra at openjdk.org Wed Sep 18 15:57:12 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 18 Sep 2024 15:57:12 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v8] In-Reply-To: <1XujOZAE9Zl3KSlZAtUSPssVetp_bXJ58iWhgY0PYZE=.65bf692f-ad61-4252-b23f-2acca72ce1cf@github.com> References: <1XujOZAE9Zl3KSlZAtUSPssVetp_bXJ58iWhgY0PYZE=.65bf692f-ad61-4252-b23f-2acca72ce1cf@github.com> Message-ID: On Wed, 18 Sep 2024 02:59:37 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Problem:** >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. >> >> **Solution:** >> >> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. >> >> In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. >> >> **Review Notes:** >> >> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. >> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp.
>> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method). This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits: > > - erge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - fixed merge > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @ashu-mehra comment: assert that ConstantDescs, etc, must be initialized > - Improved in-line comments > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - @vnkozlov comments > - Clean up; removed unrelated changes in classPrinter.cpp > - more cleanup > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - ... and 6 more: https://git.openjdk.org/jdk/compare/be1d0ef1...0970a0e2 src/hotspot/share/cds/aotClassInitializer.cpp line 50: > 48: // - If we re-run the <clinit> of these 3 classes again during the production > 49: // run, ConstantDescs.CD_Boolean will get a new value that has a different > 50: // object identity than the value referenced the the Wrapper enums. typo: than the value referenced _the_ the Wrapper enums -> than the value referenced _by_ the Wrapper enums. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1765322651 From qamai at openjdk.org Wed Sep 18 16:15:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation Message-ID: Hi, This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout.
Regarding the related issues: - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. Please take a look and leave reviews. Thanks a lot. The description of the original PR: This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
Upon these changes, a `rearrange` can emit more efficient code: var species = IntVector.SPECIES_128; var v1 = IntVector.fromArray(species, SRC1, 0); var v2 = IntVector.fromArray(species, SRC2, 0); v1.rearrange(v2.toShuffle()).intoArray(DST, 0); Before: movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} vmovdqu 0x10(%r10),%xmm2 movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} vmovdqu 0x10(%r10),%xmm1 movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} vmovdqu 0x10(%r10),%xmm0 vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask ; {external_word} vpackusdw %xmm0,%xmm0,%xmm0 vpackuswb %xmm0,%xmm0,%xmm0 vpmovsxbd %xmm0,%xmm3 vpcmpgtd %xmm3,%xmm1,%xmm3 vtestps %xmm3,%xmm3 jne 0x00007fc2acb4e0d8 vpmovzxbd %xmm0,%xmm0 vpermd %ymm2,%ymm0,%ymm0 movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} vmovdqu %xmm0,0x10(%r10) After: movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} vmovdqu 0x10(%r10),%xmm1 movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} vmovdqu 0x10(%r10),%xmm2 vpxor %xmm0,%xmm0,%xmm0 vpcmpgtd %xmm2,%xmm0,%xmm3 vtestps %xmm3,%xmm3 jne 0x00007fa818b27cb1 vpermd %ymm1,%ymm2,%ymm0 movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} vmovdqu %xmm0,0x10(%r10) ------------- Commit messages: - copyright year - remove LoadShuffle from riscv, whitespace - tighten concrete types - [vectorapi] Refactor VectorShuffle implementation Changes: https://git.openjdk.org/jdk/pull/21042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310691 Stats: 4984 lines in 64 files changed: 2984 ins; 981 del; 1019 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Wed Sep 18 16:15:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle 
implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... @PaulSandoz What do you think regarding x86-32? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2356451016 From psandoz at openjdk.org Wed Sep 18 16:15:38 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:59:07 GMT, Quan Anh Mai wrote: > @PaulSandoz What do you think regarding x86-32? I don't see anything obvious in the changes of this PR that would affect x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some existing bug?@sviswa7 what do you think? My sense is to proceed, and problem list the failure, and attempt to find the source of the failure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2357043269 From sviswanathan at openjdk.org Wed Sep 18 16:15:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: <5oex8dW9c0zy1RNdWjuA3bpaxACV_QGt3iij6SJ2kZ8=.a3d18f15-ed92-4d79-9df8-9e2d828fb33c@github.com> On Tue, 17 Sep 2024 22:29:01 GMT, Paul Sandoz wrote: > > @PaulSandoz What do you think regarding x86-32? 
> > I don't see anything obvious in the changes of this PR that would affect x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some existing bug?@sviswa7 what do you think? > > My sense is to proceed, and problem list the failure, and attempt to find the source of the failure. Yes, let us proceed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2357222835 From qamai at openjdk.org Wed Sep 18 16:15:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 16:15:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 22:29:01 GMT, Paul Sandoz wrote: >> @PaulSandoz What do you think regarding x86-32? > >> @PaulSandoz What do you think regarding x86-32? > > I don't see anything obvious in the changes of this PR that would affect x86-32, but i ain't a HotSpot expert. Perhaps this just exacerbates some existing bug?@sviswa7 what do you think? > > My sense is to proceed, and problem list the failure, and attempt to find the source of the failure. @PaulSandoz @sviswa7 Thanks for your advice, I have made the PR ready for review @iwanowww Could you take another look at this, please? @jatin-bhateja Could you verify that [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) does not occur? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2358879014 From jbhateja at openjdk.org Wed Sep 18 16:22:30 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 16:22:30 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v15] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . 
SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Test cleanups. 
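For readers following the thread, the scalar semantics of the six operators listed above can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `SaturatingSketch` and all method names except `addSaturating(int, int)` (which the thread references as `VectorMath#addSaturating(int, int)`) are hypothetical, and the bodies are not taken from the patch.

```java
// Hypothetical sketch of the scalar fallback semantics; not the code from the PR.
final class SaturatingSketch {
    private SaturatingSketch() {}

    // SADD: signed addition clamped to [Integer.MIN_VALUE, Integer.MAX_VALUE].
    static int addSaturating(int a, int b) {
        long r = (long) a + (long) b;   // widen so the exact sum is representable
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SSUB: signed subtraction clamped to the int range.
    static int subSaturating(int a, int b) {
        long r = (long) a - (long) b;
        if (r > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (r < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) r;
    }

    // SUADD: unsigned addition clamped to 0xFFFFFFFF (which is -1 as an int).
    static int addSaturatingUnsigned(int a, int b) {
        int r = a + b;
        // Unsigned overflow occurred iff the wrapped sum is below an operand.
        return Integer.compareUnsigned(r, a) < 0 ? -1 : r;
    }

    // SUSUB: unsigned subtraction clamped to 0.
    static int subSaturatingUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // UMAX / UMIN: max and min under unsigned ordering.
    static int maxUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int minUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }
}
```

Under this sketch, `addSaturating(Integer.MAX_VALUE, 1)` stays at `Integer.MAX_VALUE` instead of wrapping around to `Integer.MIN_VALUE`, and `subSaturatingUnsigned(1, 2)` yields 0 rather than `0xFFFFFFFF`.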
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/5253706e..f81b2525 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=13-14 Stats: 370 lines in 11 files changed: 10 ins; 360 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Wed Sep 18 16:26:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 18 Sep 2024 16:26:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:22:16 GMT, Emanuel Peter wrote: > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > We have sufficient test coverage of these APIs in JTREG tests. > > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. 
https://github.com/openjdk/jdk/pull/20507/files#diff-6031c9066a7d7a90cc002e93a1eb64f0371f09d385f42289d202426cc7516d2fR3019-R3264 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358909462 From cjplummer at openjdk.org Wed Sep 18 16:41:20 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 18 Sep 2024 16:41:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:35:28 GMT, Roman Kennke wrote: >> Thinking about this a bit more, maybe _mark needs to be a MetadataField rather than CInt. This is a kind of odd situation. Basically we have a CInt field that is more than just simple bits used as flags or small integers. It also gets you to the Klass*. Possibly SA should treat _mark is two separate fields; one that remains a CInt as it currently is and another that treats it as an encoded Klass* like the NarrowKlassField case. > > Do you think this needs to be addressed before integration? And if so, could you help with implementation? Or could we do it after intergration? Then please file a follow-up issue. Ok. I filed [JDK-8340396](https://bugs.openjdk.org/browse/JDK-8340396). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765387764 From psandoz at openjdk.org Wed Sep 18 16:56:13 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 16:56:13 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:22:16 GMT, Emanuel Peter wrote: > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? 
> It's whatever math functions are required in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. /** * The class {@code VectorMath} contains methods for performing * scalar numeric operations in support of vector numeric operations. */ public final class VectorMath { These are referenced by the vector operators e.g., /** Produce saturating {@code a+b}. Integral only. * @see VectorMath#addSaturating(int, int) */ public static final Binary SADD = binary("SADD", "+", VectorSupport.VECTOR_OP_SADD, VO_NOFP); And in addition these methods would be used by any tail computation (and the fallback code). At the moment we are uncertain whether such operations should reside elsewhere and we did not want to block progress. I am not beholden to the name, but so far I cannot think of a concise alternative. `VectorOperatorMath` is arguably more precise but more verbose. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358973116 From sviswanathan at openjdk.org Wed Sep 18 17:00:30 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 17:00:30 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
> > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. > 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Change method name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/428f2289..87e103ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=01-02 Stats: 45 lines in 37 files changed: 0 ins; 0 del; 45 mod Patch: 
https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: https://git.openjdk.org/jdk/pull/20634 From psandoz at openjdk.org Wed Sep 18 17:07:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 17:07:11 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v15] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:22:30 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. >> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. 
>> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Test cleanups. > > > Do we have tests for the publicly exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > We have sufficient test coverage of these APIs in JTREG tests. > > @jatin-bhateja I can't see any dedicated JTREG tests for the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. I think Jatin is relying on the vector tests to also test the scalar operations by virtue of the fact that eventually the scalar result will be compared with the C2 result. Although both might produce the same result, both may be incorrect! We need some independent scalar tests, especially so if later on these are also made intrinsic. I shall volunteer to add some. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2358992714 From psandoz at openjdk.org Wed Sep 18 17:21:04 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 18 Sep 2024 17:21:04 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093, mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. 
> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Will this have any direct impact on the changes proposed by #20508 and #20634? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2359018872 From sviswanathan at openjdk.org Wed Sep 18 17:26:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 18 Sep 2024 17:26:06 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 17:18:42 GMT, Paul Sandoz wrote: > Will this have any direct impact on the changes proposed by #20508 and #20634? I think we should first get the 20508 and 20634 integrated before this one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2359026443 From rcastanedalo at openjdk.org Wed Sep 18 17:45:51 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 18 Sep 2024 17:45:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: Message-ID: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. 
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. 
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Remove redundant comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/13b93bd9..d54d67f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=22-23 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From qamai at openjdk.org Wed Sep 18 17:51:13 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 18 Sep 2024 17:51:13 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093, mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. 
I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Got it, I think https://github.com/openjdk/jdk/pull/20508 and this PR are unrelated implementation-wise, though. 
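For readers following the `rearrange` example quoted above, here is a minimal scalar sketch of what `v1.rearrange(v2.toShuffle())` computes lane by lane. It is only an illustration in plain Java, not the Vector API implementation: the class name `RearrangeModel` and its arrays are hypothetical, and it assumes every index is already in range, so it says nothing about how out-of-range indexes are wrapped or checked.

```java
import java.util.Arrays;

public class RearrangeModel {
    // Scalar model of v.rearrange(shuffle): output lane n takes the value
    // found in source lane indexes[n].
    // Assumption: every index is already in [0, src.length).
    static int[] rearrange(int[] src, int[] indexes) {
        int[] dst = new int[src.length];
        for (int n = 0; n < dst.length; n++) {
            dst[n] = src[indexes[n]];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src1 = {10, 20, 30, 40}; // plays the role of SRC1 (the data)
        int[] src2 = {3, 2, 1, 0};     // plays the role of SRC2 (the lane indexes)
        System.out.println(Arrays.toString(rearrange(src1, src2)));
    }
}
```

With the reversed index array in `main`, the lanes of `src1` come out in reverse order.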
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2359070587 From shade at openjdk.org Wed Sep 18 17:54:33 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Sep 2024 17:54:33 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder Message-ID: Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. This patch is able to print the following instead: 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" ------------- Commit messages: - "in block" -> "into block" - Fix Changes: https://git.openjdk.org/jdk/pull/21072/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340392 Stats: 111 lines in 7 files changed: 110 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21072/head:pull/21072 PR: https://git.openjdk.org/jdk/pull/21072 From shade at openjdk.org Wed Sep 18 17:58:17 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Sep 2024 17:58:17 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v2] In-Reply-To: References: Message-ID: > Another debugging QoL improvement. 
Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: > > 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal > > > This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. > > This patch is able to print the following instead: > > > 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Test touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21072/files - new: https://git.openjdk.org/jdk/pull/21072/files/22491b08..504d1f97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=00-01 Stats: 6 lines in 1 file changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21072/head:pull/21072 PR: https://git.openjdk.org/jdk/pull/21072 From kvn at openjdk.org Wed Sep 18 18:02:07 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Sep 2024 18:02:07 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 10:54:40 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/parse2.cpp line 1385: >> >>> 1383: bool do_stress_trap = StressUnstableIfTraps && ((C->random() % 2) == 0); >>> 1384: if (do_stress_trap) { >>> 1385: Node* counter_addr = makecon(TypeRawPtr::make((address)&_trap_stress_counter)); >> >> Would it be easier if you use new Ideal macro node for this and expand it in macro expansion phase? 
> > I think the problem with a macro node is that it might prevent optimizations that look through the memory graph, especially since the macro node would need to read and update (raw) memory. Also, such a simple load, increment and store does not pollute the graph too much to justify a macro node, in my opinion. okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1765485166 From kvn at openjdk.org Wed Sep 18 18:17:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 18 Sep 2024 18:17:15 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 11:10:47 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly, thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved declaration More comments. src/hotspot/share/opto/parse2.cpp line 1375: > 1373: > 1374: // Used by StressUnstableIfTraps > 1375: static volatile int _trap_stress_counter = 0; Please, check that all accesses to it use ExternalAddress (external_word_type relocation). src/hotspot/share/opto/parse2.cpp line 1377: > 1375: static volatile int _trap_stress_counter = 0; > 1376: > 1377: void Parse::load_trap_stress_counter(Node*& counter, Node*& incr_store) { `load_trap_` -> `increment_trap_` src/hotspot/share/opto/parse2.cpp line 1594: > 1592: trap = (CallStaticJavaNode*)orig_iff->raw_out(i)->find_out_with(Op_CallStaticJava); > 1593: if (trap != nullptr && trap->is_uncommon_trap() && trap->jvms()->should_reexecute() && > 1594: Deoptimization::trap_request_reason(trap->uncommon_trap_request()) == Deoptimization::Reason_unstable_if) { Can we use `ProjNode::is_uncommon_trap_if_pattern()` here? src/hotspot/share/opto/parse2.cpp line 1627: > 1625: trap_region->set_req(1, trap_proj); > 1626: trap_region->set_req(2, if_true); > 1627: trap->set_req(0, _gvn.transform(trap_region)); Can you use `IdealKit::if_then()` here to simplify code? 
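As a rough illustration of the idea in the quoted description (an extra check on a shared counter placed in front of the unstable if), the control flow can be modelled in plain Java as below. This is a sketch under assumptions, not the C2 code under review: `StressTrapModel`, `PERIOD` and the modulo test are hypothetical, since the thread does not say how the counter value is turned into a condition, and `AtomicInteger` is used only to keep the model well defined, whereas the patch uses a plain `static volatile int`.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StressTrapModel {
    // Stands in for the shared _trap_stress_counter from the patch.
    static final AtomicInteger trapStressCounter = new AtomicInteger();
    // Hypothetical period: force the trap on every 4th execution.
    static final int PERIOD = 4;

    // The extra "if" inserted before the unstable if: bump the shared
    // counter and derive a pseudo-random condition from it.
    static boolean shouldForceTrap() {
        return trapStressCounter.incrementAndGet() % PERIOD == 0;
    }

    // Models one execution of a branch guarded by an unstable-if trap.
    static String execute(boolean rareCondition) {
        if (shouldForceTrap()) {
            return "trap (forced by stress mode)";
        }
        if (rareCondition) {
            return "trap (original unstable if)";
        }
        return "compiled fast path";
    }

    public static void main(String[] args) {
        for (int i = 0; i < 8; i++) {
            System.out.println(execute(false));
        }
    }
}
```

Even when the rare condition never holds, the model still reaches the trap periodically, which is the property the stress flag relies on to exercise deoptimization paths.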
------------- PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2313403156 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1765497470 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1765488657 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1765493396 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1765503522 From aboldtch at openjdk.org Wed Sep 18 18:19:11 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 18 Sep 2024 18:19:11 GMT Subject: RFR: 8337674: ZGC: Consistent style for naming private static constants In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:55:34 GMT, Joel Sikström wrote: > There are various styles for naming private static constants in ZGC. Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants. There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first. > > The lines changed have been identified by running: > `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > > The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). > > Tested with tiers 1-3. lgtm. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20968#pullrequestreview-2313430508 From fbredberg at openjdk.org Wed Sep 18 18:29:23 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:23 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x pass the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero don't need any changes as far as I can tell. 
Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update two, after the review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19454/files - new: https://git.openjdk.org/jdk/pull/19454/files/d2c6db2b..ef5d1683 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=01-02 Stats: 102 lines in 9 files changed: 30 ins; 45 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/19454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19454/head:pull/19454 PR: https://git.openjdk.org/jdk/pull/19454 From fbredberg at openjdk.org Wed Sep 18 18:29:23 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:23 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <6UxGZMfjEJfj7vA_1LFIDGkr65EufZc8nfoEpFeuyjk=.aedee205-884b-4288-bc6f-62503fe67eae@github.com> References: <58EiqcAfncGTnk6wqesW4ZsVyt3Js02aiOpEbl4HCwI=.148ffb62-7571-4aa3-b0dd-996022290e9b@github.com> <6UxGZMfjEJfj7vA_1LFIDGkr65EufZc8nfoEpFeuyjk=.aedee205-884b-4288-bc6f-62503fe67eae@github.com> Message-ID: On Thu, 12 Sep 2024 00:04:51 GMT, David Holmes wrote: >> As @xmas92 wrote, membar(StoreLoad); is all that we need between clearing the owner and checking the queues / successor. And, since I use membar(StoreLoad) in all other platforms, I wanted it to be consistent. 
>> Also if you look in [ObjectMonitor::exit](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1132C6-L1132C25)() you'll see that there is a call to [OrderAccess::storeload](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1184)() just after [release_clear_owner](https://github.com/openjdk/jdk/blob/f9ddfc6fb0780a7d6e933a40ecd3cd458a058f04/src/hotspot/share/runtime/objectMonitor.cpp#L1183)(), so I'm just doing the same as has been done in the C++ slow-path for a long time. > > When the key change here is "add in the missing fence that otherwise allowed stranding" then I would really like something to include the word "fence". Very few people will understand/recall the equivalence with storeload. A comment will suffice. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765514377 From fbredberg at openjdk.org Wed Sep 18 18:29:24 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:24 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Mon, 16 Sep 2024 16:03:16 GMT, Patricio Chilano Mateo wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 358: > >> 356: void ObjectMonitor::enter_for_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark) { >> 357: // Used by LightweightSynchronizer::inflate_and_enter in deoptimization path to enter for another thread. >> 358: bool success = ObjectMonitor::TryLock_with_contention_mark(locking_thread, contention_mark); > > No need to use qualified name. 
fixed > src/hotspot/share/runtime/objectMonitor.cpp line 376: > >> 374: } >> 375: >> 376: bool success = ObjectMonitor::TryLock_with_contention_mark(locking_thread, contention_mark); > > No need to use qualified name. fixed > src/hotspot/share/runtime/objectMonitor.hpp line 376: > >> 374: ObjectWaiter* DequeueWaiter(); >> 375: void DequeueSpecificWaiter(ObjectWaiter* waiter); >> 376: bool TryLock_with_contention_mark(JavaThread* locking_thread, ObjectMonitorContentionMark& contention_mark); > > Following the existing style this should be TryLockWithContentionMark. fixed > src/hotspot/share/runtime/sharedRuntime.cpp line 1973: > >> 1971: // Some other thread acquired the lock (or the monitor was >> 1972: // deflated). Either way we are done. >> 1973: current->inc_held_monitor_count(-1); > > We can just use dec_held_monitor_count(). fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765518111 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765516028 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765515609 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765518719 From fbredberg at openjdk.org Wed Sep 18 18:29:24 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:24 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Tue, 17 Sep 2024 13:18:39 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 1224: >> >>> 1222: // falls to the new owner. >>> 1223: // >>> 1224: void* owner = try_set_owner_from(nullptr, current); >> >> Is this the same code as TryLock now? Except a little different... Could this call TryLock and return if the lock becomes owned by another thread, like in SharedRuntime::monitor_exit_helper() ? 
> > It seems it can call TryLock, which was also pointed out by @pchilano. Thanks for also spotting this. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765517473 From fbredberg at openjdk.org Wed Sep 18 18:29:24 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:24 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: On Tue, 17 Sep 2024 13:15:33 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 1267: >> >>> 1265: return; >>> 1266: } >>> 1267: } >> >> Can't we replace all this code for a call to TryLock? > > I think we can. Thanks for the suggestion. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765516816 From fbredberg at openjdk.org Wed Sep 18 18:29:25 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:25 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: <0ZfEU7MwCLHUhJz_t3G2OiC3_2JjC0c7PUM5z2rOSUw=.99cea3aa-46a0-402e-ba0f-d68542041ed0@github.com> References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> <0ZfEU7MwCLHUhJz_t3G2OiC3_2JjC0c7PUM5z2rOSUw=.99cea3aa-46a0-402e-ba0f-d68542041ed0@github.com> Message-ID: On Tue, 17 Sep 2024 13:12:43 GMT, Coleen Phillimore wrote: >> Good suggestion! I'll go with the "Another alternative...". > > Yes, that's a good suggestion. try_enter() is already exported. Also change the comment in try_enter(): > > // TryLock avoids the CAS and handles deflation. > > because it used to be a try_set_owner_from(), and now there's two reasons it can't go back. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765515214 From fbredberg at openjdk.org Wed Sep 18 18:29:23 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 18 Sep 2024 18:29:23 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v2] In-Reply-To: References: <7GRqYZv0orxGXAMb7ldeV5skNK7FraE-ovnhwlMDKNg=.0018fbb2-36ed-4870-80f6-6aaf026cd9f2@github.com> Message-ID: <6pkgvPNmoixrYKVkt2t0BV3ROlVV4ulSrI9-SiRcjx8=.bdf407e3-92eb-42d1-9ed1-57fa710a2b30@github.com> On Mon, 16 Sep 2024 05:57:59 GMT, Amit Kumar wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update one, after the review > > src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3685: > >> 3683: z_stg(currentHeader, Address(Z_thread, JavaThread::unlocked_inflated_monitor_offset())); >> 3684: >> 3685: z_cr(currentHeader, Z_thread); // Set flag = NE > > I ran tier1 test and don't see any new failure appearing. > > How about using `z_ltgr` here ? > > Suggestion: > > z_ltgr(oop, oop); // Set flag = NE fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765514746 From dlong at openjdk.org Wed Sep 18 18:20:08 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 18 Sep 2024 18:20:08 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. 
> > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion src/hotspot/share/oops/method.hpp line 854: > 852: Method* new_method = method_holder()->method_with_idnum(orig_method_idnum()); > 853: assert(this != new_method, "sanity check"); > 854: assert(new_method != nullptr || is_deleted(), "must be"); Suggestion: assert((new_method == nullptr) == (old_method->is_deleted()), "must be"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1765506638 From dlong at openjdk.org Wed Sep 18 18:38:06 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 18 Sep 2024 18:38:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion I thought I had convinced myself that this doesn't cause a change in behavior. But I'm concerned about ConstantPoolCache::adjust_method_entries(). It looks like setting deleted _resolved_indy_entries to NoSuchMethodError is new behavior. Why are they treated differently than _resolved_method_entries? For CallInfo::resolved_method() and selected_method(), it seems like this could be a change in behavior if the caller allows a safepoint. Can we assert that these functions never see an old method? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2359156443 From coleenp at openjdk.org Wed Sep 18 19:03:06 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 18 Sep 2024 19:03:06 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 18:35:22 GMT, Dean Long wrote: > For CallInfo::resolved_method() and selected_method(), it seems like this could be a change in behavior if the caller allows a safepoint. Can we assert that these functions never see an old method? These functions *do* see old methods. The code tries to then store the result of these somewhere that's visited by redefinition before any safepoint that can make this method old again. Whether or not we have all the required NSVs is another question, especially in the compiledIC. I know the interpreter is correct because if the method is deleted, redefinition will clear the method entry and the interpreter will call into the runtime and the interpreter runtime will throw NSME directly. resolved_indy_entries.method is the adapter method which can't be deleted, we could add an assert about that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2359200703 From shade at openjdk.org Wed Sep 18 19:31:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 18 Sep 2024 19:31:39 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v3] In-Reply-To: References: Message-ID: > Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: > > 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal > > > This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. 
We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. > > This patch is able to print the following instead: > > > 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix 32-bit builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21072/files - new: https://git.openjdk.org/jdk/pull/21072/files/504d1f97..6045a429 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21072/head:pull/21072 PR: https://git.openjdk.org/jdk/pull/21072 From mli at openjdk.org Wed Sep 18 19:42:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 18 Sep 2024 19:42:04 GMT Subject: RFR: 8337674: ZGC: Consistent style for naming private static constants In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:55:34 GMT, Joel Sikström wrote: > There are various styles for naming private static constants in ZGC. Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants. There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first.
> > The lines changed have been identified by running: > `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > > The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). > > Tested with tiers 1-3. Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20968#pullrequestreview-2313597170 From dlong at openjdk.org Wed Sep 18 19:49:09 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 18 Sep 2024 19:49:09 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 19:00:52 GMT, Coleen Phillimore wrote: > resolved_indy_entries.method is the adapter method which can't be deleted, we could add an assert about that. OK, please do. If CallInfo::resolved_method() can return an old method, then it could also return a deleted method, which would cause a null pointer crash before this change. Have we seen such crashes? It would mean the caller did a successful resolve but then decided to safepoint before calling resolved_method(). With this change, we return a method with the wrong name and signature, which I still have concerns about. Even though we did this before in other places, this is a change in behavior to do it here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2359276019 From mdoerr at openjdk.org Wed Sep 18 20:50:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 18 Sep 2024 20:50:35 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v4] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:19:20 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Consistently use CCR0 in compiler_fast_lock_lightweight_object and compiler_fast_unlock_lightweight_object Thanks for reviewing! I have rerun the `LockUnlock` micro benchmarks and didn't see any significant performance difference / regression. (The results are not very stable on the machine.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20922#issuecomment-2359377440 From aboldtch at openjdk.org Wed Sep 18 20:58:39 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 18 Sep 2024 20:58:39 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> On Wed, 18 Sep 2024 18:29:23 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). 
>> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update two, after the review I think this looks good now. Just a small comment on the `ObjectMonitor::try_enter` function. src/hotspot/share/runtime/objectMonitor.cpp line 385: > 383: } > 384: > 385: bool ObjectMonitor::try_enter(JavaThread* current, bool check_owner) { The `check_owner` name is a little confusing to me. To me it looks more like `check_for_recursion` or `handle_recursion`. src/hotspot/share/runtime/objectMonitor.cpp line 396: > 394: // to use ObjectMonitor::try_enter() as a public way of doing TryLock(). > 395: // Used this way in SharedRuntime::monitor_exit_helper(). > 396: if (check_owner) { Probably preference but an early return here is easier for me to parse. ------------- Marked as reviewed by aboldtch (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2313781060 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765709613 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765709568 From aboldtch at openjdk.org Wed Sep 18 21:03:40 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 18 Sep 2024 21:03:40 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> References: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> Message-ID: On Wed, 18 Sep 2024 20:54:01 GMT, Axel Boldt-Christmas wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 385: > >> 383: } >> 384: >> 385: bool ObjectMonitor::try_enter(JavaThread* current, bool check_owner) { > > The `check_owner` name is a little confusing to me. To me it looks more like `check_for_recursion` or `handle_recursion`. I think the name should describe what setting the value actually does, but if it is just a hack to do what the comment below says, then it sounds like a friend declaration for `SharedRuntime::monitor_exit_helper()` is what is wanted. (Or make TryLock() public.) > Set check_owner to false (it's default value is true) if you want > to use ObjectMonitor::try_enter() as a public way of doing TryLock(). > Used this way in SharedRuntime::monitor_exit_helper().
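The alternative raised in this review (a plain try-lock entry point plus a separate recursion-aware entry point, with an early return) can be sketched as follows. This is only an illustration under stated assumptions: `ToyMonitor`, its members, and the use of an integer thread id are all hypothetical and are not the `ObjectMonitor` code from the PR.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical toy monitor; NOT HotSpot's ObjectMonitor.
// A thread is identified by a non-zero integer id; 0 means "unowned".
class ToyMonitor {
  std::atomic<std::uintptr_t> _owner{0};
  int _recursions = 0;  // only touched by the owning thread

 public:
  // Plain attempt: succeeds only if the monitor is currently unowned.
  bool try_lock(std::uintptr_t self) {
    std::uintptr_t expected = 0;
    return _owner.compare_exchange_strong(expected, self);
  }

  // Recursion-aware attempt: re-entry by the current owner succeeds.
  bool try_enter(std::uintptr_t self) {
    if (_owner.load() == self) {
      ++_recursions;
      return true;  // early return keeps the recursive case separate
    }
    return try_lock(self);
  }

  int recursions() const { return _recursions; }
};
```

With two named entry points, a caller that wants only the plain attempt calls `try_lock()` directly, so no boolean parameter has to be explained in a comment.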
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1765717750 From kbarrett at openjdk.org Wed Sep 18 22:11:36 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 18 Sep 2024 22:11:36 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 19:31:39 GMT, Aleksey Shipilev wrote: >> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: >> >> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal >> >> >> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. >> >> This patch is able to print the following instead: >> >> >> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32-bit builds Changes requested by kbarrett (Reviewer). src/hotspot/share/gc/shared/oopStorage.cpp line 1151: > 1149: bool OopStorage::Block::print_containing(oop* addr, outputStream* st) { > 1150: if (contains(addr)) { > 1151: st->print(INTPTR_FORMAT " is a pointer %u/" SIZE_FORMAT " into block %zu", s/SIZE_FORMAT/%zu/ s/INTPTR_FORMAT/PTR_FORMAT/ - because it's semantically a pointer. src/hotspot/share/runtime/os.cpp line 1322: > 1320: > 1321: // Ask if any OopStorage knows about this address. > 1322: if (OopStorageSet::print_containing((oop*)addr, st)) { `addr` might not be oop-aligned, in which case this cast and use might lead to UB. I think this should be gated on `is_aligned(addr, alignof(oop))`. 
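A minimal sketch of the alignment gate suggested above, written as standalone C++ rather than HotSpot code. The names `slot_t`, `is_aligned_to` and `slot_index_of` are hypothetical stand-ins (for `oop`, HotSpot's `is_aligned`, and the containment lookup respectively); the point is only that the address is checked for alignment before it is treated as a pointer to a slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for HotSpot's oop type; only size and alignment matter here.
using slot_t = void*;

// True if addr is a multiple of alignment; alignment must be a power of two.
static bool is_aligned_to(const void* addr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(addr) & (alignment - 1)) == 0;
}

// Returns the index of the slot that addr points at, or -1 if addr is
// misaligned or lies outside [slots, slots + count).
static long slot_index_of(const void* addr, const slot_t* slots, std::size_t count) {
  if (!is_aligned_to(addr, alignof(slot_t))) {
    return -1;  // reject before addr is ever treated as a slot_t*
  }
  const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(addr);
  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(slots);
  const std::uintptr_t hi = lo + count * sizeof(slot_t);
  if (a < lo || a >= hi) {
    return -1;
  }
  return static_cast<long>((a - lo) / sizeof(slot_t));
}
```

The range check is done on integer values so that an arbitrary address coming from a crash log is never compared or dereferenced as a typed pointer.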
test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 109: > 107: class OopStorageSetTest::VM_PrintAtSafepoint : public VM_GTestExecuteAtSafepoint { > 108: private: > 109: class PrintContainingClosure : public Closure { Need another leading space here for proper indentation. ------------- PR Review: https://git.openjdk.org/jdk/pull/21072#pullrequestreview-2313718283 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1765673490 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1765789363 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1765756516 From coleenp at openjdk.org Thu Sep 19 00:04:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 00:04:45 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows getting rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/share/oops/compressedKlass.cpp line 242: > 240: } else { > 241: > 242: // Traditional (non-compact) header mode) Extra ) src/hotspot/share/oops/compressedKlass.hpp line 175: > 173: // 5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding > 174: // base that differs from the reservation base from step (4).
That allows us, e.g., to later use > 175: // zero-based encoding. Not for this but is there really any benefit for zero based encoding for klass ids? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765888065 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765889975 From coleenp at openjdk.org Thu Sep 19 00:04:46 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 00:04:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 12:56:16 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 90: >> >>> 88: } else { >>> 89: return markWord::prototype(); >>> 90: } >> >> Could this be unconditional since prototoype_header is initialized for all Klasses? > > yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then. Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1765893566 From dholmes at openjdk.org Thu Sep 19 01:36:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 01:36:36 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
> > Please consider this PR, and thanks! This affects all hotspot developers using UL so extending coverage: ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2359737246 From iklam at openjdk.org Thu Sep 19 02:26:35 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 02:26:35 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> Message-ID: On Wed, 18 Sep 2024 14:29:44 GMT, Ashutosh Mehra wrote: > I was actually referring to the missing aot-initialized check for the Fruit class. As it stands, this method initializes the classes required by the archive mirrors as the _runtime_default_subgraph_info has all the archived mirrors. `_runtime_default_subgraph_info` is not recording the mirrors. It records all the classes of all the objects that can are reachable from the archived mirrors. For example, if the following three classes are aot-initialized: class A { static Object foo = new X(); } class B { static Object foo = new Y(); } class C { static Object foo = new Y(); } `_runtime_default_subgraph_info` records `X` and `Y`. It doesn't record `A`, `B`, or `C`. > But not all classes that have archived mirror are aot-initialized. If a class is not AOT initialized, its mirror is filled with zeros (plus a few native pointers) so the mirror doesn't point to any object. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1766058990 From iklam at openjdk.org Thu Sep 19 02:41:18 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 02:41:18 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v9] In-Reply-To: References: Message-ID: > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). 
> > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled. > - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method).
This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: Fixed typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20958/files - new: https://git.openjdk.org/jdk/pull/20958/files/0970a0e2..99462a81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Thu Sep 19 02:41:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 02:41:19 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v8] In-Reply-To: References: <1XujOZAE9Zl3KSlZAtUSPssVetp_bXJ58iWhgY0PYZE=.65bf692f-ad61-4252-b23f-2acca72ce1cf@github.com> Message-ID: On Wed, 18 Sep 2024 15:54:00 GMT, Ashutosh Mehra wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: >> >> - erge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - fixed merge >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - @ashu-mehra comment: assert that ConstantDescs, etc, must be initialized >> - Improved in-line comments >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - @vnkozlov comments >> - Clean up; removed unrelated changes in classPrinter.cpp >> - more cleanup >> - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap >> - ... and 6 more: https://git.openjdk.org/jdk/compare/be1d0ef1...0970a0e2 > > src/hotspot/share/cds/aotClassInitializer.cpp line 50: > >> 48: // - If we re-run the <clinit> of these 3 classes again during the production >> 49: // run, ConstantDescs.CD_Boolean will get a new value that has a different >> 50: // object identity than the value referenced the the Wrapper enums. > > typo: than the value referenced _the_ the Wrapper enums -> than the value referenced _by_ the Wrapper enums. Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1766066855 From iklam at openjdk.org Thu Sep 19 03:08:36 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 03:08:36 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Wed, 18 Sep 2024 05:33:09 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed ZERO build > > src/hotspot/share/classfile/systemDictionary.cpp line 139: > >> 137: if (_java_platform_loader.is_empty()) { >> 138: oop platform_loader = get_platform_class_loader_impl(CHECK); >> 139: _java_platform_loader = OopHandle(Universe::vm_global(), platform_loader); > > Why has the order been switched here? It's just clean up. Platform loader should go before app loader. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766082395 From iklam at openjdk.org Thu Sep 19 03:19:36 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 03:19:36 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Wed, 18 Sep 2024 05:35:42 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed ZERO build > > test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTClassLinkingVMOptions.java line 57: > >> 55: testCase("Archived full module graph must be enabled at runtime"); >> 56: TestCommon.run("-cp", appJar, "-Djdk.module.validation=1", "Hello") >> 57: .assertAbnormalExit("CDS archive has aot-linked classes." + > > Nit: align the dots The CDS test cases indent by 4 spaces in this situation. 
I searched for `'^ *[.]'` lines in the JDK source code and indentation of 8 and 4 spaces seem to be most common. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766089235 From iklam at openjdk.org Thu Sep 19 04:07:08 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:07:08 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v10] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
> > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @dholmes-ora comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/be1d0ef1..3215c002 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=08-09 Stats: 16 lines in 3 files changed: 8 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Thu Sep 19 04:07:09 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:07:09 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: On Wed, 18 Sep 2024 01:47:26 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> minor comment fix > > src/hotspot/share/cds/aotClassLinker.cpp line 121: > >> 119: assert(is_initialized(), "sanity"); >> 120: >> 121: if (!CDSConfig::is_dumping_aot_linked_classes() || !SystemDictionaryShared::is_builtin(ik)) { > > Shouldn't the CDSConfig check just be an assert - the caller is expected to check before trying to add candidates? Fixed > src/hotspot/share/cds/aotClassLinker.cpp line 145: > >> 143: return false; >> 144: } >> 145: } > > Are we concerned with the possibility that we might be able to add some interfaces but not all, hence returning false, but with a subset of interfaces already added to the candidate list? I don't think it should be possible, but the code structure makes it look like it could be possible. That's possible. 
Since we go through the entire initial set of classes found in `ArchiveBuilder::current()->klasses()`, eventually all classes that are eligible for aot-linking will be added to the candidate list. The reason for the recursion is to: - Filter out classes whose super types are not eligible - Sort the candidate list so that super types always come before sub types. > src/hotspot/share/cds/aotClassLinker.cpp line 191: > >> 189: if (ik->class_loader() != class_loader) { >> 190: continue; >> 191: } > > This seems very inefficient. We call `write_classes` 4 times, potentially with different loaders. Because the candidates are sorted the classes belonging to the same loader are likely to be grouped due to package names. So the app loader classes are likely to be right at the end, and we have to traverse all the boot/platform classes first before we get to them. Conversely after we have encountered the last boot loader class (for example) we keep scanning the entire list. If the set were ordered based on loader then name, we would be able to stop once we see the loader change to not being the desired one. And a binary search would let you find the start of a section more quickly. The classes are sorted by class hierarchy. It's a linear operation repeated 4 times, so it's not really something we need to optimize. > src/hotspot/share/cds/aotClassLinker.cpp line 194: > >> 192: if ((ik->module() == ModuleEntryTable::javabase_moduleEntry()) != is_javabase) { >> 193: continue; >> 194: } > > Why do we process system loader classes (i.e. application loader classes) if they need to be in java.base, as the application classes will never be in java.base. ??? The code is written for simplicity. If it becomes a performance problem we can change it.
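For readers following the candidate-list discussion above, here is a minimal sketch of the "super types first" recursion, written as standalone C++ rather than HotSpot code. `ToyKlass`, `CandidateList` and `try_add_candidate` are hypothetical names invented for illustration; the real logic lives in `aotClassLinker.cpp` and may differ in detail. The sketch also shows the situation raised in the review: when a later interface is rejected, the function returns false for the class itself, but super types and interfaces visited earlier stay on the list, which is harmless if they are eligible on their own.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for InstanceKlass; not the HotSpot type.
struct ToyKlass {
  std::string name;
  bool eligible;                            // assumption: e.g. loaded by a built-in loader
  const ToyKlass* super;                    // nullptr for the root class
  std::vector<const ToyKlass*> interfaces;  // directly implemented interfaces
};

struct CandidateList {
  std::vector<const ToyKlass*> ordered;         // super types always precede sub types
  std::unordered_set<const ToyKlass*> members;  // membership test for already-added classes
};

// Returns true if k is a candidate after the call.
// Super types are added first, so `ordered` ends up sorted by hierarchy.
bool try_add_candidate(const ToyKlass* k, CandidateList& list) {
  assert(k != nullptr);
  if (list.members.count(k) != 0) {
    return true;   // already added through another path
  }
  if (!k->eligible) {
    return false;  // filtered out, and so is every sub type that reaches it
  }
  if (k->super != nullptr && !try_add_candidate(k->super, list)) {
    return false;
  }
  for (const ToyKlass* intf : k->interfaces) {
    if (!try_add_candidate(intf, list)) {
      return false;  // interfaces added before this one remain on the list
    }
  }
  list.members.insert(k);
  list.ordered.push_back(k);
  return true;
}
```

Calling `try_add_candidate` once for every class in the initial set gives each eligible class a chance to be added, regardless of the order in which the set is walked.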
> src/hotspot/share/cds/aotClassLinker.cpp line 198: > >> 196: if (ik->is_shared() && CDSConfig::is_dumping_dynamic_archive()) { >> 197: if (CDSConfig::is_using_aot_linked_classes()) { >> 198: // This class was recorded as a AOT-linked for the base archive, > > Suggestion: > > // This class was recorded as AOT-linked for the base archive, Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766115216 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766115133 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766115349 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766115433 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766115463 From iklam at openjdk.org Thu Sep 19 04:07:09 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:07:09 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Wed, 18 Sep 2024 05:11:48 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed ZERO build > > src/hotspot/share/cds/archiveBuilder.cpp line 316: > >> 314: >> 315: if (CDSConfig::is_dumping_aot_linked_classes()) { >> 316: _estimated_hashtable_bytes += _klasses->length() * 16 * sizeof(Klass*); > > Why 16? Doing the estimate is actually difficult here (and also pointless). I've filed https://bugs.openjdk.org/browse/JDK-8340416 to remove the estimation altogether. For the time being, I change the estimate to 20MB which will be more than enough. > src/hotspot/share/cds/archiveBuilder.cpp line 877: > >> 875: if (ik->is_hidden()) { >> 876: ADD_COUNT(num_hidden_klasses); >> 877: hidden = " hidden"; > > Why not do this at the same time you do the other hidden class updates above? I moved the code. 
> src/hotspot/share/cds/cds_globals.hpp line 99: > >> 97: \ >> 98: /*========== New "AOT" flags =========================================*/ \ >> 99: /* The following 3 flags are aliases of -Xshare:dump, */ \ > > Nit: align the `*/`. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766116185 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766116232 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766116258 From iklam at openjdk.org Thu Sep 19 04:11:37 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:11:37 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Wed, 18 Sep 2024 05:07:33 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed ZERO build > > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 170: > >> 168: log_error(cds)("Unable to resolve %s class from CDS archive: %s", category_name, ik->external_name()); >> 169: log_error(cds)("Expected: " INTPTR_FORMAT ", actual: " INTPTR_FORMAT, p2i(ik), p2i(actual)); >> 170: log_error(cds)("JVMTI class retransformation is not supported when archive was generated with -XX:+AOTClassLinking."); > > Nit: use a `logStream` instead of the three separate calls. Why? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766119067 From iklam at openjdk.org Thu Sep 19 04:19:18 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:19:18 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v11] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). > - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`.
> > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @dholmes-ora comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20843/files - new: https://git.openjdk.org/jdk/pull/20843/files/3215c002..dd5a5ba6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=09-10 Stats: 15 lines in 4 files changed: 2 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Thu Sep 19 04:19:18 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:19:18 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: On Wed, 18 Sep 2024 01:56:36 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> minor comment fix > > src/hotspot/share/cds/aotClassLinker.cpp line 212: > >> 210: } else { >> 211: const char* category = class_category_name(list.at(0)); >> 212: log_info(cds, aot, link)("written %d class(es) for category %s", list.length(), category); > > Suggestion: > > log_info(cds, aot, link)("wrote %d class(es) for category %s", list.length(), category); Fixed > src/hotspot/share/cds/aotClassLinker.hpp line 100: > >> 98: static bool is_vm_class(InstanceKlass* ik); >> 99: >> 100: // When CDS is enabled, is ik guatanteed to be linked at deployment time (and > > Suggestion: > > // When CDS is enabled, is ik guaranteed to be linked at deployment time (and Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121912 PR Review Comment: 
https://git.openjdk.org/jdk/pull/20843#discussion_r1766121483 From iklam at openjdk.org Thu Sep 19 04:19:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:19:19 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: <7XXroXoisk4nBwGdd0GQV2Cnd-6FoRzq_9Bhya3bluQ=.cde1408b-5ef4-4c01-8dfd-47d78d82587c@github.com> On Wed, 18 Sep 2024 01:59:59 GMT, David Holmes wrote: >> src/hotspot/share/cds/aotClassLinker.hpp line 60: >> >>> 58: // - The visibility of C >>> 59: // >>> 60: // During an Production Run, the JVM can use an AOTCache with an AOTLinkedClassTable >> >> Suggestion: >> >> // During a Production Run, the JVM can use an AOTCache with an AOTLinkedClassTable > > Why is "Production Run" capitalized? Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121446 From iklam at openjdk.org Thu Sep 19 04:19:19 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 04:19:19 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Wed, 18 Sep 2024 05:01:00 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed ZERO build > > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 66: > >> 64: >> 65: void AOTLinkedClassBulkLoader::load_classes_in_loader(JavaThread* current, AOTLinkedClassCategory class_category, oop class_loader_oop) { >> 66: ExceptionMark em(current); > > Why do you need the EM when you are explicitly checking for exceptions? Removed. 
> src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 67: > >> 65: void AOTLinkedClassBulkLoader::load_classes_in_loader(JavaThread* current, AOTLinkedClassCategory class_category, oop class_loader_oop) { >> 66: ExceptionMark em(current); >> 67: ResourceMark rm(current); > > The RM should go where it is actually needed for the logging. Fixed > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 68: > >> 66: ExceptionMark em(current); >> 67: ResourceMark rm(current); >> 68: HandleMark hm(current); > > Why do you need a HM here? It was needed when this code was called from a different path. It's no longer needed now, so I removed it. > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 95: > >> 93: >> 94: if (Universe::is_fully_initialized() && VerifyDuringStartup) { >> 95: // Make sure we're still in a clean slate. > > Suggestion: > > // Make sure we're still in a clean state. Fixed > src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 132: > >> 130: break; >> 131: case AOTLinkedClassCategory::UNREGISTERED: >> 132: ShouldNotReachHere(); // Currently aot-linked classes are not supported for this category. > > Suggestion: > > case AOTLinkedClassCategory::UNREGISTERED: > default: > ShouldNotReachHere(); // Currently aot-linked classes are not supported for this category. Fixed. > src/hotspot/share/cds/aotLinkedClassTable.hpp line 34: > >> 32: class SerializeClosure; >> 33: >> 34: // Classes to be buik-loaded, in the "linked" state, at VM bootstrap. > > Suggestion: > > // Classes to be bulk-loaded, in the "linked" state, at VM bootstrap. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121555 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121383 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121348 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121642 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121689 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766121855 From dholmes at openjdk.org Thu Sep 19 04:30:38 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 04:30:38 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v11] In-Reply-To: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> References: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> Message-ID: On Wed, 18 Sep 2024 10:18:28 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires a Windows implementation of realpath(), using Windows _fullpath(), and renaming os::Posix::realpath() to os::realpath(). >> >> The main difference between POSIX and Windows behaviour is that POSIX actually requires an existing accessible file, while Windows will happily work with made-up filenames. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: > > remove tabs test/hotspot/gtest/runtime/test_os.cpp line 433: > 431: errno = 0; > 432: returnedBuffer = os::realpath(tmppath, buffer, MAX_PATH); > 433: EXPECT_TRUE(returnedBuffer == buffer); Should we also do `EXPECT_TRUE(errno == 0);` ? 
Here and below. test/hotspot/gtest/runtime/test_os.cpp line 453: > 451: errno = 0; > 452: returnedBuffer = os::realpath(tmppath, buffer, sizeof(buffer)); > 453: EXPECT_TRUE(errno == EINVAL); How is this an EINVAL case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1766138433 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1766140771 From dholmes at openjdk.org Thu Sep 19 04:49:38 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 04:49:38 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Thu, 19 Sep 2024 03:17:03 GMT, Ioi Lam wrote: >> test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking/AOTClassLinkingVMOptions.java line 57: >> >>> 55: testCase("Archived full module graph must be enabled at runtime"); >>> 56: TestCommon.run("-cp", appJar, "-Djdk.module.validation=1", "Hello") >>> 57: .assertAbnormalExit("CDS archive has aot-linked classes." + >> >> Nit: align the dots > > The CDS test cases indent by 4 spaces in this situation. I searched for `'^ *[.]'` lines in the JDK source code and indentation of 8 and 4 spaces seem to be most common. Method chaining has its own indentation style of aligning the dots - see all the stream using code for example. But just a nit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766152229 From amitkumar at openjdk.org Thu Sep 19 04:49:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 19 Sep 2024 04:49:40 GMT Subject: RFR: 8339416: [s390x] Provide implementation for resolve_global_jobject [v3] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 11:01:40 GMT, Amit Kumar wrote: >> This PR provides "resolve_global_jobject" method implementation for s390x-port. 
>> >> Testing: >> * Tier1 test with Fastdebug; >> * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; >> * 1. Ran tier1 test with a call to "resolve_jobject" >> * 2. Ran tier1 test with a call to "resolve_global_jobject" >> >> I didn't see any new failures there. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > implements ModRefBarrierSetAssembler::resolve_jobject Thank you so much Lutz & Martin for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20986#issuecomment-2359967512 From amitkumar at openjdk.org Thu Sep 19 04:49:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 19 Sep 2024 04:49:41 GMT Subject: Integrated: 8339416: [s390x] Provide implementation for resolve_global_jobject In-Reply-To: References: Message-ID: On Fri, 13 Sep 2024 07:37:19 GMT, Amit Kumar wrote: > This PR provides "resolve_global_jobject" method implementation for s390x-port. > > Testing: > * Tier1 test with Fastdebug; > * Added these changes on top of https://github.com/openjdk/jdk/pull/20479 and modified the call in the stubGenerator_s390.cpp file; > * 1. Ran tier1 test with a call to "resolve_jobject" > * 2. Ran tier1 test with a call to "resolve_global_jobject" > > I didn't see any new failures there. This pull request has now been integrated.
Changeset: ac58b610 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/ac58b6102a26ac2ca7f6df5f176d5b5ca1d00d45 Stats: 81 lines in 6 files changed: 70 ins; 4 del; 7 mod 8339416: [s390x] Provide implementation for resolve_global_jobject Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/20986 From stefank at openjdk.org Thu Sep 19 05:06:51 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:06:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 23:59:39 GMT, Coleen Phillimore wrote: >> yes, but there is ongoing effort (at Oracle) to get rid of ```Klass::_prototype_header``` altogether. Let's wait for that and see how it looks then. > > Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. We already have cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all call into shared functions in the macro assembler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766163092 From john.r.rose at oracle.com Thu Sep 19 05:14:12 2024 From: john.r.rose at oracle.com (John Rose) Date: Wed, 18 Sep 2024 22:14:12 -0700 Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On 18 Sep 2024, at 21:11, Ioi Lam wrote: > On Wed, 18 Sep 2024 05:07:33 GMT, David Holmes wrote: > ?
>> src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 170: >> >>> 168: log_error(cds)("Unable to resolve %s class from CDS archive: %s", category_name, ik->external_name()); >>> 169: log_error(cds)("Expected: " INTPTR_FORMAT ", actual: " INTPTR_FORMAT, p2i(ik), p2i(actual)); >>> 170: log_error(cds)("JVMTI class retransformation is not supported when archive was generated with -XX:+AOTClassLinking."); >> >> Nit: use a `logStream` instead of the three separate calls. > > Why? If this were running millions of times a second, and it were a debug or trace log message, using a log stream might batch up the gating logic instead of executing it three times, and it might make for a more efficient output, with three lines grouped cleanly. But for a rare error message, those reasons are less important. Maybe the code would be more readable with a log stream? But I find this code readable enough. YMMV From shade at openjdk.org Thu Sep 19 05:42:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 05:42:18 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v4] In-Reply-To: References: Message-ID: > Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: > > 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal > > > This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. 
> > This patch is able to print the following instead: > > > 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Try to handle unaligned pointers well - Indenting and formats ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21072/files - new: https://git.openjdk.org/jdk/pull/21072/files/6045a429..17622ba9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=02-03 Stats: 44 lines in 7 files changed: 21 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/21072.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21072/head:pull/21072 PR: https://git.openjdk.org/jdk/pull/21072 From shade at openjdk.org Thu Sep 19 05:42:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 05:42:18 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 20:22:52 GMT, Kim Barrett wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 32-bit builds > > src/hotspot/share/gc/shared/oopStorage.cpp line 1151: > >> 1149: bool OopStorage::Block::print_containing(oop* addr, outputStream* st) { >> 1150: if (contains(addr)) { >> 1151: st->print(INTPTR_FORMAT " is a pointer %u/" SIZE_FORMAT " into block %zu", > > s/SIZE_FORMAT/%zu/ > s/INTPTR_FORMAT/PTR_FORMAT/ - because it's semantically a pointer. Should be fixed in new commit. > src/hotspot/share/runtime/os.cpp line 1322: > >> 1320: >> 1321: // Ask if any OopStorage knows about this address. >> 1322: if (OopStorageSet::print_containing((oop*)addr, st)) { > > `addr` might not be oop-aligned, in which case this cast and use might lead to UB. I think this should > be gated on `is_aligned(addr, alignof(oop))`. 
I don't want to lose the match if the pointer is unaligned. The unaligned pointer might still technically be _in the range_ that is covered by the OopStorage. See new commit, I think we can handle it more accurately without losing this match? > test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 109: > >> 107: class OopStorageSetTest::VM_PrintAtSafepoint : public VM_GTestExecuteAtSafepoint { >> 108: private: >> 109: class PrintContainingClosure : public Closure { > > Need another leading space here for proper indentation. Right, should be fixed in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766185213 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766187319 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766185355 From dholmes at openjdk.org Thu Sep 19 05:42:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 05:42:41 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v11] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <_L_sO08BUKwfLyb7gI92xLcxfkk--1rnb01Zqonavro=.fa2cfb32-3b5d-4b4f-8175-e5d51b7e1bb1@github.com> On Thu, 19 Sep 2024 04:19:18 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible.
The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa...
> > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments src/hotspot/share/cds/aotClassLinker.cpp line 122: > 120: assert(CDSConfig::is_dumping_aot_linked_classes(), "sanity"); > 121: > 122: if (!SystemDictionaryShared::is_builtin(ik)) { What does this actually mean by "built-in"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766187038 From dholmes at openjdk.org Thu Sep 19 05:42:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 05:42:42 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: On Thu, 19 Sep 2024 04:00:38 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/aotClassLinker.cpp line 145: >> >>> 143: return false; >>> 144: } >>> 145: } >> >> Are we concerned with the possibility that we might be able to add some interfaces but not all, hence returning false, but with a subset of interfaces already added to the candidate list? I don't think it should be possible, but the code structure makes it look like it could be possible. > > That's possible. Since we go through all the initial set of classes found in `ArchiveBuilder::current()->klasses()`, eventually all classes that are eligible for aot-linking will be added to the candidate list. The reason for the recursion is to > > - Filter out classes whose super types are not eligible > - Sort the candidate list so that super types always come before sub types. Doesn't that potentially leave a "super" class that has no subclasses? Or do we not really care? >> src/hotspot/share/cds/aotClassLinker.cpp line 191: >> >>> 189: if (ik->class_loader() != class_loader) { >>> 190: continue; >>> 191: } >> >> This seems very inefficient. 
We call `write_classes` 4 times, potentially with different loaders. Because the candidates are sorted the classes belonging to the same loader are likely to be grouped due to package names. So the app loader classes are likely to be right at the end, and we have to traverse all the boot/platform classes first before we get to them. Conversely after we have encountered the last boot loader class (for example) we keep scanning the entire list. If the set were ordered based on loader then name, we would be able to stop once we see the loader change to not being the desired one. And a binary search would let you find the start of a section more quickly. > > The classes are sorted by class hierarchy. It's a linear operation repeated 4 times, so it's not really something we need to optimize. Okay so not sorted in any way that I anticipated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766187802 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766188844 From dholmes at openjdk.org Thu Sep 19 05:45:38 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 05:45:38 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v8] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <7FyX0AFx1IRbgWFlAvOwJYgv-bgJ4w8E56h6DXSrGow=.4cfe39cc-d85e-47ea-ad6c-25cf19f6be24@github.com> Message-ID: On Thu, 19 Sep 2024 04:01:07 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/aotClassLinker.cpp line 194: >> >>> 192: if ((ik->module() == ModuleEntryTable::javabase_moduleEntry()) != is_javabase) { >>> 193: continue; >>> 194: } >> >> Why do we process system loader classes (i.e. application loader classes) if they need to be in java.base, as the application classes will never be in java.base. ??? > > The code is written for simplicity. If it becomes a performance problem we can change it. I misread this check.
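Since the two `continue` checks quoted above were easy to misread, here is a rough standalone sketch of how such a filter behaves. It is not the HotSpot code: `ToyLoader`, `ToyClass` and `select_category` are hypothetical names, and the four (loader, is_javabase) combinations mentioned afterwards are an assumption about how `write_classes` is called, not something stated in this thread. The point is that `in_java_base != is_javabase` skips a class whenever its module membership differs from what the current pass asks for, so a pass with `is_javabase == false` keeps exactly the classes outside java.base.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; not the HotSpot types.
enum class ToyLoader { Boot, Platform, App };

struct ToyClass {
  std::string name;
  ToyLoader loader;
  bool in_java_base;  // true if the class is defined in the java.base module
};

// One linear pass over the hierarchy-ordered candidate list, keeping only the
// classes that match both the requested loader and the requested java.base flag.
std::vector<std::string> select_category(const std::vector<ToyClass>& candidates,
                                         ToyLoader loader, bool is_javabase) {
  std::vector<std::string> selected;
  for (const ToyClass& c : candidates) {
    if (c.loader != loader) {
      continue;  // belongs to a different loader
    }
    if (c.in_java_base != is_javabase) {
      continue;  // module membership does not match this pass
    }
    selected.push_back(c.name);  // relative (hierarchy) order is preserved
  }
  return selected;
}
```

Under that assumption, the passes would be (Boot, true), (Boot, false), (Platform, false) and (App, false), each costing one scan of the list, which matches the "linear operation repeated 4 times" remark.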
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766191759 From dholmes at openjdk.org Thu Sep 19 05:51:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 05:51:50 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Thu, 19 Sep 2024 04:08:30 GMT, Ioi Lam wrote: >> src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 170: >> >>> 168: log_error(cds)("Unable to resolve %s class from CDS archive: %s", category_name, ik->external_name()); >>> 169: log_error(cds)("Expected: " INTPTR_FORMAT ", actual: " INTPTR_FORMAT, p2i(ik), p2i(actual)); >>> 170: log_error(cds)("JVMTI class retransformation is not supported when archive was generated with -XX:+AOTClassLinking."); >> >> Nit: use a `logStream` instead of the three separate calls. > > Why? Atomic logging ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766196724 From stefank at openjdk.org Thu Sep 19 05:53:48 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:53:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787: > 785: // The gap is always equal to min-fill-size, so nothing to do. > 786: return; > 787: } Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value: void PSParallelCompact::fill_dense_prefix_end(SpaceId id) { // Comparing two sizes to decide if filling is required: // // The size of the filler (min-obj-size) is 2 heap words with the default // MinObjAlignment, since both markword and klass take 1 heap word. // // The size of the gap (if any) right before dense-prefix-end is // MinObjAlignment. // // Need to fill in the gap only if it's smaller than min-obj-size, and the // filler obj will extend to next region. // Note: If min-fill-size decreases to 1, this whole method becomes redundant. if (UseCompactObjectHeaders) { // The gap is always equal to min-fill-size, so nothing to do. return; } assert(CollectedHeap::min_fill_size() >= 2, "inv"); src/hotspot/share/oops/compressedKlass.cpp line 231: > 229: // The reason is that we want to avoid, if possible, shifts larger than > 230: // a cacheline size. > 231: _base = addr; Why is this important? 
src/hotspot/share/oops/compressedKlass.hpp line 261: > 259: } > 260: > 261: }; Missing blank line before `#endif` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766185665 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766192688 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766193355 From stefank at openjdk.org Thu Sep 19 05:53:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 05:53:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> On Thu, 19 Sep 2024 05:35:34 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787: > >> 785: // The gap is always equal to min-fill-size, so nothing to do. >> 786: return; >> 787: } > > Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value: > > void PSParallelCompact::fill_dense_prefix_end(SpaceId id) { > // Comparing two sizes to decide if filling is required: > // > // The size of the filler (min-obj-size) is 2 heap words with the default > // MinObjAlignment, since both markword and klass take 1 heap word. > // > // The size of the gap (if any) right before dense-prefix-end is > // MinObjAlignment. > // > // Need to fill in the gap only if it's smaller than min-obj-size, and the > // filler obj will extend to next region. > > // Note: If min-fill-size decreases to 1, this whole method becomes redundant. > if (UseCompactObjectHeaders) { > // The gap is always equal to min-fill-size, so nothing to do. 
> return; > } > assert(CollectedHeap::min_fill_size() >= 2, "inv"); Style note: The added code is inserted between a comment and the code that the comment refers to. It would be nice to tidy this up. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766186545 From dholmes at openjdk.org Thu Sep 19 05:56:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 05:56:39 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v11] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Thu, 19 Sep 2024 04:19:18 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. 
See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. >> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments Nothing further from me. Thanks ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2314464674 From jwaters at openjdk.org Thu Sep 19 06:12:37 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 19 Sep 2024 06:12:37 GMT Subject: RFR: 8316930: HotSpot should use noexcept instead of throw() [v5] In-Reply-To: <9k00GYxtEiNBgrtIsIYJUIdwwPjynEm6aONdchZreP4=.0ad54916-180e-4317-8385-e339595a340a@github.com> References: <9k00GYxtEiNBgrtIsIYJUIdwwPjynEm6aONdchZreP4=.0ad54916-180e-4317-8385-e339595a340a@github.com> Message-ID: On Tue, 6 Feb 2024 07:04:00 GMT, Julian Waters wrote: >> throw() has been deprecated since C++11 alongside dynamic exception specifications, we should replace all instances of it with noexcept to prepare HotSpot for later versions of C++ > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Merge branch 'openjdk:master' into noexcept > - Merge branch 'openjdk:master' into noexcept > - Typo in GensrcAdlc.gmk > - Merge branch 'openjdk:master' into noexcept > - Merge branch 'master' into noexcept > - ic in compiledIC.hpp > - Revert compiledIC.cpp > - Revert compiledIC.hpp > - Partially Revert parse.hpp > - Merge branch 'master' into noexcept > - ... and 4 more: https://git.openjdk.org/jdk/compare/9ee9f288...b73a6882 That was very rude of you ------------- PR Comment: https://git.openjdk.org/jdk/pull/15910#issuecomment-2360062531 From dholmes at openjdk.org Thu Sep 19 06:16:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 06:16:37 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v9] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: <7smfUywBZdYZsE-Yw1a_7pFmIdKO5Fot7sA96aJ_kmc=.a9f56d6c-58a8-4951-a2eb-de4242d179a0@github.com> On Thu, 19 Sep 2024 05:49:22 GMT, David Holmes wrote: >> Why? 
> > Atomic logging @stefank pointed out to me that wouldn't actually be atomic unless you used `NonInterleavingLogStream`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1766216952 From dholmes at openjdk.org Thu Sep 19 06:17:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 06:17:35 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v4] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 05:42:18 GMT, Aleksey Shipilev wrote: >> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: >> >> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal >> >> >> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. 
>> >> This patch is able to print the following instead: >> >> >> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Try to handle unaligned pointers well > - Indenting and formats src/hotspot/share/gc/shared/oopStorage.cpp line 1140: > 1138: bool OopStorage::print_containing(const oop* addr, outputStream* st) { > 1139: if (addr != nullptr) { > 1140: Block *block = find_block_or_null(addr); Suggestion: Block* block = find_block_or_null(addr); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766207638 From epeter at openjdk.org Thu Sep 19 06:31:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 19 Sep 2024 06:31:43 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 16:53:53 GMT, Paul Sandoz wrote: > > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > > > > Nomenclature is suggested by Paul. > > > > > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. Ok. I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. 
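To make the point about overflow prevention concrete, here is a minimal scalar sketch of saturating arithmetic in plain Java. It is only an illustration under stated assumptions: the class and method names (`SaturatingSketch`, `saturatingAdd`, `saturatingSub`, `saturatingAddUnsigned`) are hypothetical and are not taken from the `VectorMath` class under review, and the PR's actual implementation may differ.

```java
// Hypothetical helpers for illustration only; not the VectorMath API from the PR.
final class SaturatingSketch {
    private SaturatingSketch() {}

    // Signed saturating add: the result is clamped to the int range instead of wrapping.
    static int saturatingAdd(int a, int b) {
        long wide = (long) a + (long) b;          // cannot overflow in 64 bits
        if (wide > Integer.MAX_VALUE) { return Integer.MAX_VALUE; }
        if (wide < Integer.MIN_VALUE) { return Integer.MIN_VALUE; }
        return (int) wide;
    }

    // Signed saturating subtract, using the same widening approach.
    static int saturatingSub(int a, int b) {
        long wide = (long) a - (long) b;
        if (wide > Integer.MAX_VALUE) { return Integer.MAX_VALUE; }
        if (wide < Integer.MIN_VALUE) { return Integer.MIN_VALUE; }
        return (int) wide;
    }

    // Unsigned saturating add: both operands are read as values in [0, 2^32 - 1].
    // Unsigned overflow happened exactly when the wrapped sum is below an operand;
    // in that case the result is clamped to 0xFFFFFFFF (which is -1 as an int).
    static int saturatingAddUnsigned(int a, int b) {
        int sum = a + b;
        return Integer.compareUnsigned(sum, a) < 0 ? -1 : sum;
    }

    public static void main(String[] args) {
        System.out.println(saturatingAdd(Integer.MAX_VALUE, 1));
        System.out.println(saturatingSub(Integer.MIN_VALUE, 1));
        System.out.println(Integer.toUnsignedString(saturatingAddUnsigned(-1, 1)));
    }
}
```

Widening to `long` keeps the signed variants easy to verify by inspection; a `long` version of the same operations would need explicit overflow detection instead.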
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2360099389 From fyang at openjdk.org Thu Sep 19 06:32:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Sep 2024 06:32:37 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:53:18 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) does not need global icache flushes. >> Instead, we can emit fence.i locally (thread and hart) in the slow path. >> But if we were to be context switched to a hart which has not had fence.i emitted, we can still fetch stale instructions. >> To handle this case we now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread does not execute on code heap (non attached native thread), and even if there were no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such a scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, moved init after feature enabling src/hotspot/cpu/riscv/relocInfo_riscv.cpp line 61: > 59: if (!UseCtxFencei) { > 60: ICache::invalidate_range(addr(), bytes); > 61: } One more question: Do we need a full fence (`OrderAccess::fence()`) here with `UseCtxFencei` after the patching? Like you do in `ZBarrierSetAssembler::patch_barrier_relocation()`:
Like you do in `ZBarrierSetAssembler::patch_barrier_relocation()`: if (!UseCtxFencei) { // A full fence is generated before icache_flush by default in invalidate_word ICache::invalidate_range(addr, bytes); } else { OrderAccess::fence(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766233124 From rehn at openjdk.org Thu Sep 19 06:40:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 06:40:38 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:30:24 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment, moved init after feature enabling > > src/hotspot/cpu/riscv/relocInfo_riscv.cpp line 61: > >> 59: if (!UseCtxFencei) { >> 60: ICache::invalidate_range(addr(), bytes); >> 61: } > > One more question: Do we need a full fence (`OrderAccess::fence()`) here with `UseCtxFencei` after the patching? Like you do in `ZBarrierSetAssembler::patch_barrier_relocation()`: > > if (!UseCtxFencei) { > // A full fence is generated before icache_flush by default in invalidate_word > ICache::invalidate_range(addr, bytes); > } else { > OrderAccess::fence(); > } I actually didn't look at that. As the old case did a full fence when calling ICache::invalidate_range(addr, bytes); and with that comment, I assumed there was a reason for pointing it out, so I just kept the old behavior. I'll see if I can figure out if/why/what.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766238399 From rehn at openjdk.org Thu Sep 19 06:40:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 06:40:38 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:35:46 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/relocInfo_riscv.cpp line 61: >> >>> 59: if (!UseCtxFencei) { >>> 60: ICache::invalidate_range(addr(), bytes); >>> 61: } >> >> One more question: Do we need a full fence (`OrderAccess::fence()`) here with `UseCtxFencei` after the patching? Like you do in `ZBarrierSetAssembler::patch_barrier_relocation()`: >> >> if (!UseCtxFencei) { >> // A full fence is generated before icache_flush by default in invalidate_word >> ICache::invalidate_range(addr, bytes); >> } else { >> OrderAccess::fence(); >> } > > I actually didn't look at that. > As the old case did a full fence when calling ICache::invalidate_range(addr, bytes); and with that comment, > I assumed there was a reason for pointing it out, so I just kept the old behavior. > I'll see If I can figure out if/why/what. Sorry, now I realize what you are asking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766241194 From jbhateja at openjdk.org Thu Sep 19 06:44:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:44:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v16] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. 
> > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch.
> > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Tests for newly added VectorMath.* operations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/f81b2525..bc08bab5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=14-15 Stats: 331 lines in 13 files changed: 282 ins; 44 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From fyang at openjdk.org Thu Sep 19 06:46:38 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Sep 2024 06:46:38 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: Message-ID: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> On Thu, 19 Sep 2024 06:38:24 GMT, Robbin Ehn wrote: >> I actually didn't look at that. >> As the old case did a full fence when calling ICache::invalidate_range(addr, bytes); and with that comment, >> I assumed there was a reason for pointing it out, so I just kept the old behavior. >> I'll see If I can figure out if/why/what. > > Sorry, now I realize what you are asking. The purpose is to make a store to instruction memory visible to all RISC-V harts. Check this code in file icache_riscv.cpp: static int icache_flush(address addr, int lines, int magic) { // To make a store to instruction memory visible to all RISC-V harts, // the writing hart has to execute a data FENCE before requesting that // all remote RISC-V harts execute a FENCE.I. // We need to make sure stores happens before the I/D cache synchronization. 
__asm__ volatile("fence rw, rw" : : : "memory"); <============== RiscvFlushIcache::flush((uintptr_t)addr, ((uintptr_t)lines) << ICache::log2_line_size); return magic; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766245777 From jbhateja at openjdk.org Thu Sep 19 06:44:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:44:20 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:22:16 GMT, Emanuel Peter wrote: > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > We have sufficient test coverage of these APIs in JTREG tests. > > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. Hi @eme64 , @PaulSandoz Yes dedicated test for each of newly added VectorMath operation is justified here. Thanks, let me know if there are other comments. > > > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > > > > > > > Nomenclature is suggested by Paul. > > > > > > > > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? 
> > > > > > It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. > > Ok. I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? > > I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. Hi @eme64 , yes, that is our extended plan; for this patch we want to restrict its use to VectorAPI. > > > Do we have tests for the publically exposed methods in this new file? Or are they only implicitly tested through the VectorAPI, and its tests? `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorMath.java` > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > Nomenclature is suggested by Paul. > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > We have sufficient test coverage of these APIs in JTREG tests. > > @jatin-bhateja I can't see any dedicated JTREG tests to the `VectorMath` methods. I only see the VectorAPI tests. Can you point me to the `VectorMath` tests? I'd like to review them. @eme64 , @PaulSandoz , I agree that an explicit test for every newly added VectorMath operation, for all integral types, is justified here.
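As a rough idea of what a dedicated scalar test could check, the sketch below compares unsigned min, unsigned max and unsigned saturating subtraction against a reference computed in 64-bit arithmetic, over a small set of boundary values. This is an assumption-laden illustration, not the jtreg test added to the PR: `UnsignedOpsCheck`, its method names and the chosen boundary values are all hypothetical.

```java
// Hypothetical self-check for illustration; not the jtreg test from the PR.
final class UnsignedOpsCheck {
    private UnsignedOpsCheck() {}

    // Operations under test, written with 32-bit unsigned comparisons.
    static int minUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) <= 0 ? a : b;
    }

    static int maxUnsigned(int a, int b) {
        return Integer.compareUnsigned(a, b) >= 0 ? a : b;
    }

    static int saturatingSubUnsigned(int a, int b) {
        // An unsigned result below zero is clamped to zero.
        return Integer.compareUnsigned(a, b) < 0 ? 0 : a - b;
    }

    // Reference results computed on the unsigned values widened to long.
    static int refMin(int a, int b) {
        return (int) Math.min(Integer.toUnsignedLong(a), Integer.toUnsignedLong(b));
    }

    static int refMax(int a, int b) {
        return (int) Math.max(Integer.toUnsignedLong(a), Integer.toUnsignedLong(b));
    }

    static int refSatSub(int a, int b) {
        return (int) Math.max(0L, Integer.toUnsignedLong(a) - Integer.toUnsignedLong(b));
    }

    // Values around the signed and unsigned range boundaries.
    static final int[] BOUNDARY = {0, 1, -1, 42, -42, Integer.MAX_VALUE, Integer.MIN_VALUE};

    // Counts input pairs for which an operation disagrees with its reference.
    static int countMismatches() {
        int mismatches = 0;
        for (int a : BOUNDARY) {
            for (int b : BOUNDARY) {
                if (minUnsigned(a, b) != refMin(a, b)) { mismatches++; }
                if (maxUnsigned(a, b) != refMax(a, b)) { mismatches++; }
                if (saturatingSubUnsigned(a, b) != refSatSub(a, b)) { mismatches++; }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

A real test would presumably cover `byte`, `short` and `long` as well, and would call the methods in `VectorMath` directly rather than local copies.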
------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2360118143 From jbhateja at openjdk.org Thu Sep 19 06:53:15 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:53:15 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v17] In-Reply-To: References: Message-ID: <-L7RYBQd-Q6zLkv5GKU0PDM2SZ-jdm1zAk1VRedDgyM=.c712848d-145b-4ecd-af2f-1a811832559d@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback.
> > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Tuning extra spaces. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/bc08bab5..eb2960a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=15-16 Stats: 38 lines in 1 file changed: 0 ins; 0 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From jbhateja at openjdk.org Thu Sep 19 06:55:45 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 06:55:45 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v13] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 06:41:16 GMT, Jatin Bhateja wrote: > > > > > Why is this even called `VectorMath`? Because those ops are not at all restricted to vectorization, right? > > > > > > > > > > > > Nomenclature is suggested by Paul. > > > > > > > > > @PaulSandoz Do you want to limit these **scalar** operations to a class name that implies **vector** use? > > > > > > It's whatever math functions are required to in support of vector operations (as the JavaDoc indicates) that are not provided by other classes such as the boxed primitives or `java.lang.Math`. > > Ok. I suppose these methods could eventually be moved to `java.lang.Math` or some other `java.lang` class, when the VectorAPI goes out of incubator mode? > > I feel like these saturating operations, and also the unsigned ops could find a more wider use, away from (explicit) vector usage. 
For example, the saturating operations are nice because they prevent overflows, and in some cases that would be very nice to have readily available. Hi @eme64 , as per @PaulSandoz and @jddarcy we should wait till Valhalla preview to add full blown unsigned value type and associated operations, for the time being restricting the scope of these new operations to VectorAPI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2360137469 From rrich at openjdk.org Thu Sep 19 07:01:38 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 19 Sep 2024 07:01:38 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v4] In-Reply-To: References: Message-ID: <7QUo4Se63eTJ6pZ7k1ICDut1ACTf8__GnyyEqL3nFh8=.2cfc41dd-c6a3-42c3-bd47-d4c336e3c735@github.com> On Wed, 18 Sep 2024 14:19:20 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Consistently use CCR0 in compiler_fast_lock_lightweight_object and compiler_fast_unlock_lightweight_object Thanks for rerunning the micro benchmark. Changes look good to me. Cheers, Richard. ------------- Marked as reviewed by rrich (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20922#pullrequestreview-2314579288 From jbhateja at openjdk.org Thu Sep 19 07:10:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 07:10:38 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <-SvKZpGY6NbQyh2PnmV5--a8f4oKdSq3VQKV2siSawg=.c812df74-12d4-428b-a7f9-5b1945cdae39@github.com> On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Change method name src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > 770: > 771: if (elem_klass == nullptr || shuffle_klass == nullptr || shuffle->is_top() || vlen == nullptr) { > 772: return false; // dead code Why dead code in comment ? this is a failed intrinsification condition. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { > 775: return false; // not enough info for intrinsification > 776: } Why don't you club it with the above conditions, to be consistent with other inline expanders? src/hotspot/share/opto/vectorIntrinsics.cpp line 790: > 788: // Shuffles use byte array based backing storage > 789: BasicType shuffle_bt = T_BYTE; > 790: No need for a blank line between 789 and 791. src/hotspot/share/opto/vectorIntrinsics.cpp line 793: > 791: if (!arch_supports_vector(Op_AndV, num_elem, shuffle_bt, VecMaskNotUsed) || > 792: !arch_supports_vector(Op_Replicate, num_elem, shuffle_bt, VecMaskNotUsed)) { > 793: return false; You should emit a proper intrinsification failure message here. src/hotspot/share/opto/vectorIntrinsics.cpp line 805: > 803: const TypeVect* vt = TypeVect::make(shuffle_bt, num_elem); > 804: const Type* shuffle_type_bt = Type::get_const_basic_type(shuffle_bt); > 805: No need for a blank line here. src/hotspot/share/opto/vectorIntrinsics.cpp line 808: > 806: Node* mod_mask = gvn().makecon(TypeInt::make(num_elem-1)); > 807: Node* bcast_mod_mask = gvn().transform(VectorNode::scalar2vector(mod_mask, num_elem, shuffle_type_bt)); > 808: Remove the redundant blank line.
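For readers following the wrapIndexes discussion in this review: the quoted lines build a `num_elem-1` constant, broadcast it, and check for `Op_AndV` support, which suggests that wrapping is an AND with a power-of-two mask. The following is only a scalar C++ sketch of that idea, not the HotSpot or Vector API implementation; `wrap_indexes` and `select_from` are hypothetical names chosen for illustration, and the power-of-two lane count is an assumption taken from the checks quoted in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of index wrapping. For a power-of-two lane count,
// masking with (lane_count - 1) maps every byte index, including negative
// ones, into the range [0, lane_count).
inline std::vector<std::int8_t> wrap_indexes(const std::vector<std::int8_t>& indexes,
                                             int lane_count) {
  assert(lane_count > 0 && lane_count <= 128 &&
         (lane_count & (lane_count - 1)) == 0);
  const int mod_mask = lane_count - 1;
  std::vector<std::int8_t> wrapped;
  wrapped.reserve(indexes.size());
  for (std::int8_t idx : indexes) {
    // idx is promoted to int; the AND keeps only the low bits of the index.
    wrapped.push_back(static_cast<std::int8_t>(idx & mod_mask));
  }
  return wrapped;
}

// Hypothetical scalar model of selectFrom: lane i of the result is
// source[wrapped index i]. No bounds exception is possible after wrapping.
inline std::vector<std::int8_t> select_from(const std::vector<std::int8_t>& indexes,
                                            const std::vector<std::int8_t>& source) {
  const std::vector<std::int8_t> wrapped =
      wrap_indexes(indexes, static_cast<int>(source.size()));
  std::vector<std::int8_t> result;
  result.reserve(wrapped.size());
  for (std::int8_t w : wrapped) {
    result.push_back(source[static_cast<std::size_t>(w)]);
  }
  return result;
}
```

Because every wrapped index is in range, the per-lane bounds check (and the exception path behind it) is no longer needed, which is consistent with the tighter loop shown in the PR description.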
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766272449 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273205 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766273880 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766274718 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275107 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766275345 From rcastanedalo at openjdk.org Thu Sep 19 07:11:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 07:11:37 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied.
Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! Nice proposal, Antón! This will make it possible to migrate lots of debug/trace-level ad-hoc logging in the compiler code to the UL while preserving its current format (e.g. time decorators are hardly needed when examining the output of `-XX:+TraceLoopOpts`). Having said this, I find the following behavior unintuitive. If I run: -Xlog:jit*=debug I get the global default decorators, i.e. `uptime,level,tags`, which is what I expected. But if I run: java -Xlog:jit+compilation=debug,jit+inlining=debug,jit+thread=debug I would expect to get the same decorators, but instead I get the default decorators for `jit+inlining`, i.e. none. Is this intentional? In general, as a HotSpot developer the behavior I would find most natural is to select the union of all decorators for all chosen tags (regardless of whether the decorators for a tag have been chosen actively by the user, specified as default for the tag, or "inherited" from the global default), as in the first option (`-Xlog:jit*=debug`).
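To make the matching rules from the quoted PR description concrete (inclusion, specificity, ties applied together), here is a small standalone C++ sketch for a single selection. It is not the Unified Logging code: `DefaultDecorators`, `is_included`, `pick_decorators`, the bitmask values, and the fallback parameter are all hypothetical, and the "multiple selections trigger defaults, so none is applied" rule and the log-level filter are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

using TagSet = std::set<std::string>;

// Hypothetical record: a decorator bitmask that applies to a set of tags.
struct DefaultDecorators {
  TagSet tags;
  std::uint32_t mask;
};

// True when every tag in `subset` also appears in `selection`.
inline bool is_included(const TagSet& subset, const TagSet& selection) {
  for (const std::string& tag : subset) {
    if (selection.count(tag) == 0) return false;
  }
  return true;
}

// Picks the decorator mask for one selection. Among the defaults whose tags
// are included in the selection, the most specific (largest tag set) wins;
// defaults tied for most specific are OR-ed together. When nothing matches,
// the caller-supplied global default is returned.
inline std::uint32_t pick_decorators(const TagSet& selection,
                                     const std::vector<DefaultDecorators>& defaults,
                                     std::uint32_t global_default) {
  std::size_t best = 0;
  std::uint32_t mask = 0;
  bool matched = false;  // tracked separately so an empty mask can still match
  for (const DefaultDecorators& d : defaults) {
    if (d.tags.empty() || !is_included(d.tags, selection)) continue;
    if (!matched || d.tags.size() > best) {
      best = d.tags.size();
      mask = d.mask;
      matched = true;
    } else if (d.tags.size() == best) {
      mask |= d.mask;
    }
  }
  return matched ? mask : global_default;
}
```

Under this model, a selection of `jit+compilation` with defaults registered for both `jit` and `jit+compilation` yields only the latter's mask, while two matching defaults of equal size are combined. Whether combining should instead take the union across all selected tags, as suggested in the comment above, is exactly the open design question.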
------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2360164889 From rcastanedalo at openjdk.org Thu Sep 19 07:17:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 07:17:38 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
> > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! src/hotspot/share/logging/logDecorators.cpp line 31: > 29: const LogLevelType AnyLevel = LogLevelType::NotMentioned; > 30: #define DEFAULT_DECORATORS \ > 31: DEFAULT_VALUE(mask_from_decorators(NoDecorators), AnyLevel, LOG_TAGS(jit, inlining)) As a compiler developer, I agree with the choice of no decorators by default for `jit+inlining`. When this is the only tag selected, there isn't much value in the information provided by the global default decorators. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20988#discussion_r1766288455 From rehn at openjdk.org Thu Sep 19 07:22:36 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 07:22:36 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> References: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> Message-ID: On Thu, 19 Sep 2024 06:42:29 GMT, Fei Yang wrote: >> Sorry, now I realize what you are asking. > > The purpose is to make a store to instruction memory visible to all RISC-V harts. > Check this code in file icache_riscv.cpp: > > static int icache_flush(address addr, int lines, int magic) { > // To make a store to instruction memory visible to all RISC-V harts, > // the writing hart has to execute a data FENCE before requesting that > // all remote RISC-V harts execute a FENCE.I. > > // We need to make sure stores happens before the I/D cache synchronization. > __asm__ volatile("fence rw, rw" : : : "memory"); <============== > > RiscvFlushIcache::flush((uintptr_t)addr, ((uintptr_t)lines) << ICache::log2_line_size); > > return magic; > } > > > PS: Here is what the spec says: > > `FENCE.I does not ensure that other RISC-V harts'
instruction fetches will > observe the local hart's stores in a multiprocessor system. To make a store to instruction memory > visible to all RISC-V harts, the writing hart also has to execute a data FENCE before requesting > that all remote RISC-V harts execute a FENCE.I.` Yes. The comment is a bit misleading. A `fence rw, rw` only orders instructions locally, on a hart. A `fence rw, rw` does not flush a store from the store buffer; it just means that the hart must act such that the store appears to have happened before the syscall, meaning the store cannot happen after the syscall. As we are not using the syscall: Instead the other threads/harts must emit the fence.i themselves, either by leaving a safepoint, by hitting the patch_epoch_barrier, or if they are moved to another hart. Which means the store must happen before we leave the safepoint and before disarming the nmethod barrier. These already have store fences as we have a bunch of other stores which must be ordered. For the nmethod barrier, disarming is a full fence as the new epoch must happen before the disarm: 0: (nmethod barriers armed, implies storestore) 1: stores to instruction stream 2: store new patching epoch 3: storestore // the store to instruction and patching epoch must happen before disarm 4: disarm If the stores to the instruction stream and the store of the new patching epoch happen in another order, that is fine, as the critical thing is the disarm. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766290017 From rehn at openjdk.org Thu Sep 19 07:25:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 07:25:37 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> Message-ID: On Thu, 19 Sep 2024 07:16:20 GMT, Robbin Ehn wrote: >> The purpose is to make a store to instruction memory visible to all RISC-V harts.
>> Check this code in file icache_riscv.cpp: >> >> static int icache_flush(address addr, int lines, int magic) { >> // To make a store to instruction memory visible to all RISC-V harts, >> // the writing hart has to execute a data FENCE before requesting that >> // all remote RISC-V harts execute a FENCE.I. >> >> // We need to make sure stores happens before the I/D cache synchronization. >> __asm__ volatile("fence rw, rw" : : : "memory"); <============== >> >> RiscvFlushIcache::flush((uintptr_t)addr, ((uintptr_t)lines) << ICache::log2_line_size); >> >> return magic; >> } >> >> >> PS: Here is what the spec says: >> >> `FENCE.I does not ensure that other RISC-V harts' instruction fetches will >> observe the local hart's stores in a multiprocessor system. To make a store to instruction memory >> visible to all RISC-V harts, the writing hart also has to execute a data FENCE before requesting >> that all remote RISC-V harts execute a FENCE.I.` > > Yes. The comment is a bit misleading. > A fence wr, wr is just for locally, on a hart, order instructions. > A fence wr wr do not flush a store from the store buffer, it just means that the hart must act such that the store appears to have happened before the syscall. Meaning the store can not happen after the syscall. > > As we are not using the syscall: > Instead the other threads/harts must emit the fence.i them self, either by leaving a safepoint, hitting the patch_epoch_barrier and if they are moved to another hart. > > Which means the store must happen before we leave the safepoint and disarming the nmethod barrier. > These already have store fences as we have a bunch of other stores which must be ordered.
> For the nmethod barrier disarming is a full fence as the new epoch must happen before the disarm: > > > 0: (nmethod barriers armed, implies storestore) > 1: stores to instruction stream > 2: store new patching epoch > 3: storestore // the store to instruction and patching epoch must happen before disarm > 4: disarm > > > If the stores to instruction stream and storing the new patching epoch happens in another order that is fine, as the critical thing is the disarm. The code would be in BarrierSetNMethod::set_guard_value(...) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766299491 From kbarrett at openjdk.org Thu Sep 19 07:26:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 19 Sep 2024 07:26:40 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v4] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 05:42:18 GMT, Aleksey Shipilev wrote: >> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: >> >> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal >> >> >> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. >> >> This patch is able to print the following instead: >> >> >> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" > > Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: > > - Try to handle unaligned pointers well > - Indenting and formats Changes requested by kbarrett (Reviewer).
src/hotspot/share/gc/shared/oopStorageSet.cpp line 86: > 84: > 85: bool OopStorageSet::print_containing(const void* addr, outputStream* st) { > 86: const void* aligned_addr = align_down(addr, alignof(oop*)); Should be `alignof(oop)`. src/hotspot/share/gc/shared/oopStorageSet.cpp line 87: > 85: bool OopStorageSet::print_containing(const void* addr, outputStream* st) { > 86: const void* aligned_addr = align_down(addr, alignof(oop*)); > 87: if (aligned_addr != nullptr) { I think the nullptr check should come first. Pointer arithmetic can't produce a null pointer (because of UB), so I think a sufficiently smart compiler might be able to conclude that this test is always true in the absence of UB in the alignment code. src/hotspot/share/gc/shared/oopStorageSet.cpp line 88: > 86: const void* aligned_addr = align_down(addr, alignof(oop*)); > 87: if (aligned_addr != nullptr) { > 88: for (uint i = 0; i < OopStorageSet::all_count; i++) { Don't need the `OopStorageSet::` qualifier here, since we're in that class already. But better would be to iterate over the storages using the provided iteration mechanism: for (OopStorage* storage : Range()) { ... } test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 139: > 137: void doit() { > 138: PrintContainingClosure cl; > 139: for (auto id: EnumRange()) { Instead use for (OopStorage* storage : OopStorageSet::Range()) { ... } test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 166: > 164: { > 165: stringStream ss; > 166: bool printed = OopStorageSet::print_containing((char*)0x8, &ss); Instead of the hard-coded 0x8, use alignof(oop).
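As an illustration of the "check for null before aligning" point raised in this review, here is a standalone C++ sketch. It does not use HotSpot's `align_down`, `oop`, or `OopStorage`; `align_down_sketch` and `contains_slot` are hypothetical helpers, and `void*` slots stand in for the `oop` slots discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an align_down helper: rounds an address down to a
// power-of-two alignment using integer arithmetic only.
inline const void* align_down_sketch(const void* addr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(addr);
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  return reinterpret_cast<const void*>(bits & ~mask);
}

// Hypothetical lookup: rejects a null input first, then aligns the address
// down to slot alignment and reports whether it falls in [base, base + count).
inline bool contains_slot(const void* addr, const void* const* base, std::size_t count) {
  if (addr == nullptr) return false;  // test the input, not the aligned result
  const void* aligned = align_down_sketch(addr, alignof(void*));
  const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(aligned);
  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(base);
  const std::uintptr_t hi = lo + count * sizeof(void*);
  return a >= lo && a < hi;
}
```

Testing the caller's pointer before any alignment arithmetic keeps the null check independent of how the alignment helper is implemented, which is the concern the review comment describes.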
------------- PR Review: https://git.openjdk.org/jdk/pull/21072#pullrequestreview-2314518336 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766234492 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766237573 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766241306 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766297311 PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766282757 From stooke at openjdk.org Thu Sep 19 07:26:42 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 19 Sep 2024 07:26:42 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v11] In-Reply-To: References: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> Message-ID: On Thu, 19 Sep 2024 04:24:09 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: >> >> remove tabs > > test/hotspot/gtest/runtime/test_os.cpp line 433: > >> 431: errno = 0; >> 432: returnedBuffer = os::realpath(tmppath, buffer, MAX_PATH); >> 433: EXPECT_TRUE(returnedBuffer == buffer); > > Should we also do `EXPECT_TRUE(errno == 0);` ? Here and below. This is interesting! I found that on Linux, errno _was not zero_! The specifications for POSIX realpath say `RETURN VALUE Upon successful completion, realpath() shall return a pointer to the resolved name. Otherwise, realpath() shall return a null pointer and set errno to indicate the error, and the contents of the buffer pointed to by resolved_name are undefined.` Nowhere does it say errno is unchanged if successful. 
errno = 0; ::printf("before ::realpath(\"/tmp\",nullptr) errno=%d\n", errno); char* p = ::realpath("/tmp", nullptr); ::printf("after ::realpath p=%s errno=%d\n", p, errno); This outputs: before ::realpath("/tmp",nullptr) errno=0 after ::realpath /tmp p=/tmp errno=22 With behaviour like this, one can see why OpenJDK wraps ::realpath()... Compiler used: g++ (GCC) 14.2.1 20240801 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1766296852 From stooke at openjdk.org Thu Sep 19 07:29:38 2024 From: stooke at openjdk.org (Simon Tooke) Date: Thu, 19 Sep 2024 07:29:38 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v11] In-Reply-To: References: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> Message-ID: On Thu, 19 Sep 2024 04:27:54 GMT, David Holmes wrote: >> Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: >> >> remove tabs > > test/hotspot/gtest/runtime/test_os.cpp line 453: > >> 451: errno = 0; >> 452: returnedBuffer = os::realpath(tmppath, buffer, sizeof(buffer)); >> 453: EXPECT_TRUE(errno == EINVAL); > > How is this an EINVAL case?
You are correct - this was a cut and paste error; the call tested should have been `os::realpath(tmppath, nullptr, sizeof(buffer));` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1766304627 From jbhateja at openjdk.org Thu Sep 19 07:32:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 19 Sep 2024 07:32:38 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> On Wed, 18 Sep 2024 17:00:30 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Change method name Hi @sviswa7 , some comments, overall patch looks good to me. Best Regards, Jatin src/hotspot/share/opto/vectorIntrinsics.cpp line 2120: > 2118: > 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) { > 2120: return false; // dead code Why dead code in comments ? 
src/hotspot/share/opto/vectorIntrinsics.cpp line 2129: > 2127: NodeClassNames[argument(2)->Opcode()], > 2128: NodeClassNames[argument(3)->Opcode()]); > 2129: return false; // not enough info for intrinsification Please club this with the above condition, to be consistent with other inline expanders. src/hotspot/share/opto/vectorIntrinsics.cpp line 2141: > 2139: } > 2140: BasicType elem_bt = elem_type->basic_type(); > 2141: Remove the blank line. src/hotspot/share/opto/vectorIntrinsics.cpp line 2144: > 2142: int num_elem = vlen->get_con(); > 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) { > 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem); Will num_elem < 4 not be handled by L2149, since we have an implementation limitation in supporting less than 32-bit shuffles/masks? src/hotspot/share/opto/vectorIntrinsics.cpp line 2171: > 2169: use_predicate = false; > 2170: if(!is_masked_op || > 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) || Suggestion: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) || src/hotspot/share/opto/vectorIntrinsics.cpp line 2188: > 2186: > 2187: if (v1 == nullptr || v2 == nullptr) { > 2188: return false; // operand unboxing failed To be consistent with other expanders, please emit a proper error for the unboxing failure, as on the following line: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426 src/hotspot/share/opto/vectorIntrinsics.cpp line 2197: > 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem); > 2196: if (mask == nullptr) { > 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1", The error should be an unboxing failure here.
------------- PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2314643808 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277056 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766277739 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766278169 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766297640 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766292679 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766303620 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1766304688 From rehn at openjdk.org Thu Sep 19 07:45:13 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 07:45:13 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v3] In-Reply-To: References: Message-ID: > Hey, please consider, > > All code which is offline (behind a barrier) does not need global icache flushes. > Instead, we can locally (per thread and hart) emit fence.i in the slow path. > But if we were to be context switched to a hart which has not had fence.i emitted, we could still fetch stale instructions. > To handle this case we now have kernel support: > https://docs.kernel.org/arch/riscv/cmodx.html > > It's not perfect, as with this patch we will be emitting fence.i on any context switch for any thread, even if that thread does not execute on the code heap (non-attached native thread), and even if there were no changes to the code heap. > But this is in many cases much faster, as the global icache flush IPI is very intrusive. > Particular cases are running a concurrent GC with small head room. > In such a scenario I measured 15% increased throughput on VF2. > A large CPU or less head room (faster GC cycles) will yield even more performance boost. > > Note that this requires 6.10 kernel. > > I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3.
(With setting on) > > Later we probably want this default on, but as it's hard to test I'll leave default off. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into cmodx-fence - Comment, moved init after feature enabling - Fixed ws - Draft ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20913/files - new: https://git.openjdk.org/jdk/pull/20913/files/8411301b..5489a0d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=01-02 Stats: 158400 lines in 898 files changed: 147978 ins; 6116 del; 4306 mod Patch: https://git.openjdk.org/jdk/pull/20913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20913/head:pull/20913 PR: https://git.openjdk.org/jdk/pull/20913 From fyang at openjdk.org Thu Sep 19 08:17:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 19 Sep 2024 08:17:47 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> Message-ID: On Thu, 19 Sep 2024 07:23:13 GMT, Robbin Ehn wrote: >> Yes. The comment is a bit misleading. >> A fence wr, wr is just for locally, on a hart, order instructions. >> A fence wr wr do not flush a store from the store buffer, it just means that the hart must act such that the store appears to have happened before the syscall. Meaning the store can not happen after the syscall. >> >> As we are not using the syscall: >> Instead the other threads/harts must emit the fence.i them self, either by leaving a safepoint, hitting the patch_epoch_barrier and if they are moved to another hart. 
>> >> Which means the store must happen before we leave the safepoint and disarming the nmethod barrier. >> These already have store fences as we have a bunch of other stores which must be ordered. >> For the nmethod barrier disarming is a full fence as the new epoch must happen before the disarm: >> >> >> 0: (nmethod barriers armed, implies storestore) >> 1: stores to instruction stream >> 2: store new patching epoch >> 3: storestore // the store to instruction and patching epoch must happen before disarm >> 4: disarm >> >> >> If the stores to instruction stream and storing the new patching epoch happens in another order that is fine, as the critical thing is the disarm. > > Code would in BarrierSetNMethod::set_guard_value(...) I see. This seems to deserve a code comment. Also, does that mean the `OrderAccess::fence()` you added in `ZBarrierSetAssembler::patch_barrier_relocation()` is unnecessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766374849 From shade at openjdk.org Thu Sep 19 08:25:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 08:25:19 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v5] In-Reply-To: References: Message-ID: > Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: > > 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal > > > This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots.
>
> This patch is able to print the following instead:
>
>
> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global"

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  More review comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21072/files
  - new: https://git.openjdk.org/jdk/pull/21072/files/17622ba9..eee8a738

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=03-04

  Stats: 9 lines in 3 files changed: 0 ins; 1 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/21072.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21072/head:pull/21072

PR: https://git.openjdk.org/jdk/pull/21072

From shade at openjdk.org Thu Sep 19 08:25:19 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 08:25:19 GMT
Subject: RFR: 8340392: Handle OopStorage in location decoder [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Sep 2024 06:03:14 GMT, David Holmes wrote:

>> Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Try to handle unaligned pointers well
>>  - Indenting and formats
>
> src/hotspot/share/gc/shared/oopStorage.cpp line 1140:
>
>> 1138: bool OopStorage::print_containing(const oop* addr, outputStream* st) {
>> 1139:   if (addr != nullptr) {
>> 1140:     Block *block = find_block_or_null(addr);
>
> Suggestion:
>
>     Block* block = find_block_or_null(addr);

Thanks, fixed in new commit.
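
For readers following the thread, here is a rough sketch of how a message such as "a pointer 2/64 into block 0" could be derived from an address. It assumes that "2/64" means slot index 2 in a block of 64 slots, which is only one plausible reading of the output quoted above. `ToyBlock` and `slot_index` are hypothetical names made up for this illustration; they are not HotSpot's `OopStorage::Block` or any of its methods.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified block: a fixed array of pointer-sized slots.
// This is NOT the real OopStorage::Block layout.
struct ToyBlock {
  static constexpr std::size_t kSlots = 64;
  void* slots[kSlots];
};

// Returns the index of the slot that contains addr, or -1 when addr does not
// point into the block's slot array. Unaligned addresses map to the slot
// they fall inside.
static int slot_index(const ToyBlock& block, const void* addr) {
  std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&block.slots[0]);
  std::uintptr_t p = reinterpret_cast<std::uintptr_t>(addr);
  if (p < base || p >= base + sizeof(block.slots)) {
    return -1;
  }
  return static_cast<int>((p - base) / sizeof(void*));
}
```

With this toy layout, the address of `slots[2]` and any address inside that slot both decode to index 2, while addresses outside the array are rejected.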
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766382633

From shade at openjdk.org Thu Sep 19 08:25:20 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 08:25:20 GMT
Subject: RFR: 8340392: Handle OopStorage in location decoder [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Sep 2024 06:31:49 GMT, Kim Barrett wrote:

>> Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Try to handle unaligned pointers well
>>  - Indenting and formats
>
> src/hotspot/share/gc/shared/oopStorageSet.cpp line 86:
>
>> 84:
>> 85: bool OopStorageSet::print_containing(const void* addr, outputStream* st) {
>> 86:   const void* aligned_addr = align_down(addr, alignof(oop*));
>
> Should be `alignof(oop)`.

Right, dang. I missed that we carry `oop`, not `oop*` array in `Block`. Fixed.

> src/hotspot/share/gc/shared/oopStorageSet.cpp line 87:
>
>> 85: bool OopStorageSet::print_containing(const void* addr, outputStream* st) {
>> 86:   const void* aligned_addr = align_down(addr, alignof(oop*));
>> 87:   if (aligned_addr != nullptr) {
>
> I think the nullptr check should come first. Pointer arithmetic can't produce a null pointer (because
> of UB), so I think a sufficiently smart compiler might be able to conclude that this test is always true in the
> absence of UB in the alignment code.

OK, some day I will internalize all these landmines :)

> src/hotspot/share/gc/shared/oopStorageSet.cpp line 88:
>
>> 86:   const void* aligned_addr = align_down(addr, alignof(oop*));
>> 87:   if (aligned_addr != nullptr) {
>> 88:     for (uint i = 0; i < OopStorageSet::all_count; i++) {
>
> Don't need the `OopStorageSet::` qualifier here, since we're in that class already.
> But better would be to iterate over the storages using the provided iteration mechanism:
>
> for (OopStorage* storage : Range()) {
>   ...
> }

Fixed.
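
The ordering point raised in the review (test the incoming pointer for null before deriving an aligned pointer from it) can be sketched in isolation as follows. This is a minimal illustration, not the code from the PR: `align_down_ptr` and `slot_to_look_up` are hypothetical helpers, and the alignment is done on the integer value of the address rather than with HotSpot's `align_down`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an align-down helper: rounds an address down to
// a multiple of a power-of-two alignment by masking its integer value.
static const void* align_down_ptr(const void* p, std::size_t alignment) {
  std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
  std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  return reinterpret_cast<const void*>(bits & ~mask);
}

// Hypothetical lookup front end: returns the aligned address to search for,
// or nullptr when the caller passed no address at all.
static const void* slot_to_look_up(const void* addr, std::size_t slot_alignment) {
  if (addr == nullptr) {  // check the caller's pointer, not the derived one
    return nullptr;
  }
  return align_down_ptr(addr, slot_alignment);
}
```

The design choice is simply that the null test is applied to the value the caller supplied, so its outcome does not depend on what the compiler may assume about pointers produced by the alignment step.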

> test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 139:
>
>> 137:   void doit() {
>> 138:     PrintContainingClosure cl;
>> 139:     for (auto id: EnumRange()) {
>
> Instead use
>
> for (OopStorage* storage : OopStorageSet::Range()) {
>   ...
> }

Fixed.

> test/hotspot/gtest/gc/shared/test_oopStorageSet.cpp line 166:
>
>> 164:   {
>> 165:     stringStream ss;
>> 166:     bool printed = OopStorageSet::print_containing((char*)0x8, &ss);
>
> Instead of hard-coded 0x8, use alignof(oop).

Right. Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766382422
PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766381514
PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766380800
PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766380908
PR Review Comment: https://git.openjdk.org/jdk/pull/21072#discussion_r1766381089

From shade at openjdk.org Thu Sep 19 08:29:20 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 19 Sep 2024 08:29:20 GMT
Subject: RFR: 8340392: Handle OopStorage in location decoder [v6]
In-Reply-To:
References:
Message-ID:

> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print:
>
> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal
>
>
> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots.
>
> This patch is able to print the following instead:
>
>
> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global"

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Also assert "unaligned" is not printed for aligned pointers

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21072/files
  - new: https://git.openjdk.org/jdk/pull/21072/files/eee8a738..22964f78

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21072&range=04-05

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21072.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21072/head:pull/21072

PR: https://git.openjdk.org/jdk/pull/21072

From luhenry at openjdk.org Thu Sep 19 08:33:40 2024
From: luhenry at openjdk.org (Ludovic Henry)
Date: Thu, 19 Sep 2024 08:33:40 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Jul 2024 08:26:02 GMT, Fei Yang wrote:

>> Changes requested by fyang (Reviewer).
>
>> As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! The main difference that I see is that they introduced three different versions of encryption depending on the key sizes, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it'll be more efficient to replace the current version by something openssl-lookalike. The only problem I see is increasing code size a bit. Please let me know if we are not interested in this change for some reason
>
> Does `vaesz_vs` help in any way? And what about the `generate_aescrypt_decryptBlock`? [1]
>
> [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkned.pl#L451

@RealFYang following up on your questions. I would love to see this one go through as it promises some pretty significant gains on compatible hardware! Thanks again

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2360351719

From fyang at openjdk.org Thu Sep 19 08:36:41 2024
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 19 Sep 2024 08:36:41 GMT
Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Jul 2024 08:26:02 GMT, Fei Yang wrote:

>> Changes requested by fyang (Reviewer).
>
>> As for comparison with the openssl version: first of all, thanks for the sources, @RealFYang! The main difference that I see is that they introduced three different versions of encryption depending on the key sizes, which allows them to skip a couple of instructions, like when I did `vaesem_vv(res, vzero)` followed by `vxor_vv(res, res, vtemp1)`. So I thought it'll be more efficient to replace the current version by something openssl-lookalike. The only problem I see is increasing code size a bit. Please let me know if we are not interested in this change for some reason
>
> Does `vaesz_vs` help in any way? And what about the `generate_aescrypt_decryptBlock`? [1]
>
> [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkned.pl#L451

> @RealFYang following up on your questions. I would love to see this one go through as it promises some pretty significant gains on compatible hardware! Thanks again

Yeah, will take another look. Have you tried this on real hardware? Interesting to see the numbers.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2360364773

From kbarrett at openjdk.org Thu Sep 19 08:38:43 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Thu, 19 Sep 2024 08:38:43 GMT
Subject: RFR: 8340392: Handle OopStorage in location decoder [v6]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Sep 2024 08:29:20 GMT, Aleksey Shipilev wrote:

>> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print:
>>
>> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal
>>
>>
>> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots.
>>
>> This patch is able to print the following instead:
>>
>>
>> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global"
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Also assert "unaligned" is not printed for aligned pointers

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21072#pullrequestreview-2314825265

From mli at openjdk.org Thu Sep 19 08:38:55 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 19 Sep 2024 08:38:55 GMT
Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF
Message-ID:

Hi,
Can you help to review this patch?
Thanks!

This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api.

On machine with vector intrinsic support on riscv (e.g.
gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions.

### Test
test/jdk/jdk/incubator/vector

### Performance
data on bananapi

Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855
Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001
Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074
Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62
Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166
Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974
Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167
Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219
Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377
Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357
Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398
Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449
Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743
Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743
Double128Vector.SIN | 1024 | avgt | 10 | 166240.988 | 512.253 | 287741.373 | 2089.286 | ns/op | 1.731
Double128Vector.SINH | 1024 | avgt | 10 | 196233.614 | 225.88 | 221493.573 | 60941.438 | ns/op | 1.129
Double128Vector.TAN | 1024 | avgt | 10 | 203347.384 | 267.385 | 372912.183 | 2093.675 | ns/op | 1.834
Double128Vector.TANH | 1024 | avgt | 10 | 195587.19 | 5260.844 | 190723.4 | 873.135 | ns/op | 0.975
Double256Vector.ACOS | 1024 | avgt | 10 | 55282.885 | 8.888 | 138468.959 | 1342.937 | ns/op | 2.505
Double256Vector.ASIN | 1024 | avgt | 10 | 51424.997 | 22.614 | 141245.24 | 3213.405 | ns/op | 2.747
Double256Vector.ATAN | 1024 | avgt | 10 | 70385.397 | 14.196 | 210226.648 | 897.412 | ns/op | 2.987
Double256Vector.ATAN2 | 1024 | avgt | 10 | 83098.264 | 120.424 | 373363.523 | 3093.761 | ns/op | 4.493
Double256Vector.CBRT | 1024 | avgt | 10 | 72695.917 | 28.785 | 250843.027 | 869.34 | ns/op | 3.451
Double256Vector.COS | 1024 | avgt | 10 | 77373.4 | 10.275 | 249779.557 | 1143.93 | ns/op | 3.228
Double256Vector.COSH | 1024 | avgt | 10 | 95626.561 | 169.093 | 135295.836 | 26164.804 | ns/op | 1.415
Double256Vector.EXP | 1024 | avgt | 10 | 57013.105 | 25.681 | 169211.888 | 1723.985 | ns/op | 2.968
Double256Vector.EXPM1 | 1024 | avgt | 10 | 89929.364 | 172.868 | 189713.959 | 619.662 | ns/op | 2.11
Double256Vector.HYPOT | 1024 | avgt | 10 | 58179.576 | 72.265 | 253002.315 | 1413.97 | ns/op | 4.349
Double256Vector.LOG | 1024 | avgt | 10 | 55274.107 | 6.781 | 199552.499 | 1070.838 | ns/op | 3.61
Double256Vector.LOG10 | 1024 | avgt | 10 | 58321.206 | 3.046 | 219497.134 | 1784.676 | ns/op | 3.764
Double256Vector.LOG1P | 1024 | avgt | 10 | 59457.661 | 4.266 | 248897.335 | 1101.141 | ns/op | 4.186
Double256Vector.POW | 1024 | avgt | 10 | 161727.792 | 283.278 | 389901.211 | 5128.643 | ns/op | 2.411
Double256Vector.SIN | 1024 | avgt | 10 | 82028.764 | 163.402 | 229585.318 | 2284.46 | ns/op | 2.799
Double256Vector.SINH | 1024 | avgt | 10 | 95533.939 | 144.219 | 138338.01 | 32257.269 | ns/op | 1.448
Double256Vector.TAN | 1024 | avgt | 10 | 100587.595 | 175.454 | 255335.96 | 2392.867 | ns/op | 2.538
Double256Vector.TANH | 1024 | avgt | 10 | 122826.824 | 8132.31 | 116587.352 | 20456.614 | ns/op | 0.949
Double512Vector.ACOS | 1024 | avgt | 10 | 100644.726 | 6559.453 | 90596.17 | 6201.774 | ns/op | 0.9
Double512Vector.ASIN | 1024 | avgt | 10 | 97781.73 | 6454.561 | 81923.501 | 6875.259 | ns/op | 0.838
Double512Vector.ATAN | 1024 | avgt | 10 | 230365.297 | 5657.262 | 231136.108 | 8201.677 | ns/op | 1.003
Double512Vector.ATAN2 | 1024 | avgt | 10 | 330644.739 | 965.308 | 334507.514 | 1871.147 | ns/op | 1.012
Double512Vector.CBRT | 1024 | avgt | 10 | 269499.416 | 7578.3 | 275058.533 | 2931.999 | ns/op | 1.021
Double512Vector.COS | 1024 | avgt | 10 | 250239.661 | 8717.098 | 251643.64 | 5974.845 | ns/op | 1.006
Double512Vector.COSH | 1024 | avgt | 10 | 130896.571 | 3149.555 | 116148.85 | 19419.192 | ns/op | 0.887
Double512Vector.EXP | 1024 | avgt | 10 | 167358.383 | 4017.309 | 163777.077 | 9441.332 | ns/op | 0.979
Double512Vector.EXPM1 | 1024 | avgt | 10 | 180627.099 | 6239.875 | 181451.788 | 2833.293 | ns/op | 1.005
Double512Vector.HYPOT | 1024 | avgt | 10 | 259838.022 | 2413.253 | 253563.622 | 5666.461 | ns/op | 0.976
Double512Vector.LOG | 1024 | avgt | 10 | 214492.394 | 8551.06 | 223659.532 | 4634.475 | ns/op | 1.043
Double512Vector.LOG10 | 1024 | avgt | 10 | 237482.746 | 5504.954 | 241056.068 | 3773.962 | ns/op | 1.015
Double512Vector.LOG1P | 1024 | avgt | 10 | 259562.363 | 6983.428 | 255542.226 | 6799.872 | ns/op | 0.985
Double512Vector.POW | 1024 | avgt | 10 | 409067.718 | 1031.843 | 415598.626 | 1333.94 | ns/op | 1.016
Double512Vector.SIN | 1024 | avgt | 10 | 233720.922 | 9117.177 | 237166.138 | 5740.104 | ns/op | 1.015
Double512Vector.SINH | 1024 | avgt | 10 | 106110.446 | 8082.622 | 120441.14 | 19807.6 | ns/op | 1.135
Double512Vector.TAN | 1024 | avgt | 10 | 286363.576 | 5171.85 | 289463.344 | 5786.435 | ns/op | 1.011
Double512Vector.TANH | 1024 | avgt | 10 | 55621.25 | 1751.435 | 54999.583 | 114.941 | ns/op | 0.989
Double64Vector.ACOS | 1024 | avgt | 10 | 440775.699 | 2196.779 | 448951.428 | 1820.252 | ns/op | 1.019
Double64Vector.ASIN | 1024 | avgt | 10 | 463051.606 | 2394.98 | 454351.539 | 2086.492 | ns/op | 0.981
Double64Vector.ATAN | 1024 | avgt | 10 | 544190.664 | 4013.885 | 546309.02 | 3440.376 | ns/op | 1.004
Double64Vector.ATAN2 | 1024 | avgt | 10 | 799967.835 | 3488.851 | 812483.613 | 2421.999 | ns/op | 1.016
Double64Vector.CBRT | 1024 | avgt | 10 | 618953.967 | 5293.167 | 622328.702 | 2301.048 | ns/op | 1.005
Double64Vector.COS | 1024 | avgt | 10 | 574667.991 | 2894.881 | 604963.23 | 12128.549 | ns/op | 1.053
Double64Vector.COSH | 1024 | avgt | 10 | 480884.659 | 3050.01 | 474405.728 | 2223.766 | ns/op | 0.987
Double64Vector.EXP | 1024 | avgt | 10 | 476743.952 | 1468.7 | 493014.212 | 2879.845 | ns/op | 1.034
Double64Vector.EXPM1 | 1024 | avgt | 10 | 522048.987 | 2879.475 | 505978.67 | 1825.956 | ns/op | 0.969
Double64Vector.HYPOT | 1024 | avgt | 10 | 713841.457 | 2816.621 | 716284.872 | 7024.984 | ns/op | 1.003
Double64Vector.LOG | 1024 | avgt | 10 | 523702.517 | 1849.651 | 525498.61 | 1122.938 | ns/op | 1.003
Double64Vector.LOG10 | 1024 | avgt | 10 | 539968.004 | 2445.033 | 541415.051 | 2966.057 | ns/op | 1.003
Double64Vector.LOG1P | 1024 | avgt | 10 | 556206.02 | 3156.961 | 554613.942 | 2628.038 | ns/op | 0.997
Double64Vector.POW | 1024 | avgt | 10 | 931275.694 | 5378.585 | 914787.042 | 11244.374 | ns/op | 0.982
Double64Vector.SIN | 1024 | avgt | 10 | 620118.172 | 3805.705 | 553147.004 | 2265.843 | ns/op | 0.892
Double64Vector.SINH | 1024 | avgt | 10 | 504218.91 | 2259.924 | 482680.497 | 5218.21 | ns/op | 0.957
Double64Vector.TAN | 1024 | avgt | 10 | 620591.643 | 5541.53 | 622098.336 | 4892.394 | ns/op | 1.002
Double64Vector.TANH | 1024 | avgt | 10 | 438766.135 | 4313.069 | 426783.749 | 5986.632 | ns/op | 0.973
DoubleMaxVector.ACOS | 1024 | avgt | 10 | 55281.88 | 5.819 | 152707.139 | 2337.434 | ns/op | 2.762
DoubleMaxVector.ASIN | 1024 | avgt | 10 | 51632.365 | 20.723 | 152958.169 | 2530.258 | ns/op | 2.962
DoubleMaxVector.ATAN | 1024 | avgt | 10 | 70393.309 | 7.502 | 225146.6 | 4836.393 | ns/op | 3.198
DoubleMaxVector.ATAN2 | 1024 | avgt | 10 | 83049.389 | 131.221 | 376129.104 | 2973.54 | ns/op | 4.529
DoubleMaxVector.CBRT | 1024 | avgt | 10 | 73401.993 | 20.547 | 252789.351 | 1322.396 | ns/op | 3.444
DoubleMaxVector.COS | 1024 | avgt | 10 | 77388.046 | 8.768 | 252428.563 | 4605.328 | ns/op | 3.262
DoubleMaxVector.COSH | 1024 | avgt | 10 | 95373.866 | 167.177 | 145355.624 | 35146.538 | ns/op | 1.524
DoubleMaxVector.EXP | 1024 | avgt | 10 | 57910.881 | 11.031 | 183133.879 | 3721.502 | ns/op | 3.162
DoubleMaxVector.EXPM1 | 1024 | avgt | 10 | 89968.248 | 180.822 | 199712.477 | 2009.862 | ns/op | 2.22
DoubleMaxVector.HYPOT | 1024 | avgt | 10 | 59064.115 | 186.157 | 253275.967 | 1479.124 | ns/op | 4.288
DoubleMaxVector.LOG | 1024 | avgt | 10 | 53685.913 | 4.08 | 202019.279 | 1174.832 | ns/op | 3.763
DoubleMaxVector.LOG10 | 1024 | avgt | 10 | 58333.057 | 4.644 | 223237.023 | 2682.561 | ns/op | 3.827
DoubleMaxVector.LOG1P | 1024 | avgt | 10 | 59455.511 | 4.493 | 248216.075 | 4200.623 | ns/op | 4.175
DoubleMaxVector.POW | 1024 | avgt | 10 | 161793.312 | 355.543 | 395000.995 | 4070.581 | ns/op | 2.441
DoubleMaxVector.SIN | 1024 | avgt | 10 | 82045.108 | 178.173 | 232964.02 | 3351.878 | ns/op | 2.839
DoubleMaxVector.SINH | 1024 | avgt | 10 | 95557.571 | 171.167 | 139434.904 | 33020.695 | ns/op | 1.459
DoubleMaxVector.TAN | 1024 | avgt | 10 | 99139.084 | 170.106 | 255665.125 | 1463.226 | ns/op | 2.579
DoubleMaxVector.TANH | 1024 | avgt | 10 | 122556.944 | 7304.643 | 112638.697 | 22789.428 | ns/op | 0.919
DoubleScalar.ACOS | 1024 | avgt | 10 | 35364.49 | 43.834 | 35391.461 | 11.475 | ns/op | 1.001
DoubleScalar.ASIN | 1024 | avgt | 10 | 36020.676 | 41.123 | 36040.44 | 22.284 | ns/op | 1.001
DoubleScalar.ATAN | 1024 | avgt | 10 | 100104.331 | 135.729 | 102039.921 | 286.803 | ns/op | 1.019
DoubleScalar.ATAN2 | 1024 | avgt | 10 | 163987.639 | 239.624 | 165832.456 | 1865.186 | ns/op | 1.011
DoubleScalar.CBRT | 1024 | avgt | 10 | 144175.051 | 169.152 | 144177.588 | 175.837 | ns/op | 1
DoubleScalar.COS | 1024 | avgt | 10 | 129137.254 | 186.072 | 129187.403 | 164.344 | ns/op | 1
DoubleScalar.COSH | 1024 | avgt | 10 | 65408.411 | 158.758 | 65469.654 | 302.387 | ns/op | 1.001
DoubleScalar.EXP | 1024 | avgt | 10 | 66358.519 | 15.942 | 66370.088 | 13.886 | ns/op | 1
DoubleScalar.EXPM1 | 1024 | avgt | 10 | 84449.659 | 20.205 | 84443.216 | 17.539 | ns/op | 1
DoubleScalar.HYPOT | 1024 | avgt | 10 | 98996.854 | 149.906 | 99114.226 | 247.392 | ns/op | 1.001
DoubleScalar.LOG | 1024 | avgt | 10 | 92296.061 | 84.554 | 92362.323 | 127.4 | ns/op | 1.001
DoubleScalar.LOG10 | 1024 | avgt | 10 | 108959.603 | 214.845 | 109177.708 | 151.172 | ns/op | 1.002
DoubleScalar.LOG1P | 1024 | avgt | 10 | 133745.827 | 189.726 | 133626.747 | 159.786 | ns/op | 0.999
DoubleScalar.POW | 1024 | avgt | 10 | 245735.03 | 392.669 | 246363.909 | 776.007 | ns/op | 1.003
DoubleScalar.SIN | 1024 | avgt | 10 | 112985.666 | 211.564 | 113015.922 | 93.048 | ns/op | 1
DoubleScalar.SINH | 1024 | avgt | 10 | 65009.526 | 547.157 | 65443.714 | 150.434 | ns/op | 1.007
DoubleScalar.TAN | 1024 | avgt | 10 | 163437.236 | 157.673 | 163196.802 | 196.316 | ns/op | 0.999
DoubleScalar.TANH | 1024 | avgt | 10 | 15174.949 | 7.999 | 15178.266 | 19.863 | ns/op | 1
Float128Vector.ACOS | 1024 | avgt | 10 | 43372.933 | 5.055 | 126575.586 | 976.159 | ns/op | 2.918
Float128Vector.ASIN | 1024 | avgt | 10 | 38632.619 | 1.743 | 127126.175 | 1368.112 | ns/op | 3.291
Float128Vector.ATAN | 1024 | avgt | 10 | 56269.042 | 3.274 | 188537.782 | 1465.567 | ns/op | 3.351
Float128Vector.ATAN2 | 1024 | avgt | 10 | 64863.602 | 9.184 | 289789.784 | 1933.189 | ns/op | 4.468
Float128Vector.CBRT | 1024 | avgt | 10 | 60648.572 | 30.499 | 219496.505 | 2628.005 | ns/op | 3.619
Float128Vector.COS | 1024 | avgt | 10 | 90296.6 | 173.89 | 193875.308 | 2878.795 | ns/op | 2.147
Float128Vector.COSH | 1024 | avgt | 10 | 72513.407 | 13.428 | 134362.41 | 28085.258 | ns/op | 1.853
Float128Vector.EXP | 1024 | avgt | 10 | 32520.847 | 6.845 | 158283.434 | 1092.762 | ns/op | 4.867
Float128Vector.EXPM1 | 1024 | avgt | 10 | 65130.005 | 3.498 | 186841.627 | 1140.313 | ns/op | 2.869
Float128Vector.HYPOT | 1024 | avgt | 10 | 52240.243 | 4.423 | 228928.31 | 1385.126 | ns/op | 4.382
Float128Vector.LOG | 1024 | avgt | 10 | 44080.307 | 2.549 | 186830.712 | 797.576 | ns/op | 4.238
Float128Vector.LOG10 | 1024 | avgt | 10 | 45302.969 | 7.095 | 189605.302 | 2126.429 | ns/op | 4.185
Float128Vector.LOG1P | 1024 | avgt | 10 | 47599.582 | 3.822 | 194058.394 | 2620.35 | ns/op | 4.077
Float128Vector.POW | 1024 | avgt | 10 | 118329.731 | 157.834 | 375914.2 | 2800.253 | ns/op | 3.177
Float128Vector.SIN | 1024 | avgt | 10 | 96545.285 | 409.639 | 190830.529 | 1511.452 | ns/op | 1.977
Float128Vector.SINH | 1024 | avgt | 10 | 67999.296 | 8.793 | 134817.031 | 28316.519 | ns/op | 1.983
Float128Vector.TAN | 1024 | avgt | 10 | 105051.902 | 193.021 | 236690.576 | 6686.38 | ns/op | 2.253
Float128Vector.TANH | 1024 | avgt | 10 | 107938.486 | 1593.331 | 107867.708 | 1037.358 | ns/op | 0.999
Float256Vector.ACOS | 1024 | avgt | 10 | 21993.336 | 0.945 | 90171.186 | 765.896 | ns/op | 4.1
Float256Vector.ASIN | 1024 | avgt | 10 | 19176.439 | 4.288 | 91491.757 | 946.887 | ns/op | 4.771
Float256Vector.ATAN | 1024 | avgt | 10 | 28573.58 | 1.788 | 153126.232 | 1354.054 | ns/op | 5.359
Float256Vector.ATAN2 | 1024 | avgt | 10 | 32809.207 | 57.366 | 241229.586 | 2703.039 | ns/op | 7.352
Float256Vector.CBRT | 1024 | avgt | 10 | 30349.65 | 5.52 | 195162.623 | 2631.134 | ns/op | 6.43
Float256Vector.COS | 1024 | avgt | 10 | 45629.146 | 6.614 | 185366.17 | 1616.6 | ns/op | 4.062
Float256Vector.COSH | 1024 | avgt | 10 | 36923.595 | 2.135 | 108690.335 | 13921.018 | ns/op | 2.944
Float256Vector.EXP | 1024 | avgt | 10 | 16170.263 | 2.046 | 125594.096 | 1033.554 | ns/op | 7.767
Float256Vector.EXPM1 | 1024 | avgt | 10 | 32608.2 | 5.484 | 129709.448 | 993.492 | ns/op | 3.978
Float256Vector.HYPOT | 1024 | avgt | 10 | 27921.801 | 1.528 | 190117.16 | 1454.543 | ns/op | 6.809
Float256Vector.LOG | 1024 | avgt | 10 | 22076.681 | 2.329 | 134540.724 | 1704.931 | ns/op | 6.094
Float256Vector.LOG10 | 1024 | avgt | 10 | 23064.284 | 2.37 | 159962.122 | 2503.179 | ns/op | 6.935
Float256Vector.LOG1P | 1024 | avgt | 10 | 23835.965 | 2.04 | 194624.332 | 4779.995 | ns/op | 8.165
Float256Vector.POW | 1024 | avgt | 10 | 59593.468 | 74.705 | 317616.881 | 1183.352 | ns/op | 5.33
Float256Vector.SIN | 1024 | avgt | 10 | 48733.012 | 19.4 | 169500.443 | 2768.932 | ns/op | 3.478
Float256Vector.SINH | 1024 | avgt | 10 | 33625.182 | 1.423 | 124512.293 | 1771.11 | ns/op | 3.703
Float256Vector.TAN | 1024 | avgt | 10 | 54313.62 | 14.978 | 215172.493 | 1753.706 | ns/op | 3.962
Float256Vector.TANH | 1024 | avgt | 10 | 61708.469 | 1605.348 | 63690.609 | 796.163 | ns/op | 1.032
Float512Vector.ACOS | 1024 | avgt | 10 | 93820.934 | 3011.58 | 90663.418 | 2027.372 | ns/op | 0.966
Float512Vector.ASIN | 1024 | avgt | 10 | 95866.984 | 3057.351 | 97612.203 | 3787.454 | ns/op | 1.018
Float512Vector.ATAN | 1024 | avgt | 10 | 167859.888 | 4240.703 | 167247.975 | 4300.418 | ns/op | 0.996
Float512Vector.ATAN2 | 1024 | avgt | 10 | 255441.315 | 685.737 | 254700.612 | 4896.306 | ns/op | 0.997
Float512Vector.CBRT | 1024 | avgt | 10 | 214410.72 | 931.01 | 214285.796 | 1383.406 | ns/op | 0.999
Float512Vector.COS | 1024 | avgt | 10 | 196689.274 | 1880.854 | 197309.067 | 1784.865 | ns/op | 1.003
Float512Vector.COSH | 1024 | avgt | 10 | 104335.896 | 561.089 | 88993.056 | 1788.606 | ns/op | 0.853
Float512Vector.EXP | 1024 | avgt | 10 | 135852.89 | 2981.107 | 135877.338 | 2846.752 | ns/op | 1
Float512Vector.EXPM1 | 1024 | avgt | 10 | 152498.16 | 2995.351 | 153719.922 | 2343.672 | ns/op | 1.008
Float512Vector.HYPOT | 1024 | avgt | 10 | 188872.565 | 802.938 | 188659.105 | 505.853 | ns/op | 0.999
Float512Vector.LOG | 1024 | avgt | 10 | 159618.453 | 2347.331 | 159789.006 | 3077.534 | ns/op | 1.001
Float512Vector.LOG10 | 1024 | avgt | 10 | 177141.543 | 2208.144 | 173862.555 | 7986.955 | ns/op | 0.981
Float512Vector.LOG1P | 1024 | avgt | 10 | 201767.835 | 2097.682 | 194773.996 | 3261.783 | ns/op | 0.965
Float512Vector.POW | 1024 | avgt | 10 | 340428.997 | 898.57 | 339608.679 | 2319.682 | ns/op | 0.998
Float512Vector.SIN | 1024 | avgt | 10 | 182644.272 | 2827.997 | 183512.561 | 3230.558 | ns/op | 1.005
Float512Vector.SINH | 1024 | avgt | 10 | 88864.538 | 856.677 | 96766.798 | 4680.862 | ns/op | 1.089
Float512Vector.TAN | 1024 | avgt | 10 | 230591.607 | 2406.274 | 235481.617 | 2062.326 | ns/op | 1.021
Float512Vector.TANH | 1024 | avgt | 10 | 41323.35 | 1108.87 | 41397.969 | 105.838 | ns/op | 1.002
Float64Vector.ACOS | 1024 | avgt | 10 | 87808.157 | 135.816 | 197215.577 | 1427.244 | ns/op | 2.246
Float64Vector.ASIN | 1024 | avgt | 10 | 79126.845 | 9.019 | 197556.29 | 786.402 | ns/op | 2.497
Float64Vector.ATAN | 1024 | avgt | 10 | 112334.153 | 161.759 | 262670.28 | 1106.929 | ns/op | 2.338
Float64Vector.ATAN2 | 1024 | avgt | 10 | 132755.668 | 148.942 | 422308.023 | 1739.683 | ns/op | 3.181
Float64Vector.CBRT | 1024 | avgt | 10 | 121393.777 | 462.316 | 311727.38 | 2783.373 | ns/op | 2.568
Float64Vector.COS | 1024 | avgt | 10 | 180332.5 | 204.792 | 311139.369 | 2435.05 | ns/op | 1.725
Float64Vector.COSH | 1024 | avgt | 10 | 145071.014 | 281.11 | 219063.334 | 1330.185 | ns/op | 1.51
Float64Vector.EXP | 1024 | avgt | 10 | 64474.087 | 18.916 | 222943.443 | 2074.818 | ns/op | 3.458
Float64Vector.EXPM1 | 1024 | avgt | 10 | 128611.56 | 230.073 | 242737.624 | 1997.438 | ns/op | 1.887
Float64Vector.HYPOT | 1024 | avgt | 10 | 104683.692 | 161.234 | 324578.297 | 2702.814 | ns/op | 3.101
Float64Vector.LOG | 1024 | avgt | 10 | 88124.496 | 142.168 | 252264.027 | 536.035 | ns/op | 2.863
Float64Vector.LOG10 | 1024 | avgt | 10 | 95184.783 | 184.6 | 270674.746 | 1399.203 | ns/op | 2.844
Float64Vector.LOG1P | 1024 | avgt | 10 | 91969.404 | 1086.102 | 310777.655 | 2490.714 | ns/op | 3.379
Float64Vector.POW | 1024 | avgt | 10 | 237248.014 | 1684.478 | 472070.731 | 2933.214 | ns/op | 1.99
Float64Vector.SIN | 1024 | avgt | 10 | 194778.558 | 470.935 | 281942.775 | 1400.795 | ns/op | 1.448
Float64Vector.SINH | 1024 | avgt | 10 | 137944.677 | 202.705 | 222200.113 | 1312.19 | ns/op | 1.611
Float64Vector.TAN | 1024 | avgt | 10 | 212713.608 | 218.316 | 313379.409 | 2596.888 | ns/op | 1.473
Float64Vector.TANH | 1024 | avgt | 10 | 173926.377 | 1685.093 | 174629.554 | 3082.909 | ns/op | 1.004
FloatMaxVector.ACOS | 1024 | avgt | 10 | 21889.905 | 39.906 | 90252.786 | 418.764 | ns/op | 4.123
FloatMaxVector.ASIN | 1024 | avgt | 10 | 18793.467 | 4.566 | 90741.587 | 741.291 | ns/op | 4.828
FloatMaxVector.ATAN | 1024 | avgt | 10 | 28496.993 | 6.548 | 153581.577 | 1744.674 | ns/op | 5.389
FloatMaxVector.ATAN2 | 1024 | avgt | 10 | 33658.989 | 3.092 | 258396.05 | 5256.453 | ns/op | 7.677
FloatMaxVector.CBRT | 1024 | avgt | 10 | 30350.281 | 1.956 | 197139.203 | 2129.485 | ns/op | 6.495
FloatMaxVector.COS | 1024 | avgt | 10 | 45628.863 | 3.576 | 187231.562 | 1821.847 | ns/op | 4.103
FloatMaxVector.COSH | 1024 | avgt | 10 | 36925.011 | 5.202 | 108522.288 | 13952.184 | ns/op | 2.939
FloatMaxVector.EXP | 1024 | avgt | 10 | 16173.603 | 1.355 | 126495.517 | 621.715 | ns/op | 7.821
FloatMaxVector.EXPM1 | 1024 | avgt | 10 | 32651.571 | 16.621 | 129689.32 | 2807.684 | ns/op | 3.972
FloatMaxVector.HYPOT | 1024 | avgt | 10 | 28246.148 | 2.652 | 190196.415 | 1919.742 | ns/op | 6.734
FloatMaxVector.LOG | 1024 | avgt | 10 | 22078.034 | 5.796 | 137138.837 | 3034.756 | ns/op | 6.212
FloatMaxVector.LOG10 | 1024 | avgt | 10 | 23840.747 | 6.694 | 164126.237 | 2423.198 | ns/op | 6.884
FloatMaxVector.LOG1P | 1024 | avgt | 10 | 22993.078 | 5.389 | 190345.701 | 1427.37 | ns/op | 8.278
FloatMaxVector.POW | 1024 | avgt | 10 | 58727.04 | 133.816 | 316392.473 | 1713.258 | ns/op | 5.388
FloatMaxVector.SIN | 1024 | avgt | 10 | 48729.964 | 4.439 | 168993.476 | 2234.287 | ns/op | 3.468
FloatMaxVector.SINH | 1024 | avgt | 10 | 33635.008 | 2.951 | 117198.021 | 2609.295 | ns/op | 3.484
FloatMaxVector.TAN | 1024 | avgt | 10 | 54314.082 | 14.057 | 213847.082 | 1390.695 | ns/op | 3.937
FloatMaxVector.TANH | 1024 | avgt | 10 | 65545.343 | 1419.074 | 65362.648 | 1729.148 | ns/op | 0.997
FloatScalar.ACOS | 1024 | avgt | 10 | 36607.495 | 4.656 | 36661.257 | 20.306 | ns/op | 1.001
FloatScalar.ASIN | 1024 | avgt | 10 | 37281.012 | 28.006 | 37272.249 | 46.647 | ns/op | 1
FloatScalar.ATAN | 1024 | avgt | 10 | 101949.284 | 327.101 | 103939.277 | 166.274 | ns/op | 1.02
FloatScalar.ATAN2 | 1024 | avgt | 10 | 165461.209 | 1727.043 | 163270.286 | 593.568 | ns/op | 0.987
FloatScalar.CBRT | 1024 | avgt | 10 | 148653.826 | 166.069 | 148638.661 | 171.44 | ns/op | 1
FloatScalar.COS | 1024 | avgt | 10 | 129975.842 | 204.494 | 129889.093 | 123.915 | ns/op | 0.999
FloatScalar.COSH | 1024 | avgt | 10 | 67462.124 | 25.755 | 67761.353 | 12.415 | ns/op | 1.004
FloatScalar.EXP | 1024 | avgt | 10 | 67723.964 | 8.617 | 67720.157 | 15.651 | ns/op | 1
FloatScalar.EXPM1 | 1024 | avgt | 10 | 85058.759 | 97.872 | 84612.5 | 16.527 | ns/op | 0.995
FloatScalar.HYPOT | 1024 | avgt | 10 | 99875.247 | 713.526 | 99915.975 | 607.926 | ns/op | 1
FloatScalar.LOG | 1024 | avgt | 10 | 94004.039 | 20.602 | 93571.942 | 124.254 | ns/op | 0.995
FloatScalar.LOG10 | 1024 | avgt | 10 | 110012.901 | 232.67 | 110132.542 | 476.101 | ns/op | 1.001
FloatScalar.LOG1P | 1024 | avgt | 10 | 134646.067 | 809.2 | 134554.613 | 651.85 | ns/op | 0.999
FloatScalar.POW | 1024 | avgt | 10 | 246303.685 | 269.215 | 246268.844 | 294.437 | ns/op | 1
FloatScalar.SIN | 1024 | avgt | 10 | 115767.708 | 300.899 | 116114.497 | 209.168 | ns/op | 1.003
FloatScalar.SINH | 1024 | avgt | 10 | 68118.234 | 190.657 | 68973.513 | 289.182 | ns/op | 1.013
FloatScalar.TAN | 1024 | avgt | 10 | 164639.016 | 323.546 | 164428.942 | 175.806 | ns/op | 0.999
FloatScalar.TANH | 1024 | avgt | 10 | 17730.106 | 10.645 | 17730.258 | 9.184 | ns/op | 1

-------------

Commit messages:
 - fix make warning
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/21083/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8320500
  Stats: 247 lines in 14 files changed: 202 ins; 0 del; 45 mod
  Patch: https://git.openjdk.org/jdk/pull/21083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083

PR: https://git.openjdk.org/jdk/pull/21083

From mli at openjdk.org Thu Sep 19 08:38:55 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 19 Sep 2024 08:38:55 GMT
Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF
In-Reply-To:
References:
Message-ID: <8CEveiLA-SQY0Lsf4JT0Fc-X8i81jcq69FTK2EWmq3Y=.407ee338-5201-443b-a7f4-699c6d02b1df@github.com>

On Thu, 19 Sep 2024 08:32:38 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this patch?
> Thanks!
>
> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api.
>
> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions.
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... Hi @magicus , could you please have a look at the make part? Thanks! 
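For readers skimming the quoted table: the Improvement column appears to be the -intrinsic score divided by the +intrinsic score (both in ns/op, so a value above 1 means the SLEEF-backed path is faster). The sketch below only illustrates that reading; the helper names `improvement` and `round3` are made up for this example and are not part of the patch or of JMH.

```cpp
#include <cassert>
#include <cmath>

// Speedup as it seems to be reported in the "Improvement" column:
// average time without the intrinsic divided by average time with it.
// Both inputs are ns/op, so the result is a plain ratio.
double improvement(double score_with_intrinsic, double score_without_intrinsic) {
  return score_without_intrinsic / score_with_intrinsic;
}

// Round to three decimals, the precision used in the table.
double round3(double value) {
  return std::round(value * 1000.0) / 1000.0;
}
```

Applied to the Double128Vector.ACOS row, `round3(improvement(112444.388, 208554.742))` gives 1.855, which agrees with the value listed there.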
------------- PR Comment: https://git.openjdk.org/jdk/pull/21083#issuecomment-2360366938 From duke at openjdk.org Thu Sep 19 08:39:43 2024 From: duke at openjdk.org (Joel Sikström) Date: Thu, 19 Sep 2024 08:39:43 GMT Subject: RFR: 8337674: ZGC: Consistent style for naming private static constants In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 14:13:44 GMT, Stefan Karlsson wrote: >> There are various styles for naming private static constants in ZGC. Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants. There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first. >> >> The lines changed have been identified by running: >> `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` >> `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` >> `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` >> >> The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). >> >> Tested with tiers 1-3. > > Looks good. Thank you for the reviews! 
@stefank @xmas92 @Hamlin-Li ------------- PR Comment: https://git.openjdk.org/jdk/pull/20968#issuecomment-2360369471 From duke at openjdk.org Thu Sep 19 08:39:44 2024 From: duke at openjdk.org (duke) Date: Thu, 19 Sep 2024 08:39:44 GMT Subject: RFR: 8337674: ZGC: Consistent style for naming private static constants In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:55:34 GMT, Joel Sikström wrote: > There are various styles for naming private static constants in ZGC. Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants. There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first. > > The lines changed have been identified by running: > `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > > The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). > > Tested with tiers 1-3. @jsikstro Your change (at version 5601b63b4b45a819ed17d3721fb17469fbd4c58a) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20968#issuecomment-2360371674 From shade at openjdk.org Thu Sep 19 08:40:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 19 Sep 2024 08:40:35 GMT Subject: RFR: 8340353: Remove CompressedOops::ptrs_base In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:00:19 GMT, Kim Barrett wrote: > Please review this change that > > (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call > CompressedOops::base(). > > (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), > updating all callers. > > Testing: > mach5 tier1 > GHA to test building on non-Oracle supported platforms Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21060#pullrequestreview-2314830026 From mli at openjdk.org Thu Sep 19 08:44:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 08:44:34 GMT Subject: RFR: 8340353: Remove CompressedOops::ptrs_base In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:00:19 GMT, Kim Barrett wrote: > Please review this change that > > (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call > CompressedOops::base(). > > (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), > updating all callers. > > Testing: > mach5 tier1 > GHA to test building on non-Oracle supported platforms Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21060#pullrequestreview-2314840838 From duke at openjdk.org Thu Sep 19 08:50:40 2024 From: duke at openjdk.org (Joel =?UTF-8?B?U2lrc3Ryw7Zt?=) Date: Thu, 19 Sep 2024 08:50:40 GMT Subject: Integrated: 8337674: ZGC: Consistent style for naming private static constants In-Reply-To: References: Message-ID: On Thu, 12 Sep 2024 13:55:34 GMT, Joel Sikstr?m wrote: > There are various styles for naming private static constants in ZGC. 
Some have a leading underscore, some begin with a lowercase letter and some start with an uppercase letter. The convention we feel is most appropriate, which also aligns with the hotspot style guide, is to have mixed-case with the first letter of each word capitalized when naming private static constants. There are also some occurrences of writing `const static` instead of the more commonly used `static const`, which should be made consistent to have the static keyword appear first. > > The lines changed have been identified by running: > `rg "static const .* [[:lower:]].* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "static const .* _.* =" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > `rg "const static" src/hotspot/share/gc/z src/hotspot/*/*/gc/z` > > The occurrences of `const static valid_max_address_offset_bits` have been converted to `static const` from `const static` but have not been renamed to mixed-case as the occurrences are not exposed outside their function(s). > > Tested with tiers 1-3. This pull request has now been integrated. Changeset: 8908812d Author: Joel Sikstr?m Committer: Hamlin Li URL: https://git.openjdk.org/jdk/commit/8908812d0a64f25f0d033d44725a69348789b223 Stats: 62 lines in 24 files changed: 0 ins; 0 del; 62 mod 8337674: ZGC: Consistent style for naming private static constants Reviewed-by: stefank, aboldtch, mli ------------- PR: https://git.openjdk.org/jdk/pull/20968 From jsjolen at openjdk.org Thu Sep 19 09:18:37 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 19 Sep 2024 09:18:37 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Thu, 19 Sep 2024 01:30:30 GMT, David Holmes wrote: > Finally, this is really subjective. You'd really need to socialise the actual proposed changes to the defaults independent of any mechanism to allow it. 
By that you mean the `jit+inlining` default, right? That has been socialized among Oracle's C2 developers if I understand correctly, though it hasn't been done for the wider community. Not socialising the changes with the wider community is an oversight on my part. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2360456868 From jsjolen at openjdk.org Thu Sep 19 09:25:36 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 19 Sep 2024 09:25:36 GMT Subject: RFR: 8340363: User-specified default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. 
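The inclusion and specificity rules listed above can be sketched roughly as follows. This is not the code from the PR: `DefaultDecorators`, `Resolved` and `resolve_defaults` are hypothetical names, tags and decorators are modelled as plain strings, and the optional log-level targeting is left out to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using TagSet = std::set<std::string>;
using Decorators = std::set<std::string>;

// Hypothetical representation of one tag-specific default.
struct DefaultDecorators {
  TagSet tags;            // e.g. {"jit"} or {"jit", "compilation"}
  Decorators decorators;  // may be empty: "no decorators" is a valid default
};

// Result of the lookup; 'matched' distinguishes "no default applies"
// from "a default applies and asks for no decorators".
struct Resolved {
  bool matched;
  Decorators decorators;
};

// Inclusion: a default applies if all of its tags appear in the selection.
bool includes_all(const TagSet& selection, const TagSet& default_tags) {
  return std::includes(selection.begin(), selection.end(),
                       default_tags.begin(), default_tags.end());
}

// Specificity: among the applicable defaults, the one with the most tags
// wins; defaults of equal specificity are merged.
Resolved resolve_defaults(const TagSet& selection,
                          const std::vector<DefaultDecorators>& table) {
  Resolved result{false, {}};
  std::size_t best = 0;
  for (const DefaultDecorators& entry : table) {
    if (!includes_all(selection, entry.tags)) {
      continue;
    }
    const std::size_t specificity = entry.tags.size();
    if (!result.matched || specificity > best) {
      best = specificity;
      result.matched = true;
      result.decorators = entry.decorators;
    } else if (specificity == best) {
      result.decorators.insert(entry.decorators.begin(), entry.decorators.end());
    }
  }
  return result;
}
```

When `matched` is false, a caller would presumably fall back to the existing uptime, level, tags defaults; that fallback is outside this sketch.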
> > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! src/hotspot/share/logging/logDecorators.cpp line 30: > 28: > 29: const LogLevelType AnyLevel = LogLevelType::NotMentioned; > 30: #define DEFAULT_DECORATORS \ I think this should also have the default decorators that UL already have. That is, all data about default decorators is gathered here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20988#discussion_r1766489613 From rehn at openjdk.org Thu Sep 19 10:22:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 10:22:35 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> Message-ID: On Thu, 19 Sep 2024 08:14:50 GMT, Fei Yang wrote: >> Code would in BarrierSetNMethod::set_guard_value(...) > > I see. Seems deserve a code comment. Also, does that mean the `OrderAccess::fence()` you added in `ZBarrierSetAssembler::patch_barrier_relocation()` unnecessary? Yes. I added due to the comment. I'll remove it and update that comment and add new comment about above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766573632 From mli at openjdk.org Thu Sep 19 10:32:50 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 10:32:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:23:44 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > JVMCI support src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529: > 2527: } > 2528: __ decode_klass_not_null(result); > 2529: } else { Could this if/else block be replaced with a simple call of load_klass(...)? src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522: > 3520: { > 3521: __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes())); > 3522: } Could this if/else block be replaced with a simple call of load_klass(...)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766587136 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766582255 From rcastanedalo at openjdk.org Thu Sep 19 11:02:50 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Sep 2024 11:02:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:04:43 GMT, Roberto Casta?eda Lozano wrote: > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. What do you think about this @rkennke? 
Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360673405 From yzheng at openjdk.org Thu Sep 19 11:12:49 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 19 Sep 2024 11:12:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson wrote: >> Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. > > We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler. Could you please point me to the C2 change? Is it going to be integrated in this PR? 
We in Graal have not yet adopted `Klass::_prototype_header` and will hold if you decide to get rid of it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766642585 From stuefe at openjdk.org Thu Sep 19 11:39:50 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:39:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 23:49:34 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.cpp line 242: > >> 240: } else { >> 241: >> 242: // Traditional (non-compact) header mode) > > Extra ) Will fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766676702 From rkennke at openjdk.org Thu Sep 19 11:52:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v22] In-Reply-To: References: Message-ID: <0mWQW50x4UNwdsRE94w3rZVGnppxQeR9fbe4eUrAGtM=.cca89805-ca82-4605-bc11-4f9ac53d2b90@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
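As a side note on the "40 bits, up to 8TB of heap" figure in the description above: the numbers line up if the 40 bits index 8-byte heap words relative to the heap base, since 2^40 words of 8 bytes are 2^43 bytes, i.e. 8 TiB. The word granularity is an assumption made for this sketch, and `encode_forwardee` / `decode_forwardee` are illustrative names rather than the functions in the PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kForwardingBits = 40;  // figure taken from the PR description
constexpr unsigned kWordShift = 3;        // assumption: offsets count 8-byte words

// Largest heap size whose word offsets still fit into kForwardingBits.
constexpr std::uint64_t max_encodable_heap_bytes() {
  return (std::uint64_t{1} << kForwardingBits) << kWordShift;
}

// Hypothetical compressed-oops-style encoding of a forwarding target:
// store the word offset from the heap base instead of the full address.
constexpr std::uint64_t encode_forwardee(std::uint64_t heap_base, std::uint64_t addr) {
  return (addr - heap_base) >> kWordShift;
}

constexpr std::uint64_t decode_forwardee(std::uint64_t heap_base, std::uint64_t encoded) {
  return heap_base + (encoded << kWordShift);
}
```

Under these assumptions the last word of an 8 TiB heap still encodes to a value below 2^40, while anything beyond that would overflow the field, which is consistent with the description switching behaviour once the heap exceeds 8TB.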
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Simplify LIR_Assembler::emit_load_klass() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/9ad2e62f..b25a4b69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=20-21 Stats: 28 lines in 2 files changed: 0 ins; 26 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Sep 19 11:52:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:00:20 GMT, Roberto Castañeda Lozano wrote: > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. > > What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360756796 From rkennke at openjdk.org Thu Sep 19 11:52:37 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:52:37 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 10:29:11 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2529: > >> 2527: } >> 2528: __ decode_klass_not_null(result); >> 2529: } else { > > Could this if/else block be replaced with a simple call of load_klass(...)? Yes, will do. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3522: > >> 3520: { >> 3521: __ movptr(result, Address(obj, oopDesc::klass_offset_in_bytes())); >> 3522: } > > Could this if/else block be replaced with a simple call of load_klass(...)? Yes, will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689169 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766689004 From stuefe at openjdk.org Thu Sep 19 11:52:38 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:38 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 05:44:42 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.cpp line 231: > >> 229: // The reason is that we want to avoid, if possible, shifts larger than >> 230: // a cacheline size. >> 231: _base = addr; > > Why is this important? It lessens the cache effects of Klass hyperaligning. 
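For context on why the shift matters here, a generic base-plus-shift decode for narrow Klass values looks roughly like the sketch below. It is a simplified model and not HotSpot's `CompressedKlassPointers` code: the struct and function names are invented, and the 22-bit width and shift of 10 are simply the figures mentioned elsewhere in this thread. Every encodable address is a multiple of `1 << shift`, so a larger shift forces Klass structures onto coarser alignment boundaries, which is what "hyperaligning" is taken to mean in this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a narrow Klass encoding scheme.
struct NarrowKlassEncoding {
  std::uint64_t base;  // encoding base; 0 would mean "zero-based"
  unsigned shift;      // log2 of the alignment of encodable addresses
};

// Decode: scale the narrow value by the shift and add the base.
// With a zero base the addition disappears and only the shift remains.
constexpr std::uint64_t decode_klass(const NarrowKlassEncoding& enc, std::uint32_t narrow) {
  return enc.base + (std::uint64_t{narrow} << enc.shift);
}

// Encode: inverse of decode_klass for addresses aligned to 1 << shift.
constexpr std::uint32_t encode_klass(const NarrowKlassEncoding& enc, std::uint64_t addr) {
  return static_cast<std::uint32_t>((addr - enc.base) >> enc.shift);
}

// Size of the address range reachable with 'narrow_bits' bits and 'shift'.
constexpr std::uint64_t encoding_range_bytes(unsigned narrow_bits, unsigned shift) {
  return std::uint64_t{1} << (narrow_bits + shift);
}
```

With 22 bits and a shift of 10, `encoding_range_bytes(22, 10)` is 2^32 bytes, so the scheme would cover a 4 GiB range at 1 KiB granularity.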
> src/hotspot/share/oops/compressedKlass.hpp line 261: > >> 259: } >> 260: >> 261: }; > > Missing blank line before `#endif` Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684016 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766684491 From stuefe at openjdk.org Thu Sep 19 11:52:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:39 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:43:12 GMT, Thomas Stuefe wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 231: >> >>> 229: // The reason is that we want to avoid, if possible, shifts larger than >>> 230: // a cacheline size. >>> 231: _base = addr; >> >> Why is this important? > > It lessens the cache effects of Klass hyperaligning. Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766688756 From stuefe at openjdk.org Thu Sep 19 11:52:40 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 23:53:28 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> JVMCI support > > src/hotspot/share/oops/compressedKlass.hpp line 175: > >> 173: // 5b) if CDS=off: Calls initialize() - here, we have more freedom and, if we want, can choose an encoding >> 174: // base that differs from the reservation base from step (4). That allows us, e.g., to later use >> 175: // zero-based encoding. 
> > Not for this but is there really any benefit for zero based encoding for klass ids? Yes, I think so. I think the SAP Jit people investigated this when doing the PPC ports. You save at least two instructions, and possibly more, per decode op. You save code size too since you don't need to materialize the 64-bit base immediate. Especially on x64 this can mean easily 11 fewer bytes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766681110 From stuefe at openjdk.org Thu Sep 19 11:52:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 11:52:42 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v18] In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 10:36:58 GMT, Johan Sjölen wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 57 commits: >> >> - fix CompressedClassPointersEncodingScheme yet again for linux aarch64 >> - Fixes post-8340184 >> - Merge upstream up to and including 8340184 >> - Merge remote-tracking branch 'origin/master' into JDK-8305895-v4 >> - Fix test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java >> - Fix loop on aarch64 >> - clarify obscure assert in metasapce setup >> - Rework compressedklass encoding >> - remove stray debug output >> - Fixes post 8338526 >> - ... and 47 more: https://git.openjdk.org/jdk/compare/7849f252...28a26aed > > test/hotspot/gtest/metaspace/test_clms.cpp line 193: > >> 191: >> 192: { >> 193: // Nonclass arena allocation. > > The style in this source file isn't really up to scratch, especially *these* lines. Anyway, it's in the tests, so I'm OK with this being fixed in a follow up RFE. 
Okay, will fix ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766686807 From rkennke at openjdk.org Thu Sep 19 11:57:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 11:57:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Thu, 19 Sep 2024 05:03:42 GMT, Stefan Karlsson wrote: >> Yes, I saw that patch. I'm not sure I like the idea of cpu dependent code also doing the encoding. There were some C2 changes related to it that I didn't understand if that scheme required them. I don't see the down side to having the prototype header pre-encoded in the markWord. Seems simpler. > > We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler. We haven't decided whether or not we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766697849 From syan at openjdk.org Thu Sep 19 12:02:03 2024 From: syan at openjdk.org (SendaoYan) Date: Thu, 19 Sep 2024 12:02:03 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test Message-ID: Hi all, The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74 seems to be used for the AArch64 assembler test. The AArch64 assembler test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684), so I think this function declaration can be removed. 
Additional testing: - [ ] linux aarch64 jtreg(tier1/2/3 etc.) with release build - [ ] linux aarch64 jtreg(tier1/2/3 etc.) with fastdebug build ------------- Commit messages: - 8340439: Extra entry declaration for assember test Changes: https://git.openjdk.org/jdk/pull/21086/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21086&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340439 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21086.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21086/head:pull/21086 PR: https://git.openjdk.org/jdk/pull/21086 From rkennke at openjdk.org Thu Sep 19 12:08:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 19 Sep 2024 12:08:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: Message-ID: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also makes it possible to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
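[Editorial sketch] The "40 bits ... up to 8TB of heap" figure in the quoted description above follows if the forwarding target is stored as a word offset from the heap base. The snippet below is only a minimal illustration of that arithmetic. It assumes 8-byte object alignment, and every name in it (`kForwardingBits`, `encode_forwarding`, `decode_forwarding`) is hypothetical rather than taken from the HotSpot sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not the HotSpot implementation.
constexpr int      kForwardingBits   = 40;  // bits available, per the PR description
constexpr int      kAlignmentShift   = 3;   // assumption: objects are 8-byte aligned
constexpr uint64_t kMaxEncodableHeap = (uint64_t{1} << kForwardingBits) << kAlignmentShift;

// 2^40 word offsets times 8 bytes per word is 2^43 bytes, i.e. 8 TiB.
static_assert(kMaxEncodableHeap == (uint64_t{8} << 40), "40 bits of 8-byte words cover 8 TiB");

// Encode a heap address as a word offset from the heap base.
inline uint64_t encode_forwarding(uint64_t heap_base, uint64_t addr) {
  assert(addr >= heap_base && (addr - heap_base) < kMaxEncodableHeap);
  assert(((addr - heap_base) & ((uint64_t{1} << kAlignmentShift) - 1)) == 0);
  return (addr - heap_base) >> kAlignmentShift;
}

// Recover the heap address from the encoded word offset.
inline uint64_t decode_forwarding(uint64_t heap_base, uint64_t encoded) {
  return heap_base + (encoded << kAlignmentShift);
}
```

With these assumptions, any aligned address within 8 TiB of the heap base round-trips through the 40-bit value; a larger heap would need either more bits or a coarser alignment.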
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 - review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b25a4b69..0d8a9236 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=21-22 Stats: 10 lines in 3 files changed: 1 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From lucy at openjdk.org Thu Sep 19 12:20:44 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 19 Sep 2024 12:20:44 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v4] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:19:20 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Consistently use CCR0 in compiler_fast_lock_lightweight_object and compiler_fast_unlock_lightweight_object LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20922#pullrequestreview-2315364152 From mdoerr at openjdk.org Thu Sep 19 12:32:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 19 Sep 2024 12:32:40 GMT Subject: RFR: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation [v4] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 14:19:20 GMT, Martin Doerr wrote: >> PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Consistently use CCR0 in compiler_fast_lock_lightweight_object and compiler_fast_unlock_lightweight_object Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20922#issuecomment-2360854417 From mdoerr at openjdk.org Thu Sep 19 12:32:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 19 Sep 2024 12:32:41 GMT Subject: Integrated: 8338995: New Object to ObjectMonitor mapping: PPC64 implementation In-Reply-To: References: Message-ID: On Mon, 9 Sep 2024 20:28:21 GMT, Martin Doerr wrote: > PPC64 implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884). This pull request has now been integrated. Changeset: 7579d374 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/7579d3740217e4a819cbf63837ec929f00464585 Stats: 171 lines in 8 files changed: 65 ins; 7 del; 99 mod 8338995: New Object to ObjectMonitor mapping: PPC64 implementation Reviewed-by: rrich, lucy ------------- PR: https://git.openjdk.org/jdk/pull/20922 From coleenp at openjdk.org Thu Sep 19 12:38:48 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 12:38:48 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:47:21 GMT, Thomas Stuefe wrote: >> It lessens the cache effects of Klass hyperaligning. > > Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. Yes, please, not having this code would be really nice. This is difficult code. 
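[Editorial sketch] For readers following the Klass encoding discussion in this thread (a variable shift versus a fixed shift of 10, and zero-based versus based encoding), the decode step can be pictured as below. This is a simplified model in plain C++ with hypothetical helper names; the real decoding is emitted by the interpreter and JIT macro assemblers and is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not HotSpot code.
constexpr int kFixedShift = 10;  // the fixed shift value mentioned in the thread

// General scheme: klass address = base + (narrow << shift).
// Generated code has to materialize the 64-bit base before the add.
inline uint64_t decode_klass(uint64_t base, uint32_t narrow_klass, int shift) {
  assert(shift >= 0 && shift < 32);
  return base + (static_cast<uint64_t>(narrow_klass) << shift);
}

// Zero-based scheme: the base is 0, so decoding is a single shift.
inline uint64_t decode_klass_zero_based(uint32_t narrow_klass, int shift) {
  assert(shift >= 0 && shift < 32);
  return static_cast<uint64_t>(narrow_klass) << shift;
}
```

The zero-based variant drops the base load and the add, which is the instruction and code-size saving described earlier in the thread; a fixed shift additionally turns `shift` into a compile-time constant.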
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766753081 From rehn at openjdk.org Thu Sep 19 12:41:54 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 12:41:54 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v4] In-Reply-To: References: Message-ID: > Hey, please consider, > > All code that is offline (behind a barrier) does not need global icache flushes. > Instead, we can emit fence.i locally (per thread and hart) in the slow path. > But if the thread is context switched to a hart that has not had fence.i emitted, it can still fetch stale instructions. > To handle this case we now have kernel support: > https://docs.kernel.org/arch/riscv/cmodx.html > > It's not perfect, as with this patch we will be emitting fence.i on any context switch for any thread, even if that thread does not execute in the code heap (non-attached native thread), and even if there were no changes to the code heap. > But this is in many cases much faster, as the global IPI for the icache flush is very intrusive. > A particular case is running a concurrent GC with small head room. > In such a scenario I measured 15% increased throughput on VF2. > A large CPU or less head room (faster GC cycles) will yield an even bigger performance boost. > > Note that this requires a 6.10 kernel. > > I'm running VF2 with a 6.11-rc3 kernel and this passed tier1-3. (With the setting on) > > Later we probably want this on by default, but as it's hard to test I'll leave it off by default.
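[Editorial sketch] The kernel support referenced in the description above is, per the linked cmodx documentation, requested through a `prctl` call. The snippet below is a minimal sketch of such a request, not the code from this PR. The fallback `#define` values are an assumption that should be checked against the kernel's uapi headers, and `enable_ctx_switch_fencei` is a hypothetical helper name.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

// Fallbacks for older headers. Assumption: these mirror the values in the
// kernel's <linux/prctl.h>; verify against your kernel headers before use.
#ifndef PR_RISCV_SET_ICACHE_FLUSH_CTX
#define PR_RISCV_SET_ICACHE_FLUSH_CTX 71
#define PR_RISCV_CTX_SW_FENCEI_ON     0
#define PR_RISCV_CTX_SW_FENCEI_OFF    1
#define PR_RISCV_SCOPE_PER_PROCESS    0
#define PR_RISCV_SCOPE_PER_THREAD     1
#endif

// Hypothetical helper: asks the kernel to emit fence.i when a thread of this
// process is migrated to another hart. Returns false if the request is rejected
// (older kernel, non-RISC-V machine, or non-Linux platform).
inline bool enable_ctx_switch_fencei() {
#if defined(__linux__)
  if (prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX,
            static_cast<unsigned long>(PR_RISCV_CTX_SW_FENCEI_ON),
            static_cast<unsigned long>(PR_RISCV_SCOPE_PER_PROCESS)) == 0) {
    return true;
  }
  std::fprintf(stderr, "icache flush ctx request rejected: %s\n", std::strerror(errno));
  return false;
#else
  return false;
#endif
}
```

A caller would typically fall back to global icache flushes when this returns false, which matches the idea of keeping the feature off by default on kernels that lack it.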
Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Comment, remove not needed fence ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20913/files - new: https://git.openjdk.org/jdk/pull/20913/files/5489a0d5..afbea83b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=02-03 Stats: 24 lines in 2 files changed: 21 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20913/head:pull/20913 PR: https://git.openjdk.org/jdk/pull/20913 From rehn at openjdk.org Thu Sep 19 12:46:35 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 19 Sep 2024 12:46:35 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v2] In-Reply-To: References: <355DP8wyWpGSrOmvF1hcLGg2Q6qDaCqmgoQ3AzSR1ww=.0882c0d8-7d74-4c48-9210-a815950e10d1@github.com> Message-ID: On Thu, 19 Sep 2024 10:20:09 GMT, Robbin Ehn wrote: >> I see. Seems to deserve a code comment. Also, does that mean the `OrderAccess::fence()` you added in `ZBarrierSetAssembler::patch_barrier_relocation()` is unnecessary? > > Yes. I added it due to the comment. > I'll remove it and update that comment and add a new comment about the above. I added the same comment at the two places. Note that it's great you ask questions about this, so we can reason about the correctness. Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20913#discussion_r1766765314 From thartmann at openjdk.org Thu Sep 19 12:48:53 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Sep 2024 12:48:53 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v4] In-Reply-To: References: Message-ID: > Unstable if traps are supposed to be taken rarely.
This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: More reviewer comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21037/files - new: https://git.openjdk.org/jdk/pull/21037/files/8257e6e3..691af16c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=02-03 Stats: 20 lines in 4 files changed: 1 ins; 9 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21037/head:pull/21037 PR: https://git.openjdk.org/jdk/pull/21037 From thartmann at openjdk.org Thu Sep 19 12:53:41 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Sep 2024 12:53:41 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 18:09:52 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Moved declaration > > src/hotspot/share/opto/parse2.cpp line 1375: > >> 1373: >> 1374: // Used by StressUnstableIfTraps >> 1375: static volatile int _trap_stress_counter = 0; > > Please, check that all accesses to it use ExternalAddress (external_word_type relocation). Checked. We use the same pattern at other places already, for example in `GraphKit::increment_counter`. > src/hotspot/share/opto/parse2.cpp line 1377: > >> 1375: static volatile int _trap_stress_counter = 0; >> 1376: >> 1377: void Parse::load_trap_stress_counter(Node*& counter, Node*& incr_store) { > > `load_trap_` -> `increment_trap_` Good point. Renamed. 
> src/hotspot/share/opto/parse2.cpp line 1594: > >> 1592: trap = (CallStaticJavaNode*)orig_iff->raw_out(i)->find_out_with(Op_CallStaticJava); >> 1593: if (trap != nullptr && trap->is_uncommon_trap() && trap->jvms()->should_reexecute() && >> 1594: Deoptimization::trap_request_reason(trap->uncommon_trap_request()) == Deoptimization::Reason_unstable_if) { > > Can we use `ProjNode::is_uncommon_trap_if_pattern()` here? Not directly but we could use `IfNode::uncommon_trap_proj`. Updated the code accordingly. > src/hotspot/share/opto/parse2.cpp line 1627: > >> 1625: trap_region->set_req(1, trap_proj); >> 1626: trap_region->set_req(2, if_true); >> 1627: trap->set_req(0, _gvn.transform(trap_region)); > > Can you use `IdealKit::if_then()` here to simplify code? I thought about that. The problem is that `IdealKit` does not work if we `stopped()` already. And if we added a trap, we stopped in one of the branches, one of which might be the one we are currently parsing (note that we are updating the trap only after creating the branches). Also, we still need to create a new region manually to merge the "should trap" path from the new check and the "should trap" path from the original if into the trap. What do you think? 
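[Editorial sketch] The PR description in this thread says the stress mode inserts an extra check before the unstable if, driven by "a simple shared counter". The exact condition is not given here, so the following is only a plain C++ model of that idea. The mask-based test, the atomic counter, and all names are assumptions; the patch itself declares a `static volatile int _trap_stress_counter` and builds the check as C2 IR nodes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared counter used by the stress mode.
static std::atomic<uint32_t> trap_stress_counter{0};

enum class Outcome { TrapTaken, OriginalIf };

// Assumption: the trap is taken whenever the low bits of the incremented
// counter are all zero. The real condition is not stated in this thread.
inline Outcome stress_check(uint32_t mask = 0xF) {
  assert((mask & (mask + 1)) == 0);  // mask must have the form 2^k - 1
  uint32_t value = trap_stress_counter.fetch_add(1, std::memory_order_relaxed) + 1;
  return (value & mask) == 0 ? Outcome::TrapTaken : Outcome::OriginalIf;
}
```

In this model, roughly one call in sixteen takes the trap path and the rest fall through to the original unstable if, which gives the "random but cheap" behavior the description is after.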
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1766248797 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1766249140 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1766769687 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1766776983 From thartmann at openjdk.org Thu Sep 19 12:53:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 19 Sep 2024 12:53:38 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v3] In-Reply-To: References: Message-ID: <9LbeuBDSm-nXsDlIF38vwRQcAvi-dUhL4LZNXX46JO8=.c20c59fd-d49f-4ea2-adcf-c5169a13c5b4@github.com> On Wed, 18 Sep 2024 11:10:47 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. 
I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved declaration Thanks again Vladimir. I updated the code and answered your comments. ------------- PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2314548718 From erikj at openjdk.org Thu Sep 19 13:08:35 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Thu, 19 Sep 2024 13:08:35 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:32:38 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... 
make/modules/jdk.incubator.vector/Lib.gmk line 48: > 46: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ > 47: DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ > 48: CFLAGS := $(CFLAGS_JDKLIB) -O3 -march=rv64gcv, \ I think we prefer using the `C_O_FLAG_*` variables instead of explicitly specifying `-O3`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1766779663 From rcastanedalo at openjdk.org Thu Sep 19 13:12:49 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 19 Sep 2024 13:12:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: <0gatRiYQ3frDnMftpb_WaDolUwcYvBFh5hAp6jY0dzQ=.21d6518e-7217-477e-954f-69fd52eb713e@github.com> On Thu, 19 Sep 2024 11:42:04 GMT, Roman Kennke wrote: > > > I agree that this is the simplest and least intrusive way of getting klass loading working in C2 for this experimental version of the feature. However, the approach seems brittle and error-prone, and it may be hard to maintain in the long run. Therefore, I think that a more principled and robust modeling will be needed, after this PR is integrated, in preparation for the non-experimental version. > > > > > > What do you think about this @rkennke? Do you agree on an alternative modeling of klass loading in C2 (without any reliance on `oopDesc::klass_offset_in_bytes()`) being a pre-condition for a future, non-experimental version of compact headers? > > Yes, that sounds like a good improvement! It'd also clean up C2 considerably - right now there are many places in C2 that rely on klass_offset_in_bytes(). Getting rid of them all would be great, but also seems like a major effort. Could you file an issue to track that future work? Done: https://bugs.openjdk.org/browse/JDK-8340453.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2360945827 From stefank at openjdk.org Thu Sep 19 13:12:50 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 19 Sep 2024 13:12:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:35:30 GMT, Coleen Phillimore wrote: >> Note that if we go with my KLUT proposal for post-Lilliput (the GC oop iteration improvements), this will not matter anymore and can be simplified to a fixed shift of 10. > > Yes, please, not having this code would be really nice. This is difficult code. Do you see any effects of this in anything other than specially crafted micro benchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766804699 From stuefe at openjdk.org Thu Sep 19 13:37:52 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 19 Sep 2024 13:37:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 13:08:43 GMT, Stefan Karlsson wrote: >> Yes, please, not having this code would be really nice. This is difficult code. > > Do you see any effects of this in anything other than specially crafted micro benchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput.
I will do some benchmarks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766848371 From asmehra at openjdk.org Thu Sep 19 13:49:39 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 19 Sep 2024 13:49:39 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v11] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Thu, 19 Sep 2024 04:19:18 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
>> >> **All-or-nothing Loading** >> >> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. >> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: >> - If the VM is started with an JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. >> - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. >> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. >> - For simplfication, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > @dholmes-ora comments LGTM ------------- PR Comment: https://git.openjdk.org/jdk/pull/20843#issuecomment-2361038557 From mli at openjdk.org Thu Sep 19 13:50:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 13:50:36 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:52:59 GMT, Erik Joelsson wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. 
gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. >> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... 
> > make/modules/jdk.incubator.vector/Lib.gmk line 48: > >> 46: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ >> 47: DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ >> 48: CFLAGS := $(CFLAGS_JDKLIB) -O3 -march=rv64gcv, \ > > I think we prefer using the `C_O_FLAG_*` variables instead of explicitly specifying `-O3`. Thanks, do you mean something like below? I'll fix it. CFLAGS := $(CFLAGS_JDKLIB) $(C_O_FLAG_HI) -march=rv64gcv, \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1766874473 From lmesnik at openjdk.org Thu Sep 19 14:07:43 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 19 Sep 2024 14:07:43 GMT Subject: Withdrawn: 8340415: Update failure handler to print more info using gdb scripts In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 03:07:48 GMT, Leonid Mesnik wrote: > The failure handler is updated to use gdb scripts. > The initial version prints some information about mutexes, safepoint and thread state, so deadlocks are easier to analyze without opening core files. > The existing gdb processing is not changed. > > > Later the script might be improved with more information. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21078 From matsaave at openjdk.org Thu Sep 19 14:07:52 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 19 Sep 2024 14:07:52 GMT Subject: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva wrote: >> This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests.
> > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen suggestion There are lingering concerns about the impact of this change and how it changes the behavior on certain code paths. Since we cannot reach a consensus, this PR will be closed and picked up elsewhere. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2361082192 From duke at openjdk.org Thu Sep 19 14:07:54 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 19 Sep 2024 14:07:54 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 18 Sep 2024 11:49:08 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' into 8322770 >> - cleanup: adjust a comment in the light of the latest change >> - cleanup: fix comment formatting >> >> Co-authored-by: Andrew Haley >> - Optimize both the stub and inlined parts of the implementation >> >> Process T_CHAR/T_SHORT elements using T8H arrangement instead of T4H. >> Add a non-unrolled vectorized loop to the stub to handle vectorizable >> tail portions of arrays multiple to 4/8 elements (for ints / other >> types). Make the stub process array as a whole instead of relying on >> the inlined part to process an unvectorizable tail.
>> - cleanup: add comments and simplify the orr ins >> - cleanup: remove redundant copyright notice >> - cleanup: use a constexpr function for intpow instead of a templated class >> - cleanup: address review comments >> - cleanup: remove a redundant parameter >> - 8322770: AArch64: C2: Implement VectorizedHashCode >> >> The code to calculate a hash code consists of two parts: a stub method that >> implements a vectorized loop using Neon instruction which processes 16 or 32 >> elements per iteration depending on the data type; and an unrolled inlined >> scalar loop that processes remaining tail elements. >> >> [Performance] >> >> [[Neoverse V2]] >> ``` >> | 328a053 (master) | dc2909f (this) | >> ---------------------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt | Score Error | Score Error | Units >> ---------------------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 | 0.805 ? 0.206 | 0.815 ? 0.141 | ns/op >> ArraysHashCode.bytes 10 avgt 15 | 4.362 ? 0.013 | 3.522 ? 0.124 | ns/op >> ArraysHashCode.bytes 100 avgt 15 | 78.374 ? 0.136 | 12.935 ? 0.016 | ns/op >> ArraysHashCode.bytes 10000 avgt 15 | 9247.335 ? 13.691 | 1344.770 ? 1.898 | ns/op >> ArraysHashCode.cha... > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2877: > >> 2875: f(0b01111, 28, 24); \ >> 2876: if (T == T4H || T == T8H) { \ >> 2877: f(0b01, 23, 22), f(index & 0b11, 21, 20), rf(Vm, 16), f(op2, 15, 12), f(index >> 2 & 1, 11); \ > > This isn't right. > Please go to test/hotspot/gtest/aarch64/aarch64-asmtest.py and add `mulv` to the set of tested instructions. Please make sure you test all modes. Could you point me towards asm tests for *Vector - Scalar* insts like `fmlavs`, `fmlsvs` or `fmulxvs` ([assembler_aarch64.hpp#L2857](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2857))? 
Judging by git blame these were added without any new tests. Am I right assuming we are missing asm tests for *Vector - Scalar* instructions in general? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1766902957 From matsaave at openjdk.org Thu Sep 19 14:07:53 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 19 Sep 2024 14:07:53 GMT Subject: Withdrawn: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling In-Reply-To: References: Message-ID: <2Wg6SCp9UAyzNfh41xpRY2nR3l_V3R8kA4SHsIX_CKQ=.8f5329da-639c-4ee8-9e3e-043ac4e663b7@github.com> On Thu, 5 Sep 2024 18:56:19 GMT, Matias Saavedra Silva wrote: > This patch cleans up the use of `get_new_method()` so callers don't have to worry about throwing `NoSuchMethodError`. The method is refactored to throw the error and avoid ever returning nullptr. Verified with tier1-5 tests. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/20874 From asmehra at openjdk.org Thu Sep 19 14:19:43 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 19 Sep 2024 14:19:43 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> Message-ID: On Thu, 19 Sep 2024 02:23:32 GMT, Ioi Lam wrote: >>> That's why there's no check for k to be aot-initialized. >> >> I was actually referring to the missing aot-initialized check for the Fruit class. >> As it stands, this method initializes the classes required by the archive mirrors as the _runtime_default_subgraph_info has all the archived mirrors. But not all classes that have archived mirror are aot-initialized. 
And from the Fruit class example in the comment it seems this method should only be initializing the classes that are required by archived mirrors of _aot-initialized classes_:
>>
>>
>> // For example, if this enum class is initialized at AOT cache assembly time:
>> //
>> // enum Fruit {
>> //     APPLE, ORANGE, BANANA;
>> //     static final Set<Fruit> HAVE_SEEDS = new HashSet<>(Arrays.asList(APPLE, ORANGE));
>> // }
>> //
>> // the pre-inited mirror of Fruit references HashSet, which should be initialized
>> // before any Java code can access the Fruit class.
>>
>>
>> So based on the comment there should be a way to identify the subgraph_object_klasses of only the aot-initialized classes and initialize only those classes.
>> Am I reading this wrong?
>
>> I was actually referring to the missing aot-initialized check for the Fruit class. As it stands, this method initializes the classes required by the archive mirrors as the _runtime_default_subgraph_info has all the archived mirrors.
>
> `_runtime_default_subgraph_info` is not recording the mirrors. It records all the classes of all the objects that are reachable from the archived mirrors.
>
> For example, if the following three classes are aot-initialized:
>
>
> class A { static Object foo = new X(); }
> class B { static Object foo = new Y(); }
> class C { static Object foo = new Y(); }
>
>
> `_runtime_default_subgraph_info` records `X` and `Y`. It doesn't record `A`, `B`, or `C`.
>
>> But not all classes that have archived mirror are aot-initialized.
>
> If a class is not AOT initialized, its mirror is filled with zeros (plus a few native pointers) so the mirror doesn't point to any object.

Expanding on the above example, let's say A is aot-initialized, but B and C are not.
So this function should initialize only X and not Y, is that correct? If so, then how does it prevent initialization of Y? It iterates through all the subgraph_object_klasses, which include both X and Y.
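
The A/B/C scenario in the question above can be pictured with a small toy model outside HotSpot. This is only a sketch: `ToyMirror`, `recordReachableClasses` and `scenarioFromThread` are hypothetical names, not HotSpot code, and the model relies on one assumption taken from the quoted reply, namely that a class which is not aot-initialized has a zeroed mirror that points to no object. Under that assumption a walk over the mirrors never reaches `Y`.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these types exist in HotSpot.
final class SubgraphToyModel {
    static final class X {}
    static final class Y {}

    // Hypothetical stand-in for an archived mirror: the owning class name,
    // whether that class is aot-initialized, and its static field values.
    record ToyMirror(String owner, boolean aotInitialized, List<Object> staticFieldValues) {}

    // Records the class names of all objects reachable from the mirrors.
    // Assumption: the mirror of a class that is not aot-initialized is
    // zeroed, so none of its static field values are visible to the walk.
    static Set<String> recordReachableClasses(List<ToyMirror> mirrors) {
        Set<String> recorded = new TreeSet<>();
        for (ToyMirror mirror : mirrors) {
            List<Object> visible = mirror.aotInitialized() ? mirror.staticFieldValues() : List.of();
            for (Object value : visible) {
                recorded.add(value.getClass().getSimpleName());
            }
        }
        return recorded;
    }

    // A is aot-initialized; B and C are not.
    static Set<String> scenarioFromThread() {
        return recordReachableClasses(List.of(
                new ToyMirror("A", true, List.of(new X())),
                new ToyMirror("B", false, List.of(new Y())),
                new ToyMirror("C", false, List.of(new Y()))));
    }
}
```

In this model `scenarioFromThread()` records only `X`. If all three mirrors were marked aot-initialized, both `X` and `Y` would be recorded, which matches the earlier three-class example in the quoted reply.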

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1766924948

From stefank at openjdk.org Thu Sep 19 14:25:52 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 19 Sep 2024 14:25:52 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9]
In-Reply-To:
References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com>
Message-ID: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com>

On Thu, 19 Sep 2024 11:54:50 GMT, Roman Kennke wrote:

>> We already have a cpu dependent code for both C1 and the interpreter. Adding cpu dependent code to C2 doesn't make it significantly worse. My latest patch also refactors the code so that C1, interpreter, and C2 all calls into shared functions in the macro assembler.
>
> We haven't decided whether or not we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful.

This is my current work-in-progress code:
https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2

I've made some large rewrites and am currently running it through functional testing.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1766934571

From mli at openjdk.org Thu Sep 19 15:03:53 2024
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 19 Sep 2024 15:03:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23]
In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com>
Message-ID:

On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback In both aarch64.ad and x86_64.ad, `MachUEPNode::format` might need some change accordingly? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2361266175 From duke at openjdk.org Thu Sep 19 15:05:05 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 19 Sep 2024 15:05:05 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: <6ekcH31ryktqs1NAEtBp2QPOuMSgPs84y6GOrAyvHXE=.b96cba1c-8ec0-453a-a584-bc821ce8a05c@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <6ekcH31ryktqs1NAEtBp2QPOuMSgPs84y6GOrAyvHXE=.b96cba1c-8ec0-453a-a584-bc821ce8a05c@github.com> Message-ID: On Wed, 18 Sep 2024 13:18:41 GMT, Andrew Haley wrote: >> The current implementation reflects that the decision to process a register by halves depends on the arrangement used. In the previous version of this PR, we tested for `load_arrangement` in places where `multiply_by_halves` is tested now. This way, for example, changing the arrangement for `T_CHAR`/`T_SHORT` from `T8H` to `T4H` requires only changing the arrangement itself. Using the logic you suggest would require one to be aware of the connection between `load_arrangement` and `multiply_by_halves` that must be maintained. 
Therefore, I recommend leaving the code as it is. > > No, because the connection between load_arrangement and multiply_by_halves is inherent in the logic. Please keep things as simple as possible for this implementation. If a future engineer decides to extend this code they'll probably do something different. Speculating here about what might happen is over-engineering. Fixed by https://github.com/openjdk/jdk/pull/18487/commits/a824a74263ce9309fca50a90b3d0f09e9b56a5c4 , please check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1766998230 From duke at openjdk.org Thu Sep 19 15:05:04 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 19 Sep 2024 15:05:04 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 19 Sep 2024 14:04:52 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2877: >> >>> 2875: f(0b01111, 28, 24); \ >>> 2876: if (T == T4H || T == T8H) { \ >>> 2877: f(0b01, 23, 22), f(index & 0b11, 21, 20), rf(Vm, 16), f(op2, 15, 12), f(index >> 2 & 1, 11); \ >> >> This isn't right. >> Please go to test/hotspot/gtest/aarch64/aarch64-asmtest.py and add `mulv` to the set of tested instructions. Please make sure you test all modes. > > Could you point me towards asm tests for *Vector - Scalar* insts like `fmlavs`, `fmlsvs` or `fmulxvs` ([assembler_aarch64.hpp#L2857](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2857))? Judging by git blame these were added without any new tests. Am I right assuming we are missing asm tests for *Vector - Scalar* instructions in general? 
It was renamed to `mulvs` by https://github.com/openjdk/jdk/pull/18487/commits/419f39473b53099b7bd42c33380a6ccb3917ab16 >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5416: >> >>> 5414: : load_arrangement == Assembler::T8H ? 36 // 9 insts >>> 5415: : load_arrangement == Assembler::T8B ? 40 // 10 insts >>> 5416: : -1; // invalid >> >> This is extremely fragile in the presence of code change. Can we not simply delete it? > > There's a `guarantee()` at the end of the loop to verify the size so the code change shouldn't be left unnoticed. Removed by https://github.com/openjdk/jdk/pull/18487/commits/9f4ae8554106d3e1290ee127b848d7f48952e5fb . >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5465: >> >>> 5463: __ addv(vmul0, load_arrangement, vmul0, vdata0); >>> 5464: } else if (load_arrangement == Assembler::T8B || load_arrangement == Assembler::T4H || >>> 5465: load_arrangement == Assembler::T8H) { >> >> Use a switch here, and everywhere else that a switch applies. > > The only other piece of code similar to this one is the one at line [5591](https://github.com/openjdk/jdk/pull/18487/files#diff-9112056f732229b18fec48fb0b20a3fe824de49d0abd41fbdb4202cfe70ad114R5591). Any other which I'm missing? Fixed by https://github.com/openjdk/jdk/pull/18487/commits/a824a74263ce9309fca50a90b3d0f09e9b56a5c4 , please check. 

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1767000130
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1766997201
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1766998899

From duke at openjdk.org Thu Sep 19 15:05:04 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Thu, 19 Sep 2024 15:05:04 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v10]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                            Baseline             This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score   Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request incrementally with four additional commits since the last revision:

 - cleanup: use switch-case instead of if-else statements and ternary operators
 - Don't try align basic blocks as it brings no measurable performance benefits
 - fixup: rename the newly added Vector-Scalar mulv to mulvs
 - fixup: fix Windows build by not using RELATIVE as an identifier

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18487/files
  - new: https://git.openjdk.org/jdk/pull/18487/files/f5918cca..a824a742

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=09
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=08-09

  Stats: 108 lines in 3 files changed: 26 ins; 48 del; 34 mod
  Patch: https://git.openjdk.org/jdk/pull/18487.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487

PR: https://git.openjdk.org/jdk/pull/18487

From duke at openjdk.org Thu Sep 19 15:05:05 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Thu, 19 Sep 2024 15:05:05 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9]
In-Reply-To:
References:
<2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <0HQYj18WDlekyIQSJsH9aRxy93drv-UCq0M9015oZyE=.89d705d1-6c95-4a3b-bb70-1fa31dce8171@github.com> Message-ID: On Wed, 18 Sep 2024 14:27:50 GMT, Andrew Haley wrote: >>> Does that make a significant measurable difference? >> >> I'll revert on this with performance numbers later. >> >>> Why not simply 32-align the region? Then we can get rid of this large_loop_size calculation. >> >> We aim to align the code only if it reduces the number of aligned 32-byte instruction memory regions the loop compromises. > > If doing this really does help performance for this function, then we can find a good way to do it which doesn't require any lengths to be hard coded, and we can fully document the solution so others can use it. Doing this gives no performance benefits. Removed by https://github.com/openjdk/jdk/pull/18487/commits/9f4ae8554106d3e1290ee127b848d7f48952e5fb . ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1766996726 From erikj at openjdk.org Thu Sep 19 16:05:41 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Thu, 19 Sep 2024 16:05:41 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 13:47:50 GMT, Hamlin Li wrote: >> make/modules/jdk.incubator.vector/Lib.gmk line 48: >> >>> 46: DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ >>> 47: DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ >>> 48: CFLAGS := $(CFLAGS_JDKLIB) -O3 -march=rv64gcv, \ >> >> I think we prefer using the `C_O_FLAG_*` variables instead of explicitly specifying `-O3`. > > Thanks, do you mean something like below? I'll fix it. > > CFLAGS := $(CFLAGS_JDKLIB) $(C_O_FLAG_HI) -march=rv64gcv, \ Sorry, I had to remind myself of how this works. 
We actually set this as a separate parameter on the Setup macro: `OPTIMIZATION := HIGH` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1767100590 From iklam at openjdk.org Thu Sep 19 16:28:40 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 19 Sep 2024 16:28:40 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> Message-ID: <1VQUnjdiscLRkDSW_pKI9D3HRHuRVsvuDGscxfXjCgs=.bd09973b-bba9-4ca1-9aa8-c015f5e4c9cf@github.com> On Thu, 19 Sep 2024 14:17:16 GMT, Ashutosh Mehra wrote: >>> I was actually referring to the missing aot-initialized check for the Fruit class. As it stands, this method initializes the classes required by the archive mirrors as the _runtime_default_subgraph_info has all the archived mirrors. >> >> `_runtime_default_subgraph_info` is not recording the mirrors. It records all the classes of all the objects that can are reachable from the archived mirrors. >> >> For example, if the following three classes are aot-initialized: >> >> >> class A { static Object foo = new X(); } >> class B { static Object foo = new Y(); } >> class C { static Object foo = new Y(); } >> >> >> `_runtime_default_subgraph_info` records `X` and `Y`. It doesn't record `A`, `B`, or `C`. >> >>> But not all classes that have archived mirror are aot-initialized. >> >> If a class is not AOT initialized, its mirror is filled with zeros (plus a few native pointers) so the mirror doesn't point to any object. > > Expanding on the above example, lets say A is aot- initialized, but B and C are not. > So this function should initialize only X not Y, is that correct? If so, then how does it prevent initialization of Y? It iterates through all the subgraph_object_klasses which includes both X and Y. The `subgraph_object_klasses` are built during assembly phase when each mirror is added to the cache. 
Note that we don't add the "real" mirror of the classes, but we add the scratch mirror: void HeapShared::archive_java_mirrors() { ... oop m = scratch_java_mirror(orig_k); if (m != nullptr) { Klass* buffered_k = ArchiveBuilder::get_buffered_klass(orig_k); bool success = archive_reachable_objects_from(1, _default_subgraph_info, m); So the scratch mirrors of `B` and `C` will be empty when they are being archived. Because `A` is marked as aot-initialized, we copy the fields of the "real" mirror of `A` into its scratch mirror. See `HeapShared::copy_aot_initialized_mirror()`. That's why we are able to add `X` into the subgraph_object_klasses (when `A`'s scratch mirror is scanned inside `archive_reachable_objects_from()`) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1767131875 From mli at openjdk.org Thu Sep 19 16:37:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 19 Sep 2024 16:37:40 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 16:02:36 GMT, Erik Joelsson wrote: >> Thanks, do you mean something like below? I'll fix it. >> >> CFLAGS := $(CFLAGS_JDKLIB) $(C_O_FLAG_HI) -march=rv64gcv, \ > > Sorry, I had to remind myself of how this works. We actually set this as a separate parameter on the Setup macro: `OPTIMIZATION := HIGH` Thanks. I'm sorry too, I'm not familiar with the build system. What you expected could be something like below? 

diff --git a/make/modules/jdk.incubator.vector/Lib.gmk b/make/modules/jdk.incubator.vector/Lib.gmk
index 5e52277919a..c6c6103a301 100644
--- a/make/modules/jdk.incubator.vector/Lib.gmk
+++ b/make/modules/jdk.incubator.vector/Lib.gmk
@@ -41,11 +41,12 @@ endif
 ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, riscv64)+$(INCLUDE_COMPILER2), true+true+true)
   $(eval $(call SetupJdkLibrary, BUILD_LIBSLEEF, \
       NAME := sleef, \
+      OPTIMIZATION := HIGH, \
       SRC := libsleef/lib, \
       EXTRA_SRC := libsleef/generated, \
       DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \
       DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \
-      CFLAGS := $(CFLAGS_JDKLIB) -O3 -march=rv64gcv, \
+      CFLAGS := $(CFLAGS_JDKLIB) -march=rv64gcv, \
       LDFLAGS := $(LDFLAGS_JDKLIB) \
           $(call SET_SHARED_LIBRARY_ORIGIN), \
       LIBS := $(JDKLIB_LIBS) \

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1767158779

From kvn at openjdk.org Thu Sep 19 16:54:37 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 19 Sep 2024 16:54:37 GMT
Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Sep 2024 12:48:53 GMT, Tobias Hartmann wrote:

>> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if.
>>
>> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)).
>> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > More reviewer comments Looks good now. > What do you think? Thank you for looking on this. I am fine with current code. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2316156139 PR Comment: https://git.openjdk.org/jdk/pull/21037#issuecomment-2361610976 From rcastanedalo at openjdk.org Thu Sep 19 17:23:50 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 19 Sep 2024 17:23:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Wed, 18 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2576: >> >>> 2574: } else { >>> 2575: lea(dst, Address(obj, index, Address::lsl(scale))); >>> 2576: ldr(dst, Address(dst, offset)); >> >> Do you have a reproducer (or, better yet, a test case) that exercises this case? I ran Oracle's internal CI tiers 1-5 and could never hit it. Could this happen for x64 as well? > > AFAIK, this happens only when using compressed oops with a heap-base in r27. 
When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset, that's why we need to lea the r27[nklass] part first. > Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing. Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1767315114 From coleenp at openjdk.org Thu Sep 19 18:28:05 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 18:28:05 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo Message-ID: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. Tested with tier1-4 and 8 which does a lot of redefinition. 
------------- Commit messages: - 8338471: Assert deleted methods not returned by CallInfo Changes: https://git.openjdk.org/jdk/pull/21075/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21075&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338471 Stats: 11 lines in 4 files changed: 4 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21075.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21075/head:pull/21075 PR: https://git.openjdk.org/jdk/pull/21075 From pchilanomate at openjdk.org Thu Sep 19 19:12:38 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 19 Sep 2024 19:12:38 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> Message-ID: On Wed, 18 Sep 2024 21:01:18 GMT, Axel Boldt-Christmas wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 385: >> >>> 383: } >>> 384: >>> 385: bool ObjectMonitor::try_enter(JavaThread* current, bool check_owner) { >> >> The `check_owner` name is a little confusing to me. To me it looks more like `check_for_recursion` or `handle_recursion`. > > I think the name should describe what setting the value actually does, but if it is just a hack to do what the comment bellow says, then it sounds like a friend declaration for `SharedRuntime::monitor_exit_helper()` is what is wanted. (Or make TryLock() public.) >> Set check_owner to false (it's default value is true) if you want >> to use ObjectMonitor::try_enter() as a public way of doing TryLock(). >> Used this way in SharedRuntime::monitor_exit_helper(). Maybe `check_already_owned`? FTR I prefer this version than a friend declaration. We would also have to rename it as try_lock to be consistent. And having to expose and check for TryLockResult is also uglier in my opinion than checking a boolean. 
I don't see it as a hack; we are just skipping the already owned case since we know in this case it will fail. I would actually remove this comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767473271

From coleenp at openjdk.org Thu Sep 19 19:15:38 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 19 Sep 2024 19:15:38 GMT
Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 19 Sep 2024 02:41:18 GMT, Ioi Lam wrote:

>> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
>>
>> **Problem:**
>>
>> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details.
>>
>> **Solution:**
>>
>> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache.
>>
>> In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized.
>>
>> **Review Notes:**
>>
>> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled.
>> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp.
>> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) >> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. >> - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: >> - all aot-initialized classes are moved into the `initialized` state (without executing its `` method). This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > Fixed typo I only have one drive-by comment. src/hotspot/share/cds/aotClassInitializer.cpp line 41: > 39: } else if (ik->name()->equals("jdk/internal/constant/PrimitiveClassDescImpl") || > 40: ik->name()->equals("jdk/internal/constant/ReferenceClassDescImpl") || > 41: ik->name()->equals("java/lang/constant/ConstantDescs")) { Why not intern these strings as Symbols so you can test for == ? 
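For readers unfamiliar with the suggestion: interning means every distinct name is stored once, so two names are equal exactly when their canonical pointers are equal. The following is a minimal sketch of that idea in standard C++. `InternTable` and `same_name` are hypothetical names made up for this illustration; this is not HotSpot's Symbol machinery and says nothing about how the PR would actually implement it.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical intern table, for illustration only; this is not HotSpot's SymbolTable.
class InternTable {
 public:
  // Returns a canonical pointer for the given text.
  // Equal texts always yield the same pointer.
  const std::string* intern(const std::string& text) {
    return &*_entries.insert(text).first;
  }

 private:
  // Pointers to elements of std::unordered_set stay valid across rehashing,
  // so the canonical pointers handed out above remain usable.
  std::unordered_set<std::string> _entries;
};

// Once both sides are interned, name equality is a single pointer comparison
// instead of a character-by-character string comparison.
inline bool same_name(const std::string* a, const std::string* b) {
  return a == b;
}
```

The trade-off is that the lookup cost moves to the point where the name is interned, which only pays off if the same name is compared many times.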
------------- PR Review: https://git.openjdk.org/jdk/pull/20958#pullrequestreview-2316432031 PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1767402253 From pchilanomate at openjdk.org Thu Sep 19 19:20:42 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 19 Sep 2024 19:20:42 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 18:29:23 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update two, after the review Thanks for the fixes, looks good to me. ------------- Marked as reviewed by pchilanomate (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2316562366 From coleenp at openjdk.org Thu Sep 19 19:28:35 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 19:28:35 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v4] In-Reply-To: References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: On Wed, 18 Sep 2024 03:02:19 GMT, Ioi Lam wrote: >> This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` >> >> These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. >> >> --- >> See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - @vnkozlov comment - added NOT_CDS_RETURN > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - some clean up > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics > - ... and 1 more: https://git.openjdk.org/jdk/compare/9320bca9...988f101c I have one question, but this is good. I've always wanted these to be shared. src/hotspot/share/classfile/systemDictionary.cpp line 2095: > 2093: } > 2094: } > 2095: #endif Can you add // INCLUDE_CDS This is called at startup time before anything so it doesn't need the locking? 
------------- PR Review: https://git.openjdk.org/jdk/pull/20959#pullrequestreview-2316570560 PR Review Comment: https://git.openjdk.org/jdk/pull/20959#discussion_r1767488604 From coleenp at openjdk.org Thu Sep 19 19:52:41 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 19:52:41 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: <4AsV5DtoOGyc_sXyrvuiijJeiXFcol90QEvg7wDbGDM=.ceae4774-09ae-4eb8-b703-614385b57c1a@github.com> On Wed, 18 Sep 2024 18:29:23 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update two, after the review This looks great. Very nice work! src/hotspot/share/runtime/objectMonitor.cpp line 582: > 580: return TryLockResult::Interference; > 581: } > 582: if (TryLockWithContentionMark(current, contention_mark)) { I like this rename. 
It's consistent with the camel case that we want to change all at once someday. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2316604907 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767509800 From coleenp at openjdk.org Thu Sep 19 19:52:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 19 Sep 2024 19:52:42 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> Message-ID: On Thu, 19 Sep 2024 19:09:38 GMT, Patricio Chilano Mateo wrote: >> I think the name should describe what setting the value actually does, but if it is just a hack to do what the comment below says, then it sounds like a friend declaration for `SharedRuntime::monitor_exit_helper()` is what is wanted. (Or make TryLock() public.) >>> Set check_owner to false (its default value is true) if you want >>> to use ObjectMonitor::try_enter() as a public way of doing TryLock(). >>> Used this way in SharedRuntime::monitor_exit_helper(). > > Maybe `check_already_owned`? FTR I prefer this version to a friend declaration. Using it outside would also imply renaming it to try_lock to be consistent. And having to expose and check for TryLockResult is also uglier in my opinion than checking a boolean. I don't see it as a hack, we are just skipping the already owned case since we know in this case it will fail. I would actually remove that comment altogether. I like `check_for_recursion` as a parameter name and also agree with dropping the comment, or rewriting it as: // If called from SharedRuntime::monitor_exit_helper, we know that this thread doesn't already own the lock I agree that I don't want TryLock and its results exposed outside this file.
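The shape being discussed (a public `try_enter` with a defaulted boolean, and a private lock primitive that stays hidden) can be sketched with a toy model. Everything below is an assumption made for illustration: `ToyMonitor`, `ThreadId`, and the field layout are hypothetical and do not reflect HotSpot's `ObjectMonitor`; only the parameter name `check_for_recursion` is taken from the thread.

```cpp
#include <atomic>
#include <cassert>

// Toy model for illustration only; names and layout are assumptions,
// not HotSpot's ObjectMonitor.
using ThreadId = int;
constexpr ThreadId NO_OWNER = 0;

class ToyMonitor {
 public:
  // Tries to acquire the monitor without blocking.
  // With check_for_recursion == true, a thread that already owns the monitor
  // re-enters by bumping the recursion count. A caller that knows it is not
  // the owner passes false and skips that check.
  bool try_enter(ThreadId current, bool check_for_recursion = true) {
    if (check_for_recursion && _owner.load() == current) {
      _recursions++;
      return true;
    }
    return try_lock(current);
  }

  int recursions() const { return _recursions; }

 private:
  // Private primitive: a single compare-and-swap on the owner field.
  // Callers outside the class only ever see the boolean from try_enter().
  bool try_lock(ThreadId current) {
    ThreadId expected = NO_OWNER;
    return _owner.compare_exchange_strong(expected, current);
  }

  std::atomic<ThreadId> _owner{NO_OWNER};
  int _recursions = 0;
};
```

In this toy version the caller never sees the lock primitive or a multi-valued result type, which is the property both reviewers say they want to keep.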
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767509133 From dholmes at openjdk.org Thu Sep 19 21:08:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 19 Sep 2024 21:08:37 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v6] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:29:20 GMT, Aleksey Shipilev wrote: >> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: >> >> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal >> >> >> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. >> >> This patch is able to print the following instead: >> >> >> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also assert "unaligned" is not printed for aligned pointers Seems reasonable - and Kim has checked the details. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21072#pullrequestreview-2316747811 From kbarrett at openjdk.org Thu Sep 19 21:09:41 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 19 Sep 2024 21:09:41 GMT Subject: RFR: 8340353: Remove CompressedOops::ptrs_base In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 13:36:52 GMT, Stefan Karlsson wrote: >> Please review this change that >> >> (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call >> CompressedOops::base(). 
>> >> (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), >> updating all callers. >> >> Testing: >> mach5 tier1 >> GHA to test building on non-Oracle supported platforms > > Looks good to me. Thanks for reviews @stefank , @coleenp , @shipilev , and @Hamlin-Li ------------- PR Comment: https://git.openjdk.org/jdk/pull/21060#issuecomment-2362195715 From kbarrett at openjdk.org Thu Sep 19 21:09:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 19 Sep 2024 21:09:43 GMT Subject: Integrated: 8340353: Remove CompressedOops::ptrs_base In-Reply-To: References: Message-ID: <3369TFORD1afBlYqePavOKfUIe6mWqbSlNEjoHiuat8=.8e3713f6-fbb0-4c8b-aabc-93ade87ce87f@github.com> On Wed, 18 Sep 2024 13:00:19 GMT, Kim Barrett wrote: > Please review this change that > > (1) Removes CompressedOops::ptrs_base(), changing all callers to instead call > CompressedOops::base(). > > (2) Renames CompressedOops::ptrs_base_addr() to CompressedOops::base_addr(), > updating all callers. > > Testing: > mach5 tier1 > GHA to test building on non-Oracle supported platforms This pull request has now been integrated. 
Changeset: 296b4963 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/296b49634eed83bca6cfdee514b9c7c4f8252d59 Stats: 14 lines in 5 files changed: 1 ins; 3 del; 10 mod 8340353: Remove CompressedOops::ptrs_base Reviewed-by: stefank, coleenp, shade, mli ------------- PR: https://git.openjdk.org/jdk/pull/21060 From duke at openjdk.org Thu Sep 19 21:15:11 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 19 Sep 2024 21:15:11 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: fix is_intrinsic_supported to work properly ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/aa163896..5da2754a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=10-11 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From asmehra at openjdk.org Thu Sep 19 21:40:37 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 19 Sep 2024 21:40:37 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: <1VQUnjdiscLRkDSW_pKI9D3HRHuRVsvuDGscxfXjCgs=.bd09973b-bba9-4ca1-9aa8-c015f5e4c9cf@github.com> References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> <1VQUnjdiscLRkDSW_pKI9D3HRHuRVsvuDGscxfXjCgs=.bd09973b-bba9-4ca1-9aa8-c015f5e4c9cf@github.com> Message-ID: On Thu, 19 Sep 
2024 16:26:07 GMT, Ioi Lam wrote: >> Expanding on the above example, let's say A is aot-initialized, but B and C are not. >> So this function should initialize only X not Y, is that correct? If so, then how does it prevent initialization of Y? It iterates through all the subgraph_object_klasses which includes both X and Y. > > The `subgraph_object_klasses` are built during the assembly phase when each mirror is added to the cache. Note that we don't add the "real" mirror of the classes, but we add the scratch mirror: > > > void HeapShared::archive_java_mirrors() { > ... > oop m = scratch_java_mirror(orig_k); > if (m != nullptr) { > Klass* buffered_k = ArchiveBuilder::get_buffered_klass(orig_k); > bool success = archive_reachable_objects_from(1, _default_subgraph_info, m); > > > So the scratch mirrors of `B` and `C` will be empty when they are being archived. > > Because `A` is marked as aot-initialized, we copy the fields of the "real" mirror of `A` into its scratch mirror. See `HeapShared::copy_aot_initialized_mirror()`. That's why we are able to add `X` into the subgraph_object_klasses (when `A`'s scratch mirror is scanned inside `archive_reachable_objects_from()`) Oh I see! I was missing the point that scratch java mirrors are empty. So their clinit dependencies don't get recorded in subgraph_object_klasses. I should have seen that before. Sorry for dragging this out. The point that is bothering me is that this code relies on the assumption that the default subgraph mainly consists of mirror objects, some of which may be aot-initialized, while others are empty. If in the future we add other objects (not the mirror objects) to the graph, then this assumption would fail. But we don't have any check to protect against this. Can we create a separate subgraph for aot-initialized mirror objects to record their clinit dependencies and process them here? That would make the code explicit in what it intends to do.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1767631526 From sviswanathan at openjdk.org Thu Sep 19 21:43:02 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:43:02 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v3] In-Reply-To: <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> <2pPJKmEHM24iStw8Xv2IQ08Xrp7Ag3P2_9yEzsS4nOw=.49f3554d-0917-40ff-8824-d8719c3d271f@github.com> Message-ID: On Thu, 19 Sep 2024 07:29:11 GMT, Jatin Bhateja wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Change method name > > Hi @sviswa7 , some comments, overall patch looks good to me. > > Best Regards, > Jatin Thanks a lot @jatin-bhateja. I have implemented your review comments. > src/hotspot/share/opto/vectorIntrinsics.cpp line 772: > >> 770: >> 771: if (elem_klass == nullptr || shuffle_klass == nullptr || shuffle->is_top() || vlen == nullptr) { >> 772: return false; // dead code > > Why dead code in comment ? this is a failed intrinsification condition. Modified comment. > src/hotspot/share/opto/vectorIntrinsics.cpp line 776: > >> 774: if (!vlen->is_con() || shuffle_klass->const_oop() == nullptr) { >> 775: return false; // not enough info for intrinsification >> 776: } > > Why don't you club it with above conditions to be consistent with other inline expanders ? Done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2120: > >> 2118: >> 2119: if (vector_klass == nullptr || elem_klass == nullptr || vlen == nullptr) { >> 2120: return false; // dead code > > Why dead code in comments ? Modified comment. 
> src/hotspot/share/opto/vectorIntrinsics.cpp line 2129: > >> 2127: NodeClassNames[argument(2)->Opcode()], >> 2128: NodeClassNames[argument(3)->Opcode()]); >> 2129: return false; // not enough info for intrinsification > > Please club this with above condition to be consistent with other inline expanders. done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2144: > >> 2142: int num_elem = vlen->get_con(); >> 2143: if ((num_elem < 4) || !is_power_of_2(num_elem)) { >> 2144: log_if_needed(" ** vlen < 4 or not power of two=%d", num_elem); > > Will num_elem < 4 not be handled by L2149 since we have an implementation limitation to support less than 32-bit shuffle / masks. Yes that should handle it. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2171: > >> 2169: use_predicate = false; >> 2170: if(!is_masked_op || >> 2171: (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskNotUsed) || > > Suggestion: > > (!arch_supports_vector(Op_VectorRearrange, num_elem, elem_bt, VecMaskUseLoad) || Here it should be VecMaskNotUsed as this case it using blend to emulate masking. The VecMaskUseLoad case is checked at line 2168. > src/hotspot/share/opto/vectorIntrinsics.cpp line 2188: > >> 2186: >> 2187: if (v1 == nullptr || v2 == nullptr) { >> 2188: return false; // operand unboxing failed > > To be consistent with other expanders please emit proper error for unboxing failure like on following line. > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectorIntrinsics.cpp#L426 done > src/hotspot/share/opto/vectorIntrinsics.cpp line 2197: > >> 2195: mask = unbox_vector(argument(6), mbox_type, elem_bt, num_elem); >> 2196: if (mask == nullptr) { >> 2197: log_if_needed(" ** not supported: op=selectFrom vlen=%d etype=%s is_masked_op=1", > > Error should an unboxing failure here. 
done ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2362249672 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767601917 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767602096 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767605028 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767605213 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767607670 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767610833 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767615559 PR Review Comment: https://git.openjdk.org/jdk/pull/20634#discussion_r1767617255 From sviswanathan at openjdk.org Thu Sep 19 21:43:01 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:43:01 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v4] In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: > Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. > > Summary of changes is as follows: > 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code > > For the following source: > > > public void test() { > var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); > for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { > var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); > index.selectFrom(inpvect).intoArray(byteres, j); > } > } > > > The code generated for inner main now looks as follows: > ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 > 0x00007f40d02274d0: movslq %ebx,%r13 > 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 > 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) > 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 > 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) > 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 > 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) > 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 > 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 > 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) > 0x00007f40d022751f: add $0x40,%ebx > 0x00007f40d0227522: cmp %r8d,%ebx > 0x00007f40d0227525: jl 0x00007f40d02274d0 > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Implement review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20634/files - new: https://git.openjdk.org/jdk/pull/20634/files/87e103ee..f8e67fb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20634&range=02-03 Stats: 27 lines in 1 file changed: 9 ins; 8 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/20634.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20634/head:pull/20634 PR: 
https://git.openjdk.org/jdk/pull/20634 From sviswanathan at openjdk.org Thu Sep 19 21:45:36 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 19 Sep 2024 21:45:36 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v2] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: <_Q0HCE6Lc7LZY8Sc5XzQvLHg_WdeCDOAGZgMOeEWK4M=.d28c8b11-ee52-4551-92b8-357c04a4d5ef@github.com> On Wed, 18 Sep 2024 12:23:48 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > I'm a bit confused by the name `shuffleWrapIndexes` and `inline_vector_shuffle_wrap_indexes`. > > Are you **shuffling wrap-indexes**? I don't know what that would even mean. I think you should name it `wrapShuffleIndexes`. Or is there any naming convention in the VectorAPI that prevents this? Thanks a lot @eme64 for the review. I have implemented your review comment. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20634#issuecomment-2362253398 From kbarrett at openjdk.org Thu Sep 19 22:46:43 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 19 Sep 2024 22:46:43 GMT Subject: RFR: 8340436: Remove unused CompressedOops::AnyNarrowOopMode Message-ID: Please review this trivial change to remove an unused enumerator. 
Testing: mach5 tier1 ------------- Commit messages: - remove unused AnyNarrowOopMode Changes: https://git.openjdk.org/jdk/pull/21098/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21098&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340436 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21098/head:pull/21098 PR: https://git.openjdk.org/jdk/pull/21098 From psandoz at openjdk.org Thu Sep 19 23:52:35 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 19 Sep 2024 23:52:35 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. 
This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... > Got it, I think #20508 and this PR are unrelated implementation-wise, though. It would be nice if we can move independently of #20508 as that may take longer to integrate because of API/CSR review. > > @jatin-bhateja What do you think of using this patch and intrinsifing `Vector::rearrange(VectorShuffle, Vector)` instead of introducing the 2 vector `selectFrom` API? IMO the two-vector `selectFrom` API is complementary to the existing single-vector `selectFrom`, and both have equivalent `rearrange` expressions. For either use we should ideally get to the point that a similar/identical optimal instruction sequence is generated. 
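The equivalence mentioned here (a single-vector `selectFrom` expressed as a `rearrange`) can be illustrated with a scalar model. This is only a sketch under stated assumptions: `Lanes`, `wrap_index`, `rearrange`, and `select_from` are hypothetical helpers written for this note, not the Vector API, and the index wrapping follows the behaviour described in the 8340079 thread rather than anything in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model for illustration only; these helpers are hypothetical
// and are not the Vector API.
constexpr std::size_t LANES = 4;  // assumed power-of-two lane count
using Lanes = std::array<int, LANES>;

// Wraps an index into [0, LANES - 1]; valid because LANES is a power of two.
inline std::size_t wrap_index(int index) {
  return static_cast<std::size_t>(index) & (LANES - 1);
}

// Models src.rearrange(shuffle): lane i of the result is src[shuffle[i]],
// with the shuffle index wrapped instead of range-checked.
inline Lanes rearrange(const Lanes& src, const Lanes& shuffle) {
  Lanes result{};
  for (std::size_t i = 0; i < LANES; i++) {
    result[i] = src[wrap_index(shuffle[i])];
  }
  return result;
}

// Models index.selectFrom(src): the receiver supplies the indexes,
// so it is the same operation with the operands swapped.
inline Lanes select_from(const Lanes& index, const Lanes& src) {
  return rearrange(src, index);
}
```

Since the two models compute the same lanes by construction, a compiler that recognises either form has the same information available, which is the sense in which one would hope for a similar instruction sequence from both.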
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2362388082 From sviswanathan at openjdk.org Fri Sep 20 00:12:48 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 20 Sep 2024 00:12:48 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix is_intrinsic_supported to work properly The PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2316948493 From dholmes at openjdk.org Fri Sep 20 01:23:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 20 Sep 2024 01:23:35 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo In-Reply-To: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Wed, 18 Sep 2024 22:39:11 GMT, Coleen Phillimore wrote: > CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. > Tested with tier1-4 and 8 which does a lot of redefinition. Looks good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21075#pullrequestreview-2317021927 From haosun at openjdk.org Fri Sep 20 01:25:34 2024 From: haosun at openjdk.org (Hao Sun) Date: Fri, 20 Sep 2024 01:25:34 GMT Subject: RFR: 8340436: Remove unused CompressedOops::AnyNarrowOopMode In-Reply-To: References: Message-ID: <3J52Ofbx-w6wqRjS49OaljWvtHdCEMY_1pjT48ofDr0=.a1cb913f-6892-407e-adff-3c5a36da1249@github.com> On Thu, 19 Sep 2024 22:41:30 GMT, Kim Barrett wrote: > Please review this trivial change to remove an unused enumerator. > > Testing: mach5 tier1 LGTM ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/21098#pullrequestreview-2317023121 From haosun at openjdk.org Fri Sep 20 01:37:41 2024 From: haosun at openjdk.org (Hao Sun) Date: Fri, 20 Sep 2024 01:37:41 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. > > Additional testing: > > - [ ] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [ ] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build Marked as reviewed by haosun (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21086#pullrequestreview-2317031483 From dholmes at openjdk.org Fri Sep 20 02:46:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 20 Sep 2024 02:46:39 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v11] In-Reply-To: References: <0vNiw1Z0gtC71V-K2bi7tyawwHZj2K8rERNB9afFYMM=.96ddf556-3a86-41d1-a508-a6da0b69cd2b@github.com> Message-ID: On Thu, 19 Sep 2024 07:21:17 GMT, Simon Tooke wrote: >> test/hotspot/gtest/runtime/test_os.cpp line 433: >> >>> 431: errno = 0; >>> 432: returnedBuffer = os::realpath(tmppath, buffer, MAX_PATH); >>> 433: EXPECT_TRUE(returnedBuffer == buffer); >> >> Should we also do `EXPECT_TRUE(errno == 0);` ? Here and below. > > This is interesting! I found that on Linux, errno _was not zero_! The specifications for POSIX realpath say > `RETURN VALUE > Upon successful completion, realpath() shall return a pointer to the resolved name. Otherwise, realpath() shall return a null pointer and set errno to indicate the error, and the contents of the buffer pointed to by resolved_name are undefined.` > Nowhere does it say errno is unchanged if successful. > > > errno = 0; > ::printf("before ::realpath("/tmp",nullptr) errno=%d\n", errno); > char* p = ::realpath("/tmp", nullptr); > ::printf("after ::realpath p=%s errno=%d\n", p, errno); > > > outputs: > > before ::realpath("/tmp",nullptr) errno=0 > after ::realpath /tmp p=/tmp errno=22 > > With behaviour like this, one can see why OpenJDK wraps ::realpath()... > > Compiler used: g++ (GCC) 14.2.1 20240801 Right I forgot about this. The Posix spec even states: > No function in this volume of POSIX.1-2017 shall set errno to 0. so reading errno is only valid after calling a function that sets errno on error, and which returns a value that says there was an error. Windows GetLastError is the same. 
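A minimal sketch of that rule, assuming a POSIX system: `errno` is consulted only after `::realpath()` has returned a null pointer. The helper name `resolve_path` is hypothetical and used for illustration only; it is not the `os::realpath()` wrapper from the PR.

```cpp
#include <cerrno>
#include <stdlib.h>   // ::realpath, ::free (POSIX)
#include <string>

// Hypothetical helper (not HotSpot code): resolves `path` into `out`.
// Returns 0 on success, otherwise the errno value reported by ::realpath().
int resolve_path(const char* path, std::string& out) {
  // With a null second argument, POSIX.1-2008 realpath() allocates the result.
  char* resolved = ::realpath(path, nullptr);
  if (resolved == nullptr) {
    // errno is meaningful here only because the return value signalled failure.
    return errno;
  }
  // On success the value of errno is unspecified, so it is deliberately ignored.
  out.assign(resolved);
  ::free(resolved);
  return 0;
}
```

A caller would test the return value, for example `if (resolve_path("/tmp", out) != 0) { /* handle error */ }`, rather than comparing `errno` with zero after a successful call.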
Even setting it to zero before the call does not guarantee the call doesn't modify it even if successful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1767843602 From dholmes at openjdk.org Fri Sep 20 02:50:33 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 20 Sep 2024 02:50:33 GMT Subject: RFR: 8340436: Remove unused CompressedOops::AnyNarrowOopMode In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 22:41:30 GMT, Kim Barrett wrote: > Please review this trivial change to remove an unused enumerator. > > Testing: mach5 tier1 Good and trivial. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21098#pullrequestreview-2317099088 From lmesnik at openjdk.org Fri Sep 20 02:56:37 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 20 Sep 2024 02:56:37 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: <1HSBMnkRwxi_JIF3B6bHyusNuMMYzySAGA_rt4Nhbuw=.b80cc4e1-f93d-410a-9fb3-af71c0f8a2ce@github.com> On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. > > Additional testing: > > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21086#pullrequestreview-2317121604 From syan at openjdk.org Fri Sep 20 03:01:40 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 20 Sep 2024 03:01:40 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. > > Additional testing: > > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21086#issuecomment-2362662254 From duke at openjdk.org Fri Sep 20 03:01:40 2024 From: duke at openjdk.org (duke) Date: Fri, 20 Sep 2024 03:01:40 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. 
> > Additional testing: > > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build @sendaoYan Your change (at version a34153780fe5e5e8e2bbb1b951795465f2f13cf0) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21086#issuecomment-2362663839 From fyang at openjdk.org Fri Sep 20 03:15:39 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 20 Sep 2024 03:15:39 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v4] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:41:54 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) do not need global icache flushes. >> As we can instead in slow path locally (thread and hart) emit fence.i. >> But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. >> To handle this case new now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off. 
> > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, remove not needed fence Updated change looks reasonable to me. Thanks for the answers and details. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20913#pullrequestreview-2317147533 From kbarrett at openjdk.org Fri Sep 20 04:19:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 20 Sep 2024 04:19:42 GMT Subject: RFR: 8340436: Remove unused CompressedOops::AnyNarrowOopMode In-Reply-To: <3J52Ofbx-w6wqRjS49OaljWvtHdCEMY_1pjT48ofDr0=.a1cb913f-6892-407e-adff-3c5a36da1249@github.com> References: <3J52Ofbx-w6wqRjS49OaljWvtHdCEMY_1pjT48ofDr0=.a1cb913f-6892-407e-adff-3c5a36da1249@github.com> Message-ID: <1MqUt_jFlqjmoTgeR1MrMEvpXOa3oi3dzhQ78Iy2Sxc=.7b4a9c61-a677-4463-b679-b75f203d1da7@github.com> On Fri, 20 Sep 2024 01:22:50 GMT, Hao Sun wrote: >> Please review this trivial change to remove an unused enumerator. >> >> Testing: mach5 tier1 > > LGTM Thanks for reviews @shqking and @dholmes-ora ------------- PR Comment: https://git.openjdk.org/jdk/pull/21098#issuecomment-2362726495 From kbarrett at openjdk.org Fri Sep 20 04:19:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 20 Sep 2024 04:19:42 GMT Subject: Integrated: 8340436: Remove unused CompressedOops::AnyNarrowOopMode In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 22:41:30 GMT, Kim Barrett wrote: > Please review this trivial change to remove an unused enumerator. > > Testing: mach5 tier1 This pull request has now been integrated. 
Changeset: 0f7d9e59 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/0f7d9e599593bb8e31e7e33a559d25ec803c7ba4 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod 8340436: Remove unused CompressedOops::AnyNarrowOopMode Reviewed-by: haosun, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/21098 From iklam at openjdk.org Fri Sep 20 04:29:17 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 20 Sep 2024 04:29:17 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver [v3] In-Reply-To: References: Message-ID: > This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolLinker`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). > > ----- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - 8338018: Rename ClassPrelinker to AOTConstantPoolResolver ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20517/files - new: https://git.openjdk.org/jdk/pull/20517/files/fed4dfed..49dbfa6a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=01-02 Stats: 174298 lines in 1393 files changed: 158710 ins; 8126 del; 7462 mod Patch: https://git.openjdk.org/jdk/pull/20517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20517/head:pull/20517 PR: https://git.openjdk.org/jdk/pull/20517 From amitkumar at openjdk.org Fri Sep 20 05:05:06 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 05:05:06 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v4] In-Reply-To: References: Message-ID: > s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; > > Testing: > - tier1-test (fastdebug) > - tier1-test with UseObjectMonitorTable (fastdebug) > - tier1-test with UseObjectMonitorTable (release) Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Merge branch 'master' into om_v0 - only use z_mvghi instruction - review comments - s390-port ------------- Changes: https://git.openjdk.org/jdk/pull/20740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20740&range=03 Stats: 167 lines in 7 files changed: 92 ins; 23 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/20740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20740/head:pull/20740 PR: https://git.openjdk.org/jdk/pull/20740 From amitkumar at openjdk.org Fri Sep 20 05:05:07 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 05:05:07 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v3] In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 09:28:46 GMT, Amit Kumar wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >> Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > only use z_mvghi instruction @TheRealMDoerr @RealLucy can I get further reviews on this :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20740#issuecomment-2362828690 From dholmes at openjdk.org Fri Sep 20 05:42:43 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 20 Sep 2024 05:42:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> On Wed, 18 Sep 2024 18:29:23 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. 
This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update two, after the review I've taken another pass through and have a few queries. Thanks src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 535: > 533: > 534: // Set owner to null. > 535: // Release to satisfy the JMM Can I suggest you use this comment form in each of the cpu-specific files. At the moment code that does the same thing is commented slightly differently with regard to the need for a release-store. Thanks. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2734: > 2732: // We need a full fence after clearing owner to avoid stranding. > 2733: // StoreLoad achieves this. > 2734: membar(StoreLoad); Suggestion: fence(); similar to S390 src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 499: > 497: #endif > 498: > 499: // Intentional fall-through into slow path I don't think this comment makes sense / applies any more. 
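The need for a full fence after clearing the owner can be sketched with standard C++ atomics. This is a simplified, Dekker-style illustration under assumed names (`SketchMonitor`, `sketch_exit`, `sketch_should_park`); it is not the HotSpot `ObjectMonitor` code and omits the successor and entry-list handling. The exiting thread stores to `owner` and then loads the waiter count, while an entering thread does the opposite. Without a StoreLoad barrier between the two steps, both could read stale values and a waiter could be left parked with nobody to wake it.

```cpp
#include <atomic>

// Hypothetical, heavily simplified monitor used only for illustration.
struct SketchMonitor {
  std::atomic<void*> owner{nullptr};  // null means unlocked
  std::atomic<int> waiters{0};        // threads that intend to park
};

// Exit path: release the lock, then decide whether a wakeup is needed.
// Returns true if the caller must take the slow path and wake a waiter.
inline bool sketch_exit(SketchMonitor& m) {
  // Release store: publishes the critical section's writes to the next owner.
  m.owner.store(nullptr, std::memory_order_release);
  // Full fence (StoreLoad): the store above must be visible before the load
  // below, otherwise a concurrently arriving waiter could be stranded.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return m.waiters.load(std::memory_order_relaxed) != 0;
}

// Enter path (contended): announce the intent to park, then re-check the owner.
// Returns true if the caller should park, false if it should retry the lock.
inline bool sketch_should_park(SketchMonitor& m) {
  m.waiters.fetch_add(1, std::memory_order_relaxed);
  // Matching full fence: the increment must be visible before owner is re-read.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  if (m.owner.load(std::memory_order_acquire) == nullptr) {
    m.waiters.fetch_sub(1, std::memory_order_relaxed);
    return false;  // the lock was released in the meantime, so do not park
  }
  return true;
}
```

With both fences in place, at least one side observes the other's write: either the exiting thread sees a non-zero waiter count, or the entering thread sees the cleared owner and retries instead of parking.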
src/hotspot/share/runtime/javaThread.hpp line 467: > 465: intx _held_monitor_count; // used by continuations for fast lock detection > 466: intx _jni_monitor_count; > 467: ObjectMonitor* _unlocked_inflated_monitor; At the time we store this the OM is in-use but we have unlocked it and so by the time we go to re-lock it later it may no longer be in-use. What prevents it from being deflated and deallocated? Does it require a safepoint that can't happen on that code path? If so we should add a comment to that effect somewhere. src/hotspot/share/runtime/sharedRuntime.cpp line 1973: > 1971: // Some other thread acquired the lock (or the monitor was > 1972: // deflated). Either way we are done. > 1973: current->dec_held_monitor_count(); So this decrement is pairing with the actual unlock that was done in the C2 code? test/micro/org/openjdk/bench/vm/lang/LockUnlock.java line 315: > 313: * inflated. > 314: */ > 315: @Threads(3) Please explain this change and update the comment. ------------- PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2317264716 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767981547 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767987462 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767992448 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1767995887 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1768007118 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1768007650 From lucy at openjdk.org Fri Sep 20 06:53:37 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 20 Sep 2024 06:53:37 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v4] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 05:05:06 GMT, Amit Kumar wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >>
Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into om_v0 > - only use z_mvghi instruction > - review comments > - s390-port Changes are looking good now. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20740#pullrequestreview-2317412198 From mli at openjdk.org Fri Sep 20 07:28:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 20 Sep 2024 07:28:40 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v4] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:41:54 GMT, Robbin Ehn wrote: >> Hey, please consider, >> >> All code which is offline (behind a barrier) do not need global icache flushes. >> As we can instead in slow path locally (thread and hart) emit fence.i. >> But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. >> To handle this case new now have kernel support: >> https://docs.kernel.org/arch/riscv/cmodx.html >> >> It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. >> But this is in many cases much faster as the icache flush global IPI is very intrusive. >> Particular cases are running a concurrent gc with small head room. >> In such scenario I measured 15% increased throughput on VF2. >> A large CPU or less head room (faster GC cycles) will yield even more performance boost. >> >> Note that this requires 6.10 kernel. >> >> I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. 
(With setting on) >> >> Later we probably want this default on, but as it's hard to test I'll leave default off. > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Comment, remove not needed fence Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20913#pullrequestreview-2317476860 From shade at openjdk.org Fri Sep 20 07:30:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Sep 2024 07:30:35 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo In-Reply-To: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Wed, 18 Sep 2024 22:39:11 GMT, Coleen Phillimore wrote: > CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. > Tested with tier1-4 and 8 which does a lot of redefinition. Looks okay, but do some of these _need_ to be `guarantee`-s, or just `assert` would be enough? I would prefer to keep release bit paths very efficient, if only for startup/Leyden :) ------------- PR Review: https://git.openjdk.org/jdk/pull/21075#pullrequestreview-2317479948 From rehn at openjdk.org Fri Sep 20 07:36:54 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 20 Sep 2024 07:36:54 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v5] In-Reply-To: References: Message-ID: > Hey, please consider, > > All code which is offline (behind a barrier) do not need global icache flushes. > As we can instead in slow path locally (thread and hart) emit fence.i. > But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. 
> To handle this case new now have kernel support: > https://docs.kernel.org/arch/riscv/cmodx.html > > It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. > But this is in many cases much faster as the icache flush global IPI is very intrusive. > Particular cases are running a concurrent gc with small head room. > In such scenario I measured 15% increased throughput on VF2. > A large CPU or less head room (faster GC cycles) will yield even more performance boost. > > Note that this requires 6.10 kernel. > > I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) > > Later we probably want this default on, but as it's hard to test I'll leave default off. Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into cmodx-fence - Comment, remove not needed fence - Merge branch 'master' into cmodx-fence - Comment, moved init after feature enabling - Fixed ws - Draft ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20913/files - new: https://git.openjdk.org/jdk/pull/20913/files/afbea83b..79e30029 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20913&range=03-04 Stats: 4011 lines in 94 files changed: 3696 ins; 39 del; 276 mod Patch: https://git.openjdk.org/jdk/pull/20913.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20913/head:pull/20913 PR: https://git.openjdk.org/jdk/pull/20913 From mli at openjdk.org Fri Sep 20 07:37:44 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 20 Sep 2024 07:37:44 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. > > Additional testing: > > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build Marked as reviewed by mli (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21086#pullrequestreview-2317489576 From syan at openjdk.org Fri Sep 20 07:37:45 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 20 Sep 2024 07:37:45 GMT Subject: Integrated: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. > > Additional testing: > > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build This pull request has now been integrated. Changeset: 5d611c03 Author: SendaoYan Committer: Hamlin Li URL: https://git.openjdk.org/jdk/commit/5d611c0377d4b5d5495d3941a6a63b128142a2dc Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod 8340439: AArch64: Extra entry declaration for assember test Reviewed-by: haosun, lmesnik, mli ------------- PR: https://git.openjdk.org/jdk/pull/21086 From rehn at openjdk.org Fri Sep 20 07:52:54 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 20 Sep 2024 07:52:54 GMT Subject: RFR: 8339771: RISC-V: Reduce icache flushes [v5] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 07:26:02 GMT, Hamlin Li wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into cmodx-fence >> - Comment, remove not needed fence >> - Merge branch 'master' into cmodx-fence >> - Comment, moved init after feature enabling >> - Fixed ws >> - Draft > > Marked as reviewed by mli (Reviewer). Thanks @Hamlin-Li, @luhenry and @RealFYang ! I'll let this sit until next week, doing some extras test runs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20913#issuecomment-2363054069 From syan at openjdk.org Fri Sep 20 07:55:48 2024 From: syan at openjdk.org (SendaoYan) Date: Fri, 20 Sep 2024 07:55:48 GMT Subject: RFR: 8340439: AArch64: Extra entry declaration for assember test In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 11:54:32 GMT, SendaoYan wrote: > Hi all, > The function declaration `extern "C" void entry(CodeBuffer*);` in `src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp` line 74, seems to used for AArch64 assember test. > The AArch64 assember test has been moved to `test/hotspot/gtest` by [JDK-8252684](https://bugs.openjdk.org/browse/JDK-8252684) , so I think this function declaration can be remove. > > Additional testing: > > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with release build > - [x] linux aarch64 jtreg(tier1/2/3 etc., include gtest) with fastdebug build Thanks for the review and sponsor. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21086#issuecomment-2363064972 From mdoerr at openjdk.org Fri Sep 20 08:27:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 20 Sep 2024 08:27:41 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: <1MC83jRy9o6GrZouJaYjgHyIoyfNvrakHuirZMxIdhk=.769c2ce1-795f-4981-a10b-cee04cad5a0a@github.com> References: <1MC83jRy9o6GrZouJaYjgHyIoyfNvrakHuirZMxIdhk=.769c2ce1-795f-4981-a10b-cee04cad5a0a@github.com> Message-ID: On Thu, 12 Sep 2024 12:27:44 GMT, Fredrik Bredberg wrote: >> I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. Performance looks also good for the SPEC benchmarks. I don't think they stress Java monitors very strongly. >> >> I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm. >> I think this should at least be documented and the description of the JBS issue improved. > > @TheRealMDoerr >> I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. Performance looks also good for the SPEC benchmarks. I don't think they stress Java monitors very strongly. > > That really is good news! Thanks for testing! > >> I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm. I think this should at least be documented and the description of the JBS issue improved. 
> > Thanks for confirming that my suspicion was right. As I stated earlier, due to the added StoreLoad barrier a slight decrease in performance is probably to be expected if you just run `LockUnlock.testContendedLock`, but it shouldn't really matter when running real-life applications. Anyhow I'll update the description of the JBS issue. @fbredber: If you need help to resolve the PPC64 conflicts with https://github.com/openjdk/jdk/commit/7579d3740217e4a819cbf63837ec929f00464585, just let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2363155597 From luhenry at openjdk.org Fri Sep 20 09:15:37 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 20 Sep 2024 09:15:37 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v3] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:34:05 GMT, Fei Yang wrote: > Yeah, will take another look. Have you tried this on real hardware? Interesting to see the numbers. There is no real hardware that I know of that has vector crypto just yet. I expect it's one of these that we'll want to test as soon as hardware is available, and even possibly enable by default then ------------- PR Comment: https://git.openjdk.org/jdk/pull/19960#issuecomment-2363246939 From mdoerr at openjdk.org Fri Sep 20 09:24:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 20 Sep 2024 09:24:38 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v4] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 05:05:06 GMT, Amit Kumar wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >> Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: > > - Merge branch 'master' into om_v0 > - only use z_mvghi instruction > - review comments > - s390-port Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20740#pullrequestreview-2317733092 From luhenry at openjdk.org Fri Sep 20 09:35:35 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Fri, 20 Sep 2024 09:35:35 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:32:38 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... Glad to see it come to fruition! ------------- Marked as reviewed by luhenry (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21083#pullrequestreview-2317754680 From aph at openjdk.org Fri Sep 20 09:36:45 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 20 Sep 2024 09:36:45 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <58jnT00LJ-V7_N-pFrR8duCccBUHxZiq2cFuj-uS9ww=.3468c4b9-0b10-4909-96e9-7d40a0cffa62@github.com> On Thu, 19 Sep 2024 15:01:48 GMT, Mikhail Ablakatov wrote: >> Could you point me towards asm tests for *Vector - Scalar* insts like `fmlavs`, `fmlsvs` or `fmulxvs` ([assembler_aarch64.hpp#L2857](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/assembler_aarch64.hpp:2857))? Judging by git blame these were added without any new tests. Am I right assuming we are missing asm tests for *Vector - Scalar* instructions in general? > > It was renamed to `mulvs` by https://github.com/openjdk/jdk/pull/18487/commits/419f39473b53099b7bd42c33380a6ccb3917ab16 It's certainly possible that we are missing them. There was a period when `Assembler` changes weren't being fully tested, but I've reviewed PRs more strictly since I realized. In this case, though, there is a bug which will be revealed by testing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1768295246 From aph at openjdk.org Fri Sep 20 09:53:45 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 20 Sep 2024 09:53:45 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v10] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <007crTrGW5cVq0iWhpvI2J0J6lI5CKC_xdVVwu4aSt8=.cd62acd6-f164-4487-bc0f-8bc661eccc85@github.com> On Thu, 19 Sep 2024 15:05:04 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >>
>> --------------------------------------------------------------------------------------------
>> Version                                          Baseline              This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score      Error      Score      Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±    0.060      1.247 ±    0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±    0.028      4.387 ±    0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±    0.051     26.655 ±    0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±    1.352   2649.962 ±  216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±    0.062      1.246 ±    0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±    0.002      5.344 ±    0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±    0.048     23.023 ±    0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±    3.374   2410.504 ±    8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±    0.005      1.187 ±    0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±    0.002      5.676 ±    0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±    0.016     24.378 ±    0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±    1.336   2419.015 ±    0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±    0.001      1.037 ±    0.001
>> ...
> > Mikhail Ablakatov has updated the pull request incrementally with four additional commits since the last revision: > > - cleanup: use switch-case instead of if-else statements and ternary operators > - Don't try align basic blocks as it brings no measurable performance benefits > - fixup: rename the newly added Vector-Scalar mulv to mulvs > - fixup: fix Windows build by not using RELATIVE as an identifier src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 140: > 138: } > 139: > 140: void C2_MacroAssembler::arrays_hashcode_elload(Register dst, Address src, BasicType eltype) { This method has nothing to do with either arrays or hashcode. Looking at class `Assembler`, it would make sense to have a general-purpose load that takes a `BasicType`. We already have this `Assembler` function: `void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0)` where two bits of `op` give us store and three versions of sign extension. Please use `ld_st2` to define a general-purpose `MacroAssembler::load` function with this set of arguments. It was perhaps a historic mistake of ours not to use an assembler function with the size and the signedness of an operand as an argument to ldr/str.
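[Editor's sketch] The suggestion above, a single load routine parameterized by the element type rather than a hashcode-specific helper, can be illustrated outside HotSpot. The following is only a plain C++ model under stated assumptions: `ElemType`, `elem_size`, `elem_is_signed` and `load_element` are hypothetical names invented for this illustration, not HotSpot APIs, and the signedness table simply follows the Java type definitions (boolean and char widen unsigned; byte, short and int are signed). It does not show how `ld_st2` encodes its `size` and `op` arguments.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for HotSpot's BasicType, limited to the element
// types that an array hash code loop has to read.
enum class ElemType { Boolean, Byte, Char, Short, Int };

// Width of one element in bytes.
constexpr std::size_t elem_size(ElemType t) {
  return (t == ElemType::Boolean || t == ElemType::Byte) ? 1
       : (t == ElemType::Char || t == ElemType::Short)   ? 2
       : 4;
}

// True if the element is sign-extended when widened to 32 bits
// (Java byte, short and int are signed; boolean and char are not).
constexpr bool elem_is_signed(ElemType t) {
  return t == ElemType::Byte || t == ElemType::Short || t == ElemType::Int;
}

// One general-purpose load: the caller passes only the type, and the size
// and signedness derived from it select the widening, much as they would
// select the load instruction in an assembler.
inline std::int32_t load_element(const void* src, ElemType t) {
  const bool is_signed = elem_is_signed(t);
  switch (elem_size(t)) {
    case 1: {
      std::uint8_t v;
      std::memcpy(&v, src, sizeof v);
      return is_signed ? static_cast<std::int32_t>(static_cast<std::int8_t>(v))
                       : static_cast<std::int32_t>(v);
    }
    case 2: {
      std::uint16_t v;
      std::memcpy(&v, src, sizeof v);
      return is_signed ? static_cast<std::int32_t>(static_cast<std::int16_t>(v))
                       : static_cast<std::int32_t>(v);
    }
    default: {
      std::int32_t v;
      std::memcpy(&v, src, sizeof v);
      return v;
    }
  }
}
```

The point of the design is that callers never pick an instruction themselves: the same all-ones bytes read as `ElemType::Byte` and as `ElemType::Char` widen differently purely because of the type argument.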
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1768320227 From rkennke at openjdk.org Fri Sep 20 12:33:50 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 20 Sep 2024 12:33:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Thu, 19 Sep 2024 17:20:36 GMT, Roberto Castañeda Lozano wrote: >> AFAIK, this happens only when using compressed oops with a heap-base in r27. When running with that setting, we would get addresses like r27[nklass] or r27[nklass]+offset, both with scale=8. You would need large heaps, perhaps >4GB, to get this coops setting. The problem with aarch64 is that we can't have an address like r27[nklass]+offset, that's why we need to lea the r27[nklass] part first. >> Yes, this also happens on x86, but x86 supports rX[nklass]+offset addressing. > > Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. I tried to reproduce for a few hours now using a custom testcase, with no success. I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes.
Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further. For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768538965 From coleenp at openjdk.org Fri Sep 20 12:37:35 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 12:37:35 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo In-Reply-To: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Wed, 18 Sep 2024 22:39:11 GMT, Coleen Phillimore wrote: > CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. > Tested with tier1-4 and 8 which does a lot of redefinition. So this was flagged by our source code analysis tool, which ignores asserts and debug only code, which is why I added them as guarantees. 
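[Editor's sketch] The distinction behind this comment is that HotSpot's `assert` is compiled out of product builds while `guarantee` is checked in every build, which is why a tool that ignores debug-only code does not see asserts. A minimal model of that difference follows, under stated assumptions: `SKETCH_PRODUCT`, `SKETCH_ASSERT`, `SKETCH_GUARANTEE`, `report_failed_check` and the `Method` struct are all hypothetical names for this illustration, and the sketch counts failed checks instead of stopping the process as the real macros do.

```cpp
#include <cstdio>

// Set to 1 to model a product (release) build; hard-coded for this sketch.
#define SKETCH_PRODUCT 1

// Counts failed checks instead of aborting, so the behaviour can be observed.
static int g_failed_checks = 0;

inline void report_failed_check(const char* kind, const char* msg) {
  ++g_failed_checks;
  std::fprintf(stderr, "%s failed: %s\n", kind, msg);
}

// Evaluated in every build flavour.
#define SKETCH_GUARANTEE(cond, msg) \
  do { if (!(cond)) report_failed_check("guarantee", msg); } while (0)

// Evaluated only when not modelling a product build.
#if SKETCH_PRODUCT
#define SKETCH_ASSERT(cond, msg) do { } while (0)
#else
#define SKETCH_ASSERT(cond, msg) \
  do { if (!(cond)) report_failed_check("assert", msg); } while (0)
#endif

// Hypothetical stand-in for a method that may have been deleted by redefinition.
struct Method { bool deleted; };

// Returns how many checks fired when the method is checked with an assert.
inline int failed_checks_with_assert(const Method& m) {
  (void)m;  // unused when the assert is compiled out
  const int before = g_failed_checks;
  SKETCH_ASSERT(!m.deleted, "deleted method must not be returned");
  return g_failed_checks - before;
}

// Returns how many checks fired when the method is checked with a guarantee.
inline int failed_checks_with_guarantee(const Method& m) {
  const int before = g_failed_checks;
  SKETCH_GUARANTEE(!m.deleted, "deleted method must not be returned");
  return g_failed_checks - before;
}
```

With `SKETCH_PRODUCT` set to 1, a deleted method passes silently through the assert variant and is caught only by the guarantee variant; that extra check is also executed on every call in release builds, which is the cost being weighed in this thread.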
------------- PR Comment: https://git.openjdk.org/jdk/pull/21075#issuecomment-2363636593 From erikj at openjdk.org Fri Sep 20 12:37:35 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 20 Sep 2024 12:37:35 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 16:35:14 GMT, Hamlin Li wrote: >> Sorry, I had to remind myself of how this works. We actually set this as a separate parameter on the Setup macro: `OPTIMIZATION := HIGH` > > Thanks. I'm sorry too, I'm not familiar with the build system. > What you expected could be something like below? > > diff --git a/make/modules/jdk.incubator.vector/Lib.gmk b/make/modules/jdk.incubator.vector/Lib.gmk > index 5e52277919a..c6c6103a301 100644 > --- a/make/modules/jdk.incubator.vector/Lib.gmk > +++ b/make/modules/jdk.incubator.vector/Lib.gmk > @@ -41,11 +41,12 @@ endif > ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, riscv64)+$(INCLUDE_COMPILER2), true+true+true) > $(eval $(call SetupJdkLibrary, BUILD_LIBSLEEF, \ > NAME := sleef, \ > + OPTIMIZATION := HIGH, \ > SRC := libsleef/lib, \ > EXTRA_SRC := libsleef/generated, \ > DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ > DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ > - CFLAGS := $(CFLAGS_JDKLIB) -O3 -march=rv64gcv, \ > + CFLAGS := $(CFLAGS_JDKLIB) -march=rv64gcv, \ > LDFLAGS := $(LDFLAGS_JDKLIB) \ > $(call SET_SHARED_LIBRARY_ORIGIN), \ > LIBS := $(JDKLIB_LIBS) \ Yes, exactly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1768546861 From stooke at openjdk.org Fri Sep 20 13:10:52 2024 From: stooke at openjdk.org (Simon Tooke) Date: Fri, 20 Sep 2024 13:10:52 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v12] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires a Windows implementation of realpath(), using Windows _fullpath(), and renaming os::Posix::realpath() to os::realpath(). > > The main difference between POSIX and Windows behaviour is that POSIX actually requires an existing accessible file, while Windows will happily work with made-up filenames. > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: fix realpath test on macOS ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/24bfde29..a7a76e92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=10-11 Stats: 30 lines in 1 file changed: 22 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From stooke at openjdk.org Fri Sep 20 13:14:54 2024 From: stooke at openjdk.org (Simon Tooke) Date: Fri, 20 Sep 2024 13:14:54 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v13] In-Reply-To: References: Message-ID: > This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. 
adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). > > This requires a Windows implementation of realpath(), using Windows _fullpath(), and renaming os::Posix::realpath() to os::realpath(). > > The main difference between POSIX and Windows behaviour is that POSIX actually requires an existing accessible file, while Windows will happily work with made-up filenames. > > Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: delete commented out code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20683/files - new: https://git.openjdk.org/jdk/pull/20683/files/a7a76e92..920f67d8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20683&range=11-12 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20683/head:pull/20683 PR: https://git.openjdk.org/jdk/pull/20683 From fbredberg at openjdk.org Fri Sep 20 13:19:43 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 20 Sep 2024 13:19:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> References: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> Message-ID: On Fri, 20 Sep 2024 05:18:23 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > src/hotspot/share/runtime/javaThread.hpp line 467: > >> 465: intx _held_monitor_count; // used by continuations for fast lock detection >> 466: intx _jni_monitor_count; >> 467: ObjectMonitor* 
_unlocked_inflated_monitor; > > At the time we store this the OM is in-use but we have unlocked it and so by the time we go to re-lock it later it may no longer be in-use. What prevents it from being deflated and deallocated? Does it require a safepoint that can't happen on that code path? If so we should add a comment to that effect somewhere. I asked the GC guys @fisk and @xmas92 about this some time ago, and they assured me that it was ok. Unfortunately I forgot about the details, so I re-asked them today and they said: "Since there is no safepoint polling when calling into the VM, you can be sure that it hasn't been deallocated." I'll add a comment about it in `SharedRuntime::monitor_exit_helper()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1768600977 From fbredberg at openjdk.org Fri Sep 20 13:25:44 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 20 Sep 2024 13:25:44 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> References: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> Message-ID: On Fri, 20 Sep 2024 05:36:40 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > test/micro/org/openjdk/bench/vm/lang/LockUnlock.java line 315: > >> 313: * inflated. >> 314: */ >> 315: @Threads(3) > > Please explain this change and update the comment. The reason I increased the number of threads from 2 to 3 was that it enabled me to increase the code coverage, and thereby execute all(?) the corner cases when doing ObjectMonitor locking. I'll update the comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1768609529 From shade at openjdk.org Fri Sep 20 14:20:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Sep 2024 14:20:36 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo In-Reply-To: References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Fri, 20 Sep 2024 12:34:52 GMT, Coleen Phillimore wrote: > So this was flagged by our source code analysis tool, which ignores asserts and debug only code, which is why I added them as guarantees. OK, satisfying static analysis tools does not sound like a compelling reason to expose these checks in release builds? Maybe I am missing something? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21075#issuecomment-2363849601 From haosun at openjdk.org Fri Sep 20 14:24:43 2024 From: haosun at openjdk.org (Hao Sun) Date: Fri, 20 Sep 2024 14:24:43 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: On Mon, 19 Aug 2024 08:58:53 GMT, Andrew Dinn wrote: >> Hi Andrew, I find that we need following add-on change for riscv: >> >> >> diff --git a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> index dc89e489b24..bed24e442e8 100644 >> --- a/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> +++ b/src/hotspot/cpu/riscv/sharedRuntime_riscv.cpp >> @@ -66,6 +66,12 @@ >> >> #define __ masm-> >> >> +#ifdef PRODUCT >> +#define BLOCK_COMMENT(str) /* nothing */ >> +#else >> +#define BLOCK_COMMENT(str) __ block_comment(str) >> +#endif >> + >> const int StackAlignmentInSlots = StackAlignmentInBytes / VMRegImpl::stack_slot_size; >> >> 
class RegisterSaver { >> @@ -2742,7 +2748,7 @@ static void jfr_epilogue(MacroAssembler* masm) { >> // For c2: c_rarg0 is junk, call to runtime to write a checkpoint. >> // It returns a jobject handle to the event writer. >> // The handle is dereferenced and the return value is the event writer oop. >> -static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { >> +RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { >> enum layout { >> fp_off, >> fp_off2, >> @@ -2780,7 +2786,7 @@ static RuntimeStub* SharedRuntime::generate_jfr_write_checkpoint() { >> } >> >> // For c2: call to return a leased buffer. >> -static RuntimeStub* SharedRuntime::generate_jfr_return_lease() { >> +RuntimeStub* SharedRuntime::generate_jfr_return_lease() { >> enum layout { >> fp_off, >> fp_off2, > > @RealFYang Thanks! Hi @adinn , I encountered Client build failure on AArch64 after this commit. Could you help take a look at it when you have spare time? Thanks. Here shows the configuration ==================================================== The existing configuration has been successfully updated in /tmp/test123/build-release using configure arguments '--with-debug-level=release --with-version-opt=git-fe80618bf3f --with-jvm-variants=client'. 
Configuration summary: * Name: /tmp/test123/build-release * Debug level: release * HS debug level: product * JVM variants: client * JVM features: client: 'cds compiler1 epsilongc g1gc jfr jni-check jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc' * OpenJDK target: OS: linux, CPU architecture: aarch64, address length: 64 * Version string: 24-internal-git-fe80618bf3f (24-internal) * Source date: 1726841146 (2024-09-20T14:05:46Z) Tools summary: * Boot JDK: openjdk version "22.0.2" 2024-07-16 OpenJDK Runtime Environment (build 22.0.2+9-70) OpenJDK 64-Bit Server VM (build 22.0.2+9-70, mixed mode, sharing) (at /usr/lib/jvm/jdk-22.0.2) * Toolchain: gcc (GNU Compiler Collection) * C Compiler: Version 13.2.0 (at /usr/bin/gcc) * C++ Compiler: Version 13.2.0 (at /usr/bin/g++) Build performance summary: * Build jobs: 72 * Memory limit: 587068 MB And here is the error message: === Output from failing command(s) repeated here === * For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o: /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp: In static member function ‘static RuntimeStub* SharedRuntime::generate_throw_exception(const char*, address)’: /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2809:3: error: ‘TraceTime’ was not declared in this scope; did you mean ‘traceid’? 2809 | TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime)); | ^~~~~~~~~ | traceid * All command lines available in /tmp/test123/build-release/make-support/failure-logs.
=== End of repeated output === ------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2363857168 From amitkumar at openjdk.org Fri Sep 20 14:48:46 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 14:48:46 GMT Subject: RFR: 8338658: New Object to ObjectMonitor mapping: s390x implementation [v4] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 05:05:06 GMT, Amit Kumar wrote: >> s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; >> >> Testing: >> - tier1-test (fastdebug) >> - tier1-test with UseObjectMonitorTable (fastdebug) >> - tier1-test with UseObjectMonitorTable (release) > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge branch 'master' into om_v0 > - only use z_mvghi instruction > - review comments > - s390-port thanks for the reviews; ------------- PR Comment: https://git.openjdk.org/jdk/pull/20740#issuecomment-2363906193 From amitkumar at openjdk.org Fri Sep 20 14:48:47 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 20 Sep 2024 14:48:47 GMT Subject: Integrated: 8338658: New Object to ObjectMonitor mapping: s390x implementation In-Reply-To: References: Message-ID: On Wed, 28 Aug 2024 04:26:23 GMT, Amit Kumar wrote: > s390x implementation of [JDK-8315884](https://bugs.openjdk.org/browse/JDK-8315884) New Object to ObjectMonitor mapping; > > Testing: > - tier1-test (fastdebug) > - tier1-test with UseObjectMonitorTable (fastdebug) > - tier1-test with UseObjectMonitorTable (release) This pull request has now been integrated. 
Changeset: 9bcde4ff Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/9bcde4ffca20941b010ed454b2fcb948d24b3cac Stats: 167 lines in 7 files changed: 92 ins; 23 del; 52 mod 8338658: New Object to ObjectMonitor mapping: s390x implementation Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/20740 From coleenp at openjdk.org Fri Sep 20 14:49:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 14:49:12 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo [v2] In-Reply-To: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: > CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. > Tested with tier1-4 and 8 which does a lot of redefinition. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Change guarantees to asserts. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21075/files - new: https://git.openjdk.org/jdk/pull/21075/files/8a70ab32..f3540cd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21075&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21075&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21075.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21075/head:pull/21075 PR: https://git.openjdk.org/jdk/pull/21075 From coleenp at openjdk.org Fri Sep 20 14:49:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 14:49:12 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo In-Reply-To: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Wed, 18 Sep 2024 22:39:11 GMT, Coleen Phillimore wrote: > CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. > Tested with tier1-4 and 8 which does a lot of redefinition. Ok, I figured out how to fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21075#issuecomment-2363907238 From shade at openjdk.org Fri Sep 20 14:59:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Sep 2024 14:59:34 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo [v2] In-Reply-To: References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Fri, 20 Sep 2024 14:49:12 GMT, Coleen Phillimore wrote: >> CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. 
Removed unused function and unnecessary methodHandle operator. >> Tested with tier1-4 and 8 which does a lot of redefinition. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Change guarantees to asserts. Looks fine, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21075#pullrequestreview-2318533943 From rkennke at openjdk.org Fri Sep 20 15:29:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 20 Sep 2024 15:29:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Fri, 20 Sep 2024 12:31:18 GMT, Roman Kennke wrote: >> Thanks @rkennke, I tried running test tiers 1-3 using different compressed OOPs configurations but could not reach this code, unfortunately. Could you provide a reproducer? The reason I am particularly interested is because I'd like to find whether there could be any problematic interaction with C2's implicit null check optimization. > > I tried to reproduce for a few hours now using a custom testcase, with no success. > I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. 
> I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further. > > For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though. https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768816377 From ogillespie at openjdk.org Fri Sep 20 15:36:04 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 20 Sep 2024 15:36:04 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints Message-ID: Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. Before (ThreadStartTtsp.java is shared in JDK-8340547): java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' Reaching safepoint: 1291591 ns Reaching safepoint: 59962 ns Reaching safepoint: 1958065 ns Reaching safepoint: 14456666258 ns <-- 14 seconds! ... 
After: java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' Reaching safepoint: 214269 ns Reaching safepoint: 60253 ns Reaching safepoint: 2040680 ns Reaching safepoint: 3089284 ns Reaching safepoint: 2998303 ns Reaching safepoint: 4433713 ns <-- 4.4ms Reaching safepoint: 3368436 ns Reaching safepoint: 2986519 ns Reaching safepoint: 3269102 ns ... **Alternatives** I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. ------------- Commit messages: - Remove unused code - Improve ttsp while creating threads Changes: https://git.openjdk.org/jdk/pull/21111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340547 Stats: 15 lines in 4 files changed: 12 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21111/head:pull/21111 PR: https://git.openjdk.org/jdk/pull/21111 From shade at openjdk.org Fri Sep 20 15:45:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 20 Sep 2024 15:45:35 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 15:31:42 GMT, Oli Gillespie wrote: > Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. 
> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. > > Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. > > Before (ThreadStartTtsp.java is shared in JDK-8340547): > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 1291591 ns > Reaching safepoint: 59962 ns > Reaching safepoint: 1958065 ns > Reaching safepoint: 14456666258 ns <-- 14 seconds! > ... > > > After: > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 214269 ns > Reaching safepoint: 60253 ns > Reaching safepoint: 2040680 ns > Reaching safepoint: 3089284 ns > Reaching safepoint: 2998303 ns > Reaching safepoint: 4433713 ns <-- 4.4ms > Reaching safepoint: 3368436 ns > Reaching safepoint: 2986519 ns > Reaching safepoint: 3269102 ns > ... > > > > **Alternatives** > > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. > I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. src/hotspot/share/prims/jvm.cpp line 2953: > 2951: // to reduce competition for Threads_lock. 
> 2952: ConditionalMutexLocker mu1(ThreadStart_lock, UseExtraThreadStartLock > 2953: && Universe::is_fully_initialized()); I know I suggested looking into this, but now I think that `JVM_StartThread` is already called only from Java code, meaning we are probably fine already? I think we can drop the `Universe::` check. I suggest renaming the locals for `MutexLocker` to `ml1` and `ml2`, respectively. src/hotspot/share/runtime/globals.hpp line 2003: > 2001: product(bool, UseThreadStartLock, true, DIAGNOSTIC, \ > 2002: "Use an extra lock in JVM_ThreadStart to reduce " \ > 2003: "time-to-safepoint while creating threads.") \ The description should probably be more generic, e.g.: Use an extra lock during Thread.start to alleviate contention on Threads lock. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21111#discussion_r1768834620 PR Review Comment: https://git.openjdk.org/jdk/pull/21111#discussion_r1768837433 From ogillespie at openjdk.org Fri Sep 20 15:56:08 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 20 Sep 2024 15:56:08 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v2] In-Reply-To: References: Message-ID: > Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. > This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. > > Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. > > Before (ThreadStartTtsp.java is shared in JDK-8340547): > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 1291591 ns > Reaching safepoint: 59962 ns > Reaching safepoint: 1958065 ns > Reaching safepoint: 14456666258 ns <-- 14 seconds! > ...
> > > After: > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 214269 ns > Reaching safepoint: 60253 ns > Reaching safepoint: 2040680 ns > Reaching safepoint: 3089284 ns > Reaching safepoint: 2998303 ns > Reaching safepoint: 4433713 ns <-- 4.4ms > Reaching safepoint: 3368436 ns > Reaching safepoint: 2986519 ns > Reaching safepoint: 3269102 ns > ... > > > > **Alternatives** > > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. > I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. 
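The locking order described above can be sketched outside HotSpot. This is only an illustration under stated assumptions, not the code from the patch: `thread_start_lock`, `threads_lock`, and `start_thread_gated` are hypothetical stand-ins built on `std::mutex`, which has different fairness properties than HotSpot's own `Mutex`. The point it shows is that starters queue on the outer lock first, so at most one of them competes for the inner lock at any time.

```cpp
#include <mutex>

// Hypothetical stand-ins for ThreadStart_lock and Threads_lock.
std::mutex thread_start_lock;
std::mutex threads_lock;

// Runs add_to_thread_list while holding threads_lock. When use_extra_lock is
// true, the caller first takes thread_start_lock, so only one starter at a
// time is ever waiting on threads_lock.
template <typename Fn>
void start_thread_gated(bool use_extra_lock, Fn add_to_thread_list) {
  // defer_lock: construct the guard without locking, then lock conditionally.
  std::unique_lock<std::mutex> gate(thread_start_lock, std::defer_lock);
  if (use_extra_lock) {
    gate.lock();
  }
  std::lock_guard<std::mutex> guard(threads_lock);
  add_to_thread_list();
  // Both locks are released here, inner lock first.
}
```

Passing `false` for `use_extra_lock` corresponds to running with the diagnostic flag turned off.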
Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Fix build and address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21111/files - new: https://git.openjdk.org/jdk/pull/21111/files/ac5b2d2f..b9550f68 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21111/head:pull/21111 PR: https://git.openjdk.org/jdk/pull/21111 From jwaters at openjdk.org Fri Sep 20 17:03:37 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 20 Sep 2024 17:03:37 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo [v2] In-Reply-To: References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Fri, 20 Sep 2024 14:49:12 GMT, Coleen Phillimore wrote: >> CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. >> Tested with tier1-4 and 8 which does a lot of redefinition. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Change guarantees to asserts. Looks good, also great to have a HotSpot PR that's actually simple enough for me to understand :P ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21075#pullrequestreview-2318800126 From jbhateja at openjdk.org Fri Sep 20 17:04:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 20 Sep 2024 17:04:41 GMT Subject: RFR: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes [v4] In-Reply-To: References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com> Message-ID: On Thu, 19 Sep 2024 21:43:01 GMT, Sandhya Viswanathan wrote: >> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code. >> >> Summary of changes is as follows: >> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes. 
>> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code >> >> For the following source: >> >> >> public void test() { >> var index = ByteVector.fromArray(bspecies128, shuffles[1], 0); >> for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) { >> var inpvect = ByteVector.fromArray(bspecies128, byteinp, j); >> index.selectFrom(inpvect).intoArray(byteres, j); >> } >> } >> >> >> The code generated for inner main now looks as follows: >> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96 >> 0x00007f40d02274d0: movslq %ebx,%r13 >> 0x00007f40d02274d3: vmovdqu 0x10(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274da: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274df: vmovdqu %xmm1,0x10(%rax,%r13,1) >> 0x00007f40d02274e6: vmovdqu 0x20(%rsi,%r13,1),%xmm1 >> 0x00007f40d02274ed: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d02274f2: vmovdqu %xmm1,0x20(%rax,%r13,1) >> 0x00007f40d02274f9: vmovdqu 0x30(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227500: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227505: vmovdqu %xmm1,0x30(%rax,%r13,1) >> 0x00007f40d022750c: vmovdqu 0x40(%rsi,%r13,1),%xmm1 >> 0x00007f40d0227513: vpshufb %xmm2,%xmm1,%xmm1 >> 0x00007f40d0227518: vmovdqu %xmm1,0x40(%rax,%r13,1) >> 0x00007f40d022751f: add $0x40,%ebx >> 0x00007f40d0227522: cmp %r8d,%ebx >> 0x00007f40d0227525: jl 0x00007f40d02274d0 >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Implement review comments Thanks @sviswa7 , LGTM ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20634#pullrequestreview-2318802240 From matsaave at openjdk.org Fri Sep 20 17:21:51 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Sep 2024 17:21:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback CDS changes look good! Have two style comments but otherwise this makes sense ------------- Marked as reviewed by matsaave (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318793061 From matsaave at openjdk.org Fri Sep 20 17:21:53 2024 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 20 Sep 2024 17:21:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 20:08:43 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix bit counts in GCForwarding src/hotspot/share/cds/archiveBuilder.cpp line 677: > 675: // Allocate space for the future InstanceKlass with proper alignment > 676: const size_t alignment = > 677: #ifdef _LP64 I think the text alignment here is a bit confusing. Should 678 and 682 be at the same indentation? src/hotspot/share/cds/archiveUtils.cpp line 348: > 346: old_tag = (int)(intptr_t)nextPtr(); > 347: // do_int(&old_tag); > 348: assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); Is this assert message change a leftover from debugging or is it meant to be this way? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768946883 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768923643 From ccheung at openjdk.org Fri Sep 20 18:03:52 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 20 Sep 2024 18:03:52 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v2] In-Reply-To: References: Message-ID: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
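The lenient matching described above (order ignored, duplicates ignored) amounts to a set comparison. The following is a rough sketch and not the code from the patch: `to_name_set` and `module_lists_match` are hypothetical names, and the comma-separated input simply mirrors the `m1,m2` notation of the example rather than a real `--module-path` string.

```cpp
#include <set>
#include <sstream>
#include <string>

// Split a comma-separated list into a set. A set drops duplicates and has no
// notion of the original order, which is exactly the leniency described.
static std::set<std::string> to_name_set(const std::string& list) {
  std::set<std::string> names;
  std::istringstream in(list);
  std::string name;
  while (std::getline(in, name, ',')) {
    if (!name.empty()) {
      names.insert(name);
    }
  }
  return names;
}

// True if both lists name the same modules, regardless of order or repeats.
bool module_lists_match(const std::string& dump_time,
                        const std::string& run_time) {
  return to_name_set(dump_time) == to_name_set(run_time);
}
```

With this sketch, both rows of the example compare equal, while `m1,m2` against `m1,m3` does not.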
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: comments from David, Alan, and Ioi ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21048/files - new: https://git.openjdk.org/jdk/pull/21048/files/33c9d247..c6fc5bd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=00-01 Stats: 138 lines in 12 files changed: 86 ins; 37 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From ccheung at openjdk.org Fri Sep 20 18:16:57 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 20 Sep 2024 18:16:57 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3] In-Reply-To: References: Message-ID: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21048/files - new: https://git.openjdk.org/jdk/pull/21048/files/c6fc5bd8..61ffd1b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From ccheung at openjdk.org Fri Sep 20 18:19:42 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 20 Sep 2024 18:19:42 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3] In-Reply-To: <8QVb7SLhVWE1Q0U7_2oOpcltcNbKEYw4PJ1zi01U-tc=.207e30c8-4c9b-40da-a3a5-9389c4fcc44b@github.com> References: <8QVb7SLhVWE1Q0U7_2oOpcltcNbKEYw4PJ1zi01U-tc=.207e30c8-4c9b-40da-a3a5-9389c4fcc44b@github.com> Message-ID: On Wed, 18 Sep 2024 01:15:40 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> trailing whitespace > > src/hotspot/share/cds/filemap.cpp line 931: > >> 929: bool FileMapInfo::is_jar_suffix(const char* filename) { >> 930: const char* dot = strrchr(filename, '.'); >> 931: if (strcmp(dot + 1, "jar") == 0) { > > What if there is no dot? We need a null check. Added a null check. > src/hotspot/share/cds/filemap.hpp line 558: > >> 556: unsigned int dumptime_prefix_len, >> 557: unsigned int runtime_prefix_len) NOT_CDS_RETURN_(false); >> 558: bool is_jar_suffix(const char* filename); > > Suggestion: has_jar_suffix Fixed. Also moved the function to ClassLoaderExt.cpp. 
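For readers following the null-check comment, here is a minimal stand-alone sketch of a suffix check with the check in place. It is not the code that was committed; the function body is an assumption based only on the three quoted lines and the suggested name `has_jar_suffix`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-alone version of the suffix check discussed above.
// Returns true only if the text after the last '.' is exactly "jar".
bool has_jar_suffix(const char* filename) {
  assert(filename != nullptr);
  const char* dot = std::strrchr(filename, '.');
  // strrchr returns nullptr when the name contains no '.', so this must be
  // checked before forming dot + 1.
  if (dot == nullptr) {
    return false;
  }
  return std::strcmp(dot + 1, "jar") == 0;
}
```

Without the `nullptr` check, a name such as `foo` would lead to `strcmp` reading from an invalid address.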
> src/hotspot/share/cds/heapShared.cpp line 884: > >> 882: ClassLoaderExt::num_module_paths() > 0) { >> 883: log_info(cds, heap)(" is_using_optimized_module_handling %d num_module_paths %d jdk.module.main %s", >> 884: CDSConfig::is_using_optimized_module_handling(), ClassLoaderExt::num_module_paths(), Arguments::get_property("jdk.module.main")); > > Why are you printing a bool value as an int? I'm surprised one of the format checkers doesn't complain about it. I changed it to BOOL_TO_STR and updated the log statement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1769018271 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1769018370 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1769018601 From coleenp at openjdk.org Fri Sep 20 18:19:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Thu, 19 Sep 2024 12:08:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 > - review feedback I mostly reviewed the metaspace changes and suggest upstreaming the MetaBlock refactoring ahead of the rest of this patch. Only one comment about the interpreter code (affecting 4 locations). src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3636: > 3634: } else { > 3635: __ sub(r3, r3, sizeof(oopDesc)); > 3636: } This looks like something that could be buggy if we're not careful. We had a pass where we cleaned up sizeof(oopDesc) once. Can this be in oopDesc as (this is not header_size() anymore?) some function with the right name? src/hotspot/cpu/x86/templateTable_x86.cpp line 4121: > 4119: __ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 1*oopSize), rcx); > 4120: NOT_LP64(__ movptr(Address(rax, rdx, Address::times_8, sizeof(oopDesc) - 2*oopSize), rcx)); > 4121: } For this and above, I'd rather oopDesc encapsulate the header_size for UseCompactObjectHeaders condition in C++ code, and never see sizeof(oopDesc). src/hotspot/share/memory/metaspace.cpp line 799: > 797: > 798: // Set up compressed class pointer encoding. > 799: // In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination. I don't know why this comment is here. Seems out of place. src/hotspot/share/memory/metaspace/freeBlocks.cpp line 57: > 55: } > 56: } > 57: return p; This answers my prior question. The waste is added back to the block list for non-class-arenas as well. src/hotspot/share/memory/metaspace/metablock.hpp line 74: > 72: #define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() > 73: > 74: } // namespace metaspace I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders. 
Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this. src/hotspot/share/memory/metaspace/metaspaceArena.cpp line 470: > 468: > 469: // Returns true if the given block is contained in this arena > 470: // Returns true if the given block is contained in this arena Here's the same comment twice. ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2318539468 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768775590 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768781956 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768979540 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769008437 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769012842 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769015008 From coleenp at openjdk.org Fri Sep 20 18:19:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> On Wed, 18 Sep 2024 13:57:29 GMT, Thomas Stuefe wrote: >> src/hotspot/share/memory/classLoaderMetaspace.cpp line 87: >> >>> 85: klass_alignment_words, >>> 86: "class arena"); >>> 87: } >> >> As per my comment in the header file, change the code to this: >> >> ```c++ >> if (class_context != nullptr) { >> // ... Same as in PR >> } else { >> _class_space_arena = _non_class_space_arena; >> } > > Rather not, see reasoning under https://github.com/openjdk/jdk/pull/20677/files#r1754330432 Yes, I'd rather _class_space_arena be nullptr if not used. 
>> src/hotspot/share/memory/classLoaderMetaspace.cpp line 115: >> >>> 113: if (wastage.is_nonempty()) { >>> 114: non_class_space_arena()->deallocate(wastage); >>> 115: } >> >> This code reads a bit strangely. I understand *what* it tries to do. It tries to give back any wasted memory from either the class space arena *or* the non class space arena to the non class space arena's freelist. I assume that we do this since any wastage is presumably too small to be used by our new 22-bit class pointers. However, this context will be lost on future readers. It should have at least a comment in the `if (wastage.is_nonempty())` clause explaining what we expect should happen and why. For example: >> >> ```c++ >> // Any wasted memory is presumably too small for any class. >> // Therefore, give it back to the non-class space arena's free list. > > Yes. Some background: > > - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) > - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, its probably also too small > > Yes, I will write a better comment. Yes, this definitely needs a comment why since this is how we allocate small chunks of wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace. The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768897591 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768966812 From coleenp at openjdk.org Fri Sep 20 18:19:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:19:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> References: <5Mh0IteE4Z7zDseCNMmYKOtyCMTQe0iuAp70kJf8pS0=.8215bcce-6387-46fb-97a0-d1e6a9498b61@github.com> Message-ID: On Fri, 20 Sep 2024 17:34:09 GMT, Coleen Phillimore wrote: >> Yes. Some background: >> >> - wastage can only occur for larger Klass* alignments (aka class space arena alignment property), so only for +COH (note to self, maybe assert) >> - wastage is, by definition, not aligned to the required Klass* alignment, so it cannot be reused. Yes, its probably also too small >> >> Yes, I will write a better comment. > > Yes, this definitely needs a comment why since this is how we allocate small chunks of wasted because of hyper-aligning Klasses in class space. Line 111 is somewhat surprising though. I didn't expect there to be wastage from allocating to non-class-metaspace. > > The unnerving bit of this is that CompressedKlassPointers::is_encodable() is true for memory allocated here. I think this should also assert or be conditionalized on UseCompactObjectHeaders.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1768972448 From coleenp at openjdk.org Fri Sep 20 18:41:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:41:38 GMT Subject: RFR: 8338471: Assert deleted methods not returned by CallInfo [v2] In-Reply-To: References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Fri, 20 Sep 2024 14:49:12 GMT, Coleen Phillimore wrote: >> CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. >> Tested with tier1-4 and 8 which does a lot of redefinition. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Change guarantees to asserts. Thanks for reviewing David, Aleksey and Julian. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21075#issuecomment-2364312144 From coleenp at openjdk.org Fri Sep 20 18:41:39 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 18:41:39 GMT Subject: Integrated: 8338471: Assert deleted methods not returned by CallInfo In-Reply-To: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> References: <_oZ5yRT9RbgXUxoLYHXTD9dbWLhYP3SFWiibNwaxB44=.4cd5f168-9f5f-4667-b599-999912f937c8@github.com> Message-ID: On Wed, 18 Sep 2024 22:39:11 GMT, Coleen Phillimore wrote: > CompiledIC can get old methods, but it can't get deleted methods for itable and vtable calls, so add a guarantee and asserts that this cannot happen before dereferencing. Removed unused function and unnecessary methodHandle operator. > Tested with tier1-4 and 8 which does a lot of redefinition. This pull request has now been integrated. 
Changeset: 5cffddc6 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/5cffddc689a0134e1aaacb432d2f0fdd61dd74b1 Stats: 11 lines in 4 files changed: 4 ins; 1 del; 6 mod 8338471: Assert deleted methods not returned by CallInfo Reviewed-by: shade, jwaters, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/21075 From coleenp at openjdk.org Fri Sep 20 19:02:49 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 19:02:49 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> Message-ID: On Wed, 18 Sep 2024 12:54:34 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/markWord.inline.hpp line 90: >> >>> 88: ShouldNotReachHere(); >>> 89: return markWord(); >>> 90: #endif >> >> Is the ifdef _LP64 necessary, since UseCompactObjectHeaders should always be false for 32 bits? > > Kindof. The problem is that klass_shift is larger than 31, and shifting with it would thus be UB and generate a compiler warning. I opted to simply not compile any of that code in 32bit builds. We could also define klass_shift differently on 32bit. > Long-term (maybe with Lilliput2/4-byte-headers?) it would be nice to consolidate the header layout between 32 and 64 bit builds and not make any distinction anywhere. E.g. define markWord (or objectHeader?) in a single way, and use that to extract all the relevant stuff. It's not totally unlikely that we deprecate 32-bit builds before that can happen, though. Ok. 
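As an aside on the undefined-behaviour point above: in C++, shifting a value by an amount greater than or equal to the width of its (promoted) type is undefined, so a shift above 31 applied to a 32-bit `uintptr_t` cannot simply be left in the code. A minimal sketch of the "do not compile it on 32-bit" approach follows. The shift value 42 and the names `kKlassShift` and `klass_bits` are assumptions for illustration, not the definitions in markWord.hpp, and the sketch tests the pointer width via `UINTPTR_MAX` rather than HotSpot's `_LP64`.

```c++
#include <cstdint>
#include <cstdlib>

#if UINTPTR_MAX > 0xFFFFFFFFu
// 64-bit build: the shift is smaller than the word width, so it is well defined.
constexpr int kKlassShift = 42;  // assumed value, for illustration only

inline uintptr_t klass_bits(uintptr_t mark) {
  return mark >> kKlassShift;
}
#else
// 32-bit build: the shifting code is never compiled, so the compiler never
// sees a shift count larger than the word width.
inline uintptr_t klass_bits(uintptr_t) {
  std::abort();  // stands in for ShouldNotReachHere()
}
#endif
```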
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769069007 From coleenp at openjdk.org Fri Sep 20 19:09:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 20 Sep 2024 19:09:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson wrote: >> We haven't decided whether or not we will git rid of ```Klass::_prototype_header``` before intergrating this PR, or not. @stefank could point you to a WIP branch, if that's helpful. > > This is my current work-in-progress code: > https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 > > I've made some large rewrites and I'm currently running it through functional testing. The refactoring is better in this last version with encode_and_store_compact_object_header, although some comments around the c2 version would be good. Still don't know what the c2 version does. Someone else should review that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1769075714 From dlong at openjdk.org Sat Sep 21 00:22:35 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 21 Sep 2024 00:22:35 GMT Subject: RFR: 8318127: align_up has potential overflow In-Reply-To: References: Message-ID: On Mon, 2 Sep 2024 09:41:29 GMT, Casper Norrbin wrote: > Hi everyone, > > The `align_up` function contained code which could potentially overflow and produce an incorrect result. This PR adds an assert to check for such. 
> > Additionally, two test case that previously caused an overflow have been updated to use the highest possible values that do not trigger an overflow. Maybe we need two versions of this function, one that allows wrap-around and the other that doesn't. I'm guessing that for most callers, wrap-around would be unexpected and cause bugs, so let's have them ask for it explicitly. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20808#issuecomment-2364776411 From kbarrett at openjdk.org Sat Sep 21 04:25:12 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 21 Sep 2024 04:25:12 GMT Subject: RFR: 8340524: Remove NarrowPtrStruct Message-ID: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> Please review this change which removes the class NarrowPtrStruct. The only place it was still being used was as the type of CompressedOops::_narrow_oops. Instead, we move the members from NarrowPtrStruct directly into CompressedOops, flattening its structure. Testing: mach5 tier1-3 Tiers 2&3 run serviceability tests that hit the changes to that component. ------------- Commit messages: - remove NarrowPtrStruct Changes: https://git.openjdk.org/jdk/pull/21115/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21115&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340524 Stats: 35 lines in 4 files changed: 4 ins; 7 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/21115.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21115/head:pull/21115 PR: https://git.openjdk.org/jdk/pull/21115 From shade at openjdk.org Sat Sep 21 06:07:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sat, 21 Sep 2024 06:07:06 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized Message-ID: See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. 
We have found this gap by inspecting the code, while chasing a production bug. In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
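The publication pattern described above can be sketched in portable C++ with `std::atomic`. This is only an illustration of release/acquire pairing under assumed names (`FakeKlass`, `static_field`, `wait_and_read` are hypothetical); HotSpot itself uses its own `Atomic` wrappers and, in generated code, platform-specific barriers.

```c++
#include <atomic>

enum InitState : int { allocated = 0, being_initialized = 1, fully_initialized = 2 };

// Hypothetical stand-in for a class whose init state acts as the "witness".
struct FakeKlass {
  int static_field = 0;                    // data produced by initialization
  std::atomic<int> init_state{allocated};  // the witness field

  // Initializing thread: write the data first, then publish the state with
  // release semantics so the data write cannot be reordered after it.
  void initialize() {
    static_field = 42;
    init_state.store(fully_initialized, std::memory_order_release);
  }

  // Polling threads: read the state with acquire semantics. Observing
  // fully_initialized then guarantees that static_field == 42 is visible.
  bool is_initialized() const {
    return init_state.load(std::memory_order_acquire) == fully_initialized;
  }
};

// A reader that takes the lock-free path: spin until published, then read.
inline int wait_and_read(const FakeKlass& k) {
  while (!k.is_initialized()) {
    // spin
  }
  return k.static_field;
}
```

A relaxed load in `is_initialized()` would still see the state change eventually, but it would no longer order the subsequent read of `static_field`, which is the gap the patch is about.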
Additional testing: - [x] Linux x86_64 server fastdebug, `all` - [x] Linux AArch64 server fastdebug, `all` - [x] GHA to test platform buildability + adhoc platform cross-compilation ------------- Commit messages: - Initial version Changes: https://git.openjdk.org/jdk/pull/21110/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338379 Stats: 27 lines in 17 files changed: 12 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21110/head:pull/21110 PR: https://git.openjdk.org/jdk/pull/21110 From shade at openjdk.org Sat Sep 21 06:08:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sat, 21 Sep 2024 06:08:39 GMT Subject: RFR: 8340524: Remove NarrowPtrStruct In-Reply-To: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> References: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> Message-ID: On Sat, 21 Sep 2024 04:19:13 GMT, Kim Barrett wrote: > Please review this change which removes the class NarrowPtrStruct. The only > place it was still being used was as the type of CompressedOops::_narrow_oops. > Instead, we move the members from NarrowPtrStruct directly into > CompressedOops, flattening its structure. > > Testing: mach5 tier1-3 > Tiers 2&3 run serviceability tests that hit the changes to that component. Agreed, good cleanup. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21115#pullrequestreview-2319662213 From kbarrett at openjdk.org Sat Sep 21 06:28:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 21 Sep 2024 06:28:37 GMT Subject: RFR: 8318127: align_up has potential overflow In-Reply-To: References: Message-ID: On Mon, 16 Sep 2024 08:31:59 GMT, Andrew Haley wrote: >> Hi everyone, >> >> The `align_up` function contained code which could potentially overflow and produce an incorrect result. This PR adds an assert to check for such. >> >> Additionally, two test case that previously caused an overflow have been updated to use the highest possible values that do not trigger an overflow. > > src/hotspot/share/utilities/align.hpp line 76: > >> 74: constexpr T align_up(T size, A alignment) { >> 75: T mask = checked_cast<T>(alignment_mask(alignment)); >> 76: assert(size <= std::numeric_limits<T>::max() - mask, "overflow"); > > I don't really understand this assertion. `align_up((uint32_t)0xffff_ffff, 16) == 0`, because `uint32_t` is an unsigned type: > > _An unsigned integer type has the same width N as the corresponding signed integer type. The range of representable values for the unsigned type is 0 to 2**N - 1 (inclusive); arithmetic for the unsigned type is performed modulo 2**N_. > [Note 2 : Unsigned arithmetic does not overflow. Overflow for signed arithmetic yields undefined behavior ] The JBS issue is using "overflow" in the sense of "high bits of the mathematical result are discarded". Fixed-width unsigned arithmetic can certainly overflow in that sense.
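To make the two readings of "overflow" concrete, here is a small stand-alone sketch. It is not the code in align.hpp: `align_up_wrapping` and `align_up_would_wrap` are hypothetical helpers restricted to `uint32_t`, with the alignment assumed to be a power of two. The first reproduces the modulo-2**32 result mentioned in the review, the second is the condition the proposed assert rejects.

```c++
#include <cstdint>
#include <limits>

// Plain unsigned arithmetic: well defined, but the high bits are discarded.
constexpr uint32_t align_up_wrapping(uint32_t size, uint32_t alignment) {
  return (size + (alignment - 1)) & ~(alignment - 1);
}

// True when size + mask does not fit in 32 bits, i.e. when the mathematical
// result would lose its high bits.
constexpr bool align_up_would_wrap(uint32_t size, uint32_t alignment) {
  return size > std::numeric_limits<uint32_t>::max() - (alignment - 1);
}

// 0xffffffff + 15 wraps to 0x0000000e, and masking the low four bits gives 0.
static_assert(align_up_wrapping(0xffffffffu, 16) == 0, "wraps to zero");
static_assert(align_up_would_wrap(0xffffffffu, 16), "flagged by the check");

// The largest already-aligned value is fine and maps to itself.
static_assert(align_up_wrapping(0xfffffff0u, 16) == 0xfffffff0u, "no wrap");
static_assert(!align_up_would_wrap(0xfffffff0u, 16), "not flagged");
```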
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20808#discussion_r1769487022 From alanb at openjdk.org Sat Sep 21 06:33:37 2024 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 21 Sep 2024 06:33:37 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3] In-Reply-To: References: Message-ID: <7x-dr_M70dbSsP6Jr-QIY1g40vSdKMnXmkwfuUElzDg=.ca786a00-1613-4db3-a53b-0ce01942e5bd@github.com> On Fri, 20 Sep 2024 18:16:57 GMT, Calvin Cheung wrote: >> Prior to this patch, if `--module-path` is specified in the command line: >> during CDS dump time, full module graph will not be included in the CDS archive; >> during run time, full module graph will not be used. >> >> With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. >> >> The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. >> E.g. the following is considered a match: >> dump time runtime >> m1,m2 m2,m1 >> m1,m2 m1,m2,m2 >> >> I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > trailing whitespace src/java.base/share/classes/jdk/internal/module/ModuleReferences.java line 105: > 103: public byte[] generate(String algorithm) { > 104: return ModuleHashes.computeHash(supplier, algorithm); > 105: } Why is JarModuleReader changed to use a file string, is this because of an environment dependency when using a Path? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1769487689 From fyang at openjdk.org Sat Sep 21 06:48:45 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 21 Sep 2024 06:48:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Wed, 18 Sep 2024 17:45:51 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. 
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - Remove redundant comment src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: > 255: RegSet::of($res$$Register) /* no_preserve */); > 256: __ mov($tmp1$$Register, $oldval$$Register); > 257: __ mov($tmp2$$Register, $newval$$Register); Hi, I don't quite understand these two register-register moves here. Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly as `cmpxchg` won't modify them, which help us save these two moves. Did I miss anything? Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1769492955 From dholmes at openjdk.org Sat Sep 21 09:11:34 2024 From: dholmes at openjdk.org (David Holmes) Date: Sat, 21 Sep 2024 09:11:34 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 15:56:08 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? 
Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix build and address comments What is the performance hit in your scenario of starting many threads? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2365071268 From aph at openjdk.org Sun Sep 22 08:48:34 2024 From: aph at openjdk.org (Andrew Haley) Date: Sun, 22 Sep 2024 08:48:34 GMT Subject: RFR: 8318127: align_up has potential overflow In-Reply-To: References: Message-ID: On Sat, 21 Sep 2024 00:20:23 GMT, Dean Long wrote: > Maybe we need two versions of this function, one that allows wrap-around and the other that doesn't. I'm guessing that for most callers, wrap-around would be unexpected and cause bugs, so let's have them ask for it explicitly. Sounds good to me, but we'd have to be careful about existing usages. 
I'd just make it so that overflow was signalled for signed types, but perhaps explicit checked and unchecked function names are clearer to the reader. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20808#issuecomment-2366119016 From aph at openjdk.org Sun Sep 22 08:48:34 2024 From: aph at openjdk.org (Andrew Haley) Date: Sun, 22 Sep 2024 08:48:34 GMT Subject: RFR: 8318127: align_up has potential overflow In-Reply-To: References: Message-ID: On Sat, 21 Sep 2024 06:26:09 GMT, Kim Barrett wrote: >> src/hotspot/share/utilities/align.hpp line 76: >> >>> 74: constexpr T align_up(T size, A alignment) { >>> 75: T mask = checked_cast<T>(alignment_mask(alignment)); >>> 76: assert(size <= std::numeric_limits<T>::max() - mask, "overflow"); >> >> I don't really understand this assertion. `align_up((uint32_t)0xffff_ffff, 16) == 0`, because `uint32_t` is an unsigned type: >> >> _An unsigned integer type has the same width N as the corresponding signed integer type. The range of representable values for the unsigned type is 0 to 2**N - 1 (inclusive); arithmetic for the unsigned type is performed modulo 2**N_. >> [Note 2 : Unsigned arithmetic does not overflow. Overflow for signed arithmetic yields undefined behavior ] > > The JBS issue is using "overflow" in the sense of "high bits of the mathematical result are discarded". > Fixed-width unsigned arithmetic can certainly overflow in that sense. Perhaps, although "overflow" has a precise definition in C++, but I would be extremely surprised if I were looking for the address of the end of the page at 0xffff_ffff and was informed of an overflow. IMO, _there is no overflow_ in that case, and the correct answer must be 0.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20808#discussion_r1770223524 From jbhateja at openjdk.org Sun Sep 22 09:49:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 22 Sep 2024 09:49:38 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. 
> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... > Got it, I think #20508 and this PR are unrelated implementation-wise, though. > > @jatin-bhateja What do you think of using this patch and intrinsifing `Vector::rearrange(VectorShuffle, Vector)` instead of introducing the 2 vector `selectFrom` API? Hi @merykitty , I had implemented as [similar LoadShuffle bypassing optimization](https://github.com/openjdk/jdk/pull/20508/commits/7c80bfce59f486f6c25aec13f0f0f6a42f5319b1) in my original implementation of PR #20508 , which we decided to address in subsequent patch for both the flavors of selectFromAPI. Main difference b/w two vector re-arrange and selectFrom API is w.r.t to their signatures and acceptable index ranges post wrapping. In the latter case wrapping brings down the index range into [0, 2*VLEN -1) while in the former case we prune the exceptional indexes into valid single vector index range [0, VLEN) augmented with selection mask which picks the elements from independently permuted vectors to produce result vector. 
Unlike single vector re-arrange, which now favors index wrapping, parting ways from throwing an IndexOutOfBounds exception for exceptional indexes (-ve indexes), two vector re-arrange wraps exceptional indexes into the valid single vector range. Bringing the exceptional indexes into the valid two vector range will need changes in the wrapping logic to add 2*VECLEN to exceptional indexes, but this may be implemented in a target-specific manner; we can take this up in a follow-up patch after integrating #20508. As Paul mentioned, vector rearrange and selectFrom are complementary APIs with different signatures and we intend to produce optimal code for both. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2366421736 From kbarrett at openjdk.org Sun Sep 22 11:05:41 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 22 Sep 2024 11:05:41 GMT Subject: RFR: 8318127: align_up has potential overflow In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 08:43:03 GMT, Andrew Haley wrote: >> The JBS issue is using "overflow" in the sense of "high bits of the mathematical result are discarded". >> Fixed-width unsigned arithmetic can certainly overflow in that sense. > > Perhaps, although "overflow" has a precise definition in C++, but I would be extremely surprised if I were looking for the address of the end of the page at 0xffff_ffff and was informed of an overflow. IMO, _there is no overflow_ in that case, and the correct answer must be 0. There's no "perhaps" about the intended meaning in the JBS issue. I wrote that issue; I remember what I meant. :) I suppose I could have been more precise. So I disagree. I think align_up has an implied post-condition that the result is not less than the value being aligned. That's certainly how it's used, in every occurrence I've looked at. (I admit I didn't look at all ~450 uses though.)
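The "two versions" idea raised earlier in this thread, together with the post-condition stated above, could look roughly like the following sketch. Both function names are hypothetical and neither is proposed HotSpot API. The checked variant reports failure instead of asserting so that it can be exercised in isolation, and the alignment is assumed to be a power of two.

```c++
#include <cstdint>
#include <limits>
#include <type_traits>

// Variant for callers that explicitly want modulo-2**N behaviour.
template <typename T>
T align_up_allow_wrap(T value, T alignment) {
  static_assert(std::is_unsigned<T>::value, "wrap-around needs an unsigned type");
  const T mask = alignment - 1;
  return (value + mask) & ~mask;
}

// Variant enforcing the post-condition result >= value. Returns false and
// leaves *result untouched when the aligned value is not representable.
template <typename T>
bool align_up_checked(T value, T alignment, T* result) {
  static_assert(std::is_unsigned<T>::value, "sketch covers unsigned types only");
  const T mask = alignment - 1;
  if (value > std::numeric_limits<T>::max() - mask) {
    return false;  // would wrap, so the post-condition cannot hold
  }
  *result = (value + mask) & ~mask;
  return true;
}
```

A caller computing the end of the last page of a 32-bit address space would use `align_up_allow_wrap` and get 0; every other caller gets a failure instead of a silently smaller value.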
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20808#discussion_r1770523394 From jwaters at openjdk.org Sun Sep 22 11:07:36 2024 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 22 Sep 2024 11:07:36 GMT Subject: RFR: 8340524: Remove NarrowPtrStruct In-Reply-To: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> References: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> Message-ID: <56QcgGlM7piNby0Sh5dZau5n88BZRupoNJfx9lpLodM=.c9f9b3ac-df56-4f19-84ad-7f88f473abc0@github.com> On Sat, 21 Sep 2024 04:19:13 GMT, Kim Barrett wrote: > Please review this change which removes the class NarrowPtrStruct. The only > place it was still being used was as the type of CompressedOops::_narrow_oops. > Instead, we move the members from NarrowPtrStruct directly into > CompressedOops, flattening its structure. > > Testing: mach5 tier1-3 > Tiers 2&3 run serviceability tests that hit the changes to that component. Took me a while to work through the confusing diff in one of the files, but this looks ok src/hotspot/share/oops/compressedOops.hpp line 37: > 35: class ReservedHeapSpace; > 36: > 37: struct NarrowPtrStruct { Not an objection, just a complaint on how the GitHub diff view makes this confusing to review ------------- Marked as reviewed by jwaters (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21115#pullrequestreview-2320823566 PR Review Comment: https://git.openjdk.org/jdk/pull/21115#discussion_r1770523694 From stuefe at openjdk.org Sun Sep 22 12:01:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 22 Sep 2024 12:01:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 16:56:58 GMT, Matias Saavedra Silva wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix bit counts in GCForwarding > > src/hotspot/share/cds/archiveUtils.cpp line 348: > >> 346: old_tag = (int)(intptr_t)nextPtr(); >> 347: // do_int(&old_tag); >> 348: assert(tag == old_tag, "tag doesn't match (%d, expected %d)", old_tag, tag); > > Is this assert message change a leftover from debugging or is it meant to be this way? Its a leftover, but otoh it does not hurt. I found myself re-adding it several times to analyze CDS issues during development, so I decided to just leave it in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1770536320 From dholmes at openjdk.org Mon Sep 23 01:51:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 01:51:41 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 14:02:51 GMT, Aleksey Shipilev wrote: > See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. 
Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. > > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). > > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation Seems far more extensive than what was discussed. Code that takes the lock-free path to check `in_initialized` is what I thought we agreed needed the acquire/release not every read of the state variable. This code will be executed a lot and in 99.99% of cases the memory barriers are not needed. src/hotspot/share/oops/instanceKlass.cpp line 4099: > 4097: #endif > 4098: assert(_init_thread == nullptr, "should be cleared before state change"); > 4099: Atomic::release_store_fence(&_init_state, state); Why not just a release_store ?? Why do we need the trailing fence? 
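The release/acquire pairing being debated here can be illustrated with a minimal sketch. It uses standard C++ `std::atomic` rather than HotSpot's `Atomic::` API, and `FakeKlass`, `static_field`, and `init_then_read` are hypothetical names invented for illustration; this is not the code in `instanceKlass.cpp`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a class whose initialization state is published
// through one atomic "witness" field. Not HotSpot's InstanceKlass.
enum InitState : int { allocated = 0, being_initialized = 1, fully_initialized = 2 };

struct FakeKlass {
  int static_field = 0;                    // plain data written during initialization
  std::atomic<int> init_state{allocated};  // the witness field readers poll

  // Writer: complete initialization first, then publish with release
  // semantics so the earlier plain write cannot be reordered after the store.
  void initialize(int value) {
    static_field = value;
    init_state.store(fully_initialized, std::memory_order_release);
  }

  // Reader: lock-free poll with acquire semantics. A reader that observes
  // fully_initialized here is guaranteed to also observe static_field.
  bool is_initialized() const {
    return init_state.load(std::memory_order_acquire) == fully_initialized;
  }
};

// Initializes a fresh object and reads it back through the acquire check.
// Returns -1 if the object does not report itself as initialized.
int init_then_read(int value) {
  FakeKlass k;
  k.initialize(value);
  return k.is_initialized() ? k.static_field : -1;
}
```

In this sketch a plain release store is enough for the publication guarantee. Using `std::memory_order_seq_cst` for the store instead would be the stronger, more conservative choice that the PR description calls a "seqcst write".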
------------- PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2321028771 PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1770709316 From dholmes at openjdk.org Mon Sep 23 02:16:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 02:16:40 GMT Subject: RFR: 8338851: Hoist os::Posix::realpath() to os::realpath() and implement on Windows [v13] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 13:14:54 GMT, Simon Tooke wrote: >> This PR changes the status of realpath() from a Posix-specific API to a globally available API, i.e. adding it to the "Hotspot Porting API". Code would refer to os::realpath() instead of os::Posix::realpath(). >> >> This requires a Windows implementation of realpath(), using Windows _fullpath(), and renaming os::Posix::realpath() to os::realpath(). >> >> The main difference between POSIX and Windows behaviour is that POSIX actually requires an existing accessible file, while Windows will happily work with made-up filenames. >> >> Please note that guidelines for doing this appear in src/hotspot/share/runtime/os.hpp > > Simon Tooke has updated the pull request incrementally with one additional commit since the last revision: > > delete commented out code Changes requested by dholmes (Reviewer). test/hotspot/gtest/runtime/test_os.cpp line 434: > 432: #if defined(_WINDOWS) > 433: EXPECT_TRUE(returnedBuffer == buffer); > 434: EXPECT_TRUE(errno == 0); I thought we concluded you cannot guarantee that errno==0 after a successful call? test/hotspot/gtest/runtime/test_os.cpp line 442: > 440: errno = 0; > 441: returnedBuffer = os::realpath(tmppath, buffer, MAX_PATH); > 442: EXPECT_TRUE(returnedBuffer != nullptr); Why the change? 
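The point about `errno` above can be shown with a small sketch: decide success from the return value alone, and read `errno` only on failure. This assumes a POSIX system and calls `realpath()` directly rather than `os::realpath()`; the `Resolved` struct and `resolve_path` helper are hypothetical names for illustration.

```cpp
#include <cassert>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string>

// Hypothetical result type: err is only meaningful when ok is false.
struct Resolved {
  bool ok;
  std::string path;
  int err;
};

// Resolves a path with POSIX realpath(). Success is judged by the return
// value; errno is consulted only after a failure, because a successful call
// is not required to leave errno at 0.
Resolved resolve_path(const char* path) {
  char buf[PATH_MAX];
  if (::realpath(path, buf) == nullptr) {
    return {false, std::string(), errno};
  }
  return {true, std::string(buf), 0};
}
```

A test written in this style would assert on the returned buffer for the positive case and on `errno` only for the negative cases.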
test/hotspot/gtest/runtime/test_os.cpp line 452: > 450: EXPECT_TRUE(returnedBuffer == nullptr); > 451: EXPECT_TRUE(errno == ENAMETOOLONG); > 452: #endif I think it would be better to increase the buffer size on macOS so this remains a positive test for all platforms. test/hotspot/gtest/runtime/test_os.cpp line 460: > 458: > 459: /* the following tests cause an assert in fastdebug mode */ > 460: DEBUG_ONLY(if (false)) { Suggestion: #ifndef ASSERT no need for a runtime check. test/hotspot/gtest/runtime/test_os.cpp line 467: > 465: > 466: errno = 0; > 467: returnedBuffer = os::realpath(tmppath, buffer, sizeof(buffer)); This is still not an EINVAL case - the buffer should be null. Suggestion: returnedBuffer = os::realpath(tmppath, nullptr, sizeof(buffer)); ------------- PR Review: https://git.openjdk.org/jdk/pull/20683#pullrequestreview-2321034429 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1770713210 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1770713938 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1770716883 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1770717314 PR Review Comment: https://git.openjdk.org/jdk/pull/20683#discussion_r1770717227 From sjayagond at openjdk.org Mon Sep 23 03:40:09 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Mon, 23 Sep 2024 03:40:09 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v10] In-Reply-To: References: Message-ID: > This PR Adds SIMD support on s390x. 
Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: Use Op_regF instead of Op_VecX Revert commit 3caa470c0f89be306e5b43c5da4ca9e625abfe6b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18162/files - new: https://git.openjdk.org/jdk/pull/18162/files/3f2af99e..00973c63 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=08-09 Stats: 239 lines in 6 files changed: 143 ins; 1 del; 95 mod Patch: https://git.openjdk.org/jdk/pull/18162.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18162/head:pull/18162 PR: https://git.openjdk.org/jdk/pull/18162 From sjayagond at openjdk.org Mon Sep 23 04:40:18 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Mon, 23 Sep 2024 04:40:18 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v11] In-Reply-To: References: Message-ID: > This PR Adds SIMD support on s390x. Sidraya Jayagond has updated the pull request incrementally with one additional commit since the last revision: Fix whitespace errors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18162/files - new: https://git.openjdk.org/jdk/pull/18162/files/00973c63..34341aed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18162&range=09-10 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/18162.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18162/head:pull/18162 PR: https://git.openjdk.org/jdk/pull/18162 From sjayagond at openjdk.org Mon Sep 23 04:43:39 2024 From: sjayagond at openjdk.org (Sidraya Jayagond) Date: Mon, 23 Sep 2024 04:43:39 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: References: Message-ID: On Mon, 26 Aug 2024 21:10:43 GMT, Cesar Soares Lucas wrote: >> Sidraya Jayagond has updated the pull request incrementally with 
one additional commit since the last revision: >> >> Add rebase changes from jdk master > > src/hotspot/cpu/s390/vmreg_s390.cpp line 48: > >> 46: >> 47: VectorRegister vreg = ::as_VectorRegister(0); >> 48: for (; i < ConcreteRegisterImpl::max_vr;) { > > NIT: this really looks like a `while` loop. @JohnTortugo Could you elaborate more on this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18162#discussion_r1770764969 From amitkumar at openjdk.org Mon Sep 23 04:50:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 23 Sep 2024 04:50:43 GMT Subject: RFR: 8327652: S390x: Implements SLP support [v9] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 04:41:07 GMT, Sidraya Jayagond wrote: >> src/hotspot/cpu/s390/vmreg_s390.cpp line 48: >> >>> 46: >>> 47: VectorRegister vreg = ::as_VectorRegister(0); >>> 48: for (; i < ConcreteRegisterImpl::max_vr;) { >> >> NIT: this really looks like a `while` loop. > > @JohnTortugo Could you elaborate more on this? I think this is what he meant: Suggestion: while(i < ConcreteRegisterImpl::max_vr) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18162#discussion_r1770767832 From thartmann at openjdk.org Mon Sep 23 05:48:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Sep 2024 05:48:38 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v4] In-Reply-To: References: Message-ID: <21jVqrnhSbkZIouDaIsJNOszGaGFl1hATDj30c8TjzQ=.91daf150-a4ef-4eb9-8e74-5bd7f96e8153@github.com> On Thu, 19 Sep 2024 12:48:53 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. 
It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > More reviewer comments Thanks Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21037#issuecomment-2367280611 From kbarrett at openjdk.org Mon Sep 23 05:52:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 23 Sep 2024 05:52:39 GMT Subject: Integrated: 8340524: Remove NarrowPtrStruct In-Reply-To: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> References: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> Message-ID: On Sat, 21 Sep 2024 04:19:13 GMT, Kim Barrett wrote: > Please review this change which removes the class NarrowPtrStruct. The only > place it was still being used was as the type of CompressedOops::_narrow_oops. 
> Instead, we move the members from NarrowPtrStruct directly into > CompressedOops, flattening its structure. > > Testing: mach5 tier1-3 > Tiers 2&3 run serviceability tests that hit the changes to that component. This pull request has now been integrated. Changeset: dd498794 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/dd498794f20df0ac1a73d84e54591905c8a5a5c7 Stats: 35 lines in 4 files changed: 4 ins; 7 del; 24 mod 8340524: Remove NarrowPtrStruct Reviewed-by: shade, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/21115 From kbarrett at openjdk.org Mon Sep 23 05:52:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 23 Sep 2024 05:52:38 GMT Subject: RFR: 8340524: Remove NarrowPtrStruct In-Reply-To: References: <6LlBPSiXFW9f-ieBnj_LMTf_LAdO92ZHJicxpvgy61Y=.2b387a38-1670-4df2-86a7-fff1cc68cc26@github.com> Message-ID: On Sat, 21 Sep 2024 06:06:10 GMT, Aleksey Shipilev wrote: >> Please review this change which removes the class NarrowPtrStruct. The only >> place it was still being used was as the type of CompressedOops::_narrow_oops. >> Instead, we move the members from NarrowPtrStruct directly into >> CompressedOops, flattening its structure. >> >> Testing: mach5 tier1-3 >> Tiers 2&3 run serviceability tests that hit the changes to that component. > > Agreed, good cleanup. Thanks for reviews @shipilev and @TheShermanTanker . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21115#issuecomment-2367282433 From ccheung at openjdk.org Mon Sep 23 05:59:38 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 23 Sep 2024 05:59:38 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3] In-Reply-To: <7x-dr_M70dbSsP6Jr-QIY1g40vSdKMnXmkwfuUElzDg=.ca786a00-1613-4db3-a53b-0ce01942e5bd@github.com> References: <7x-dr_M70dbSsP6Jr-QIY1g40vSdKMnXmkwfuUElzDg=.ca786a00-1613-4db3-a53b-0ce01942e5bd@github.com> Message-ID: On Sat, 21 Sep 2024 06:31:16 GMT, Alan Bateman wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> trailing whitespace > > src/java.base/share/classes/jdk/internal/module/ModuleReferences.java line 105: > >> 103: public byte[] generate(String algorithm) { >> 104: return ModuleHashes.computeHash(supplier, algorithm); >> 105: } > > Why is JarModuleReader changed to use a file string, is this because of an environment dependency when using a Path? It is to avoid the following warnings during dump time: [1.607s][warning][cds,heap ] Archive heap points to a static field that may be reinitialized at runtime: [1.607s][warning][cds,heap ] Field: java/util/zip/ZipFile$Source::builtInFS [1.607s][warning][cds,heap ] Value: sun.nio.fs.LinuxFileSystem ... 
[1.607s][warning][cds,heap ] Archive heap points to a static field that may be reinitialized at runtime: [1.607s][warning][cds,heap ] Field: sun/nio/fs/DefaultFileSystemProvider::INSTANCE [1.607s][warning][cds,heap ] Value: sun.nio.fs.LinuxFileSystemProvider ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1770803768 From shade at openjdk.org Mon Sep 23 07:05:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:05:40 GMT Subject: RFR: 8340392: Handle OopStorage in location decoder [v6] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 08:29:20 GMT, Aleksey Shipilev wrote: >> Another debugging QoL improvement. Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: >> >> 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal >> >> >> This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. >> >> This patch is able to print the following instead: >> >> >> 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also assert "unaligned" is not printed for aligned pointers Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21072#issuecomment-2367376021 From shade at openjdk.org Mon Sep 23 07:05:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:05:41 GMT Subject: Integrated: 8340392: Handle OopStorage in location decoder In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 17:47:21 GMT, Aleksey Shipilev wrote: > Another debugging QoL improvement. 
Currently, when there is a pointer into `OopStorage` that we need to decode for the error log, we just print: > > 0x00007ad45c169e10 into live malloced block starting at 0x00007ad45c169dd0, size 632, tag mtInternal > > > This is reported by NMT after [JDK-8304815](https://bugs.openjdk.org/browse/JDK-8304815). It is likely worse without NMT. We can actually decode which block in which `OopStorage` the address likely belongs to. This becomes handy when debugging GC crashes that involve `OopStorage`-handled roots. > > This patch is able to print the following instead: > > > 0x0000000102c05bd0 is a pointer 2/64 into block 0 in oop storage "VM Global" This pull request has now been integrated. Changeset: 0f253d11 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/0f253d11033a26d15ea20df19db6765bb274a848 Stats: 132 lines in 7 files changed: 131 ins; 0 del; 1 mod 8340392: Handle OopStorage in location decoder Reviewed-by: kbarrett, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/21072 From shade at openjdk.org Mon Sep 23 07:17:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:17:50 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: > See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. 
> > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). > > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Relax to just a release ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21110/files - new: https://git.openjdk.org/jdk/pull/21110/files/66dc20b6..179d8aa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21110/head:pull/21110 PR: https://git.openjdk.org/jdk/pull/21110 From shade at openjdk.org Mon Sep 23 07:17:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:17:50 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: <7TWC-8mou61HHIklbJ9Ox2XpSNQBD-QfBKXWyDhd3C8=.431c5d1b-448b-4676-8a0d-f23906480546@github.com> On Mon, 23 Sep 2024 01:46:12 GMT, David Holmes wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Relax to just a release > > 
src/hotspot/share/oops/instanceKlass.cpp line 4099: > >> 4097: #endif >> 4098: assert(_init_thread == nullptr, "should be cleared before state change"); >> 4099: Atomic::release_store_fence(&_init_state, state); > > Why not just a release_store ?? Why do we need the trailing fence? Says in PR: "Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path." But I can turn it into just a weaker release, sure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1770871023 From mli at openjdk.org Mon Sep 23 07:30:59 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 07:30:59 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
>
> ### Test
> test/jdk/jdk/incubator/vector
>
> ### Performance
> data on bananapi
>
> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855
> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001
> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074
> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62
> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166
> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974
> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167
> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219
> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377
> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357
> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398
> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449
> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743
> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743
> Double128Vector....
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: modify cflags style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/304b74a6..26a68071 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From mli at openjdk.org Mon Sep 23 07:30:59 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 07:30:59 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 12:34:38 GMT, Erik Joelsson wrote: >> Thanks. I'm sorry too, I'm not familiar with the build system. >> What you expected could be something like below? 
>> >> diff --git a/make/modules/jdk.incubator.vector/Lib.gmk b/make/modules/jdk.incubator.vector/Lib.gmk >> index 5e52277919a..c6c6103a301 100644 >> --- a/make/modules/jdk.incubator.vector/Lib.gmk >> +++ b/make/modules/jdk.incubator.vector/Lib.gmk >> @@ -41,11 +41,12 @@ endif >> ifeq ($(call isTargetOs, linux)+$(call isTargetCpu, riscv64)+$(INCLUDE_COMPILER2), true+true+true) >> $(eval $(call SetupJdkLibrary, BUILD_LIBSLEEF, \ >> NAME := sleef, \ >> + OPTIMIZATION := HIGH, \ >> SRC := libsleef/lib, \ >> EXTRA_SRC := libsleef/generated, \ >> DISABLED_WARNINGS_gcc := unused-function sign-compare tautological-compare ignored-qualifiers, \ >> DISABLED_WARNINGS_clang := unused-function sign-compare tautological-compare ignored-qualifiers, \ >> - CFLAGS := $(CFLAGS_JDKLIB) -O3 -march=rv64gcv, \ >> + CFLAGS := $(CFLAGS_JDKLIB) -march=rv64gcv, \ >> LDFLAGS := $(LDFLAGS_JDKLIB) \ >> $(call SET_SHARED_LIBRARY_ORIGIN), \ >> LIBS := $(JDKLIB_LIBS) \ > > Yes, exactly. Modified, Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1770886456 From shade at openjdk.org Mon Sep 23 07:33:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 07:33:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 01:48:47 GMT, David Holmes wrote: > Seems far more extensive than what was discussed. Code that takes the lock-free path to check `in_initialized` is what I thought we agreed needed the acquire/release not every read of the state variable. This code will be executed a lot and in 99.99% of cases the memory barriers are not needed. This just extends the architectural parts of the patch we agreed with @coleenp for the fix. Which parts you think are excessive? The acquires in `instanceKlass.hpp`? It would be hard to track which one of those are used without a lock, I think. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2367428015 From rcastanedalo at openjdk.org Mon Sep 23 07:48:16 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 23 Sep 2024 07:48:16 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
> > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 46 additional commits since the last revision: - Merge jdk-24+16 - Ensure that detected encode-and-store patterns are matched - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Remove redundant comment - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - Restore some asserts - Default values for tmp regs of G1PostBarrierStubC2 - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 - ... and 36 more: https://git.openjdk.org/jdk/compare/bdb0e33c...47c982ba ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/d54d67f1..47c982ba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=23-24 Stats: 170497 lines in 1328 files changed: 155223 ins; 8073 del; 7201 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From aboldtch at openjdk.org Mon Sep 23 07:49:17 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 23 Sep 2024 07:49:17 GMT Subject: RFR: 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages Message-ID: <7DtNgJ7IORWdKXdZwQsKuWLDG8uZJmGLAQaoFbGcg9I=.95486530-8384-4f65-b0a4-8793139078dd@github.com> Similarly to [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127, TestAllocateHeapAt.java does not work well with transparent hugepages. Because a machine may be configured in such a way that the UseTransparentHugePages option gets ignored, the test driver must also check whether it will be.
As such I extracted the `test/hotspot/jtreg/runtime/os/HugePageConfiguration.java` utility into the shared test library. On non-Linux machines the `vm.opt.final.UseTransparentHugePages` will be null, but the test checks `os.family == "linux"` first. I have not observed an issue with the JTREG filter on non-Linux machines, but I will double-check that it does not cause an issue. ------------- Commit messages: - 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages Changes: https://git.openjdk.org/jdk/pull/21129/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21129&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340422 Stats: 44 lines in 7 files changed: 41 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21129/head:pull/21129 PR: https://git.openjdk.org/jdk/pull/21129 From rcastanedalo at openjdk.org Mon Sep 23 07:57:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 23 Sep 2024 07:57:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25] In-Reply-To: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Fri, 13 Sep 2024 22:51:59 GMT, Vladimir Kozlov wrote: >> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 46 additional commits since the last revision: >> >> - Merge jdk-24+16 >> - Ensure that detected encode-and-store patterns are matched >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Remove redundant comment >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - ... and 36 more: https://git.openjdk.org/jdk/compare/da906826...47c982ba > > src/hotspot/share/opto/matcher.cpp line 1821: > >> 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) { >> 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf), >> 1821: "duplicating node that's already been matched"); > > Why it was removed? The assertion was failing due to it being too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched. There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what would be the impact on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1770925777 From stefank at openjdk.org Mon Sep 23 08:07:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 23 Sep 2024 08:07:37 GMT Subject: RFR: 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages In-Reply-To: <7DtNgJ7IORWdKXdZwQsKuWLDG8uZJmGLAQaoFbGcg9I=.95486530-8384-4f65-b0a4-8793139078dd@github.com> References: <7DtNgJ7IORWdKXdZwQsKuWLDG8uZJmGLAQaoFbGcg9I=.95486530-8384-4f65-b0a4-8793139078dd@github.com> Message-ID: On Mon, 23 Sep 2024 07:43:32 GMT, Axel Boldt-Christmas wrote: > Similarly to [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127, this test does not work well with transparent hugepages. > > Because a machine may be configured in such a way that the UseTransparentHugePages option gets ignored, the test driver must also check if it will be. As such I extracted the `test/hotspot/jtreg/runtime/os/HugePageConfiguration.java` utility into the shared test library. > > On non-Linux machines the `vm.opt.final.UseTransparentHugePages` will be null, but the test checks `os.family == "linux"` first. I have not observed an issue with the JTREG filter on non-Linux machines, but will double-check that it does not cause one. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21129#pullrequestreview-2321420577 From chagedorn at openjdk.org Mon Sep 23 09:10:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 23 Sep 2024 09:10:41 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v4] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:48:53 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information.
It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > More reviewer comments Small suggestion, otherwise, it looks good to me, too! src/hotspot/share/opto/parse2.cpp line 1587: > 1585: } > 1586: > 1587: void Parse::stress_trap(IfNode* orig_iff, Node* counter, Node* incr_store) { Can you add a brief summary as method comment (i.e. following the explanation in the PR)? This would help to understand the reason why we want to stress traps, how it looks in the IR, and how frequently the trap is taken. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2321188322 PR Review Comment: https://git.openjdk.org/jdk/pull/21037#discussion_r1770809178 From luhenry at openjdk.org Mon Sep 23 09:12:36 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 23 Sep 2024 09:12:36 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: <24D2-W-fmlFZ4Ke2wLc-FPKnpskxNIa4aB7NL5ArI8U=.f0e5bdbd-697e-4aea-99ac-1472d14136f7@github.com> On Mon, 23 Sep 2024 07:30:59 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
>>
>> ### Test
>> test/jdk/jdk/incubator/vector
>>
>> ### Performance
>> data on bananapi
>>
>> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855
>> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001
>> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074
>> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62
>> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166
>> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974
>> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167
>> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219
>> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377
>> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357
>> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398
>> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449
>> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743
>> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3...
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > modify cflags style src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 24: > 22: */ > 23: > 24: #ifdef __riscv_v_intrinsic It would be worth adding a comment on which version of the compiler would be affected, and what would then be the behavior ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1771026089 From ogillespie at openjdk.org Mon Sep 23 09:18:35 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 23 Sep 2024 09:18:35 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v2] In-Reply-To: References: Message-ID: On Sat, 21 Sep 2024 09:09:08 GMT, David Holmes wrote: > What is the performance hit in your scenario of starting many threads? Do you mean what is the overhead of creating the threads themselves, compared to the time-to-safepoint issue? I'm not sure how best to measure that. It's expensive (especially with SMR), and obviously an antipattern, but you would usually expect the cost of starting threads to be mostly localized to the caller compared to this issue which affects the whole vm in an unexpected way. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2367648010 From dholmes at openjdk.org Mon Sep 23 09:23:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 09:23:36 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: <7TWC-8mou61HHIklbJ9Ox2XpSNQBD-QfBKXWyDhd3C8=.431c5d1b-448b-4676-8a0d-f23906480546@github.com> References: <7TWC-8mou61HHIklbJ9Ox2XpSNQBD-QfBKXWyDhd3C8=.431c5d1b-448b-4676-8a0d-f23906480546@github.com> Message-ID: On Mon, 23 Sep 2024 07:14:52 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 4099: >> >>> 4097: #endif >>> 4098: assert(_init_thread == nullptr, "should be cleared before state change"); >>> 4099: Atomic::release_store_fence(&_init_state, state); >> >> Why not just a release_store? Why do we need the trailing fence? > > Says in PR: "Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path." But I can turn it into just a weaker release, sure. I thought a seqcst write would be `fence(); store; fence()`? Anyway, I don't like "paranoid" when it comes to memory barriers because that says to me "hey we don't understand what is going on here so we're just going to do the heaviest barrier we can 'just in case'." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1771041123 From dholmes at openjdk.org Mon Sep 23 09:26:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 09:26:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen clear evidence that this is _the_ problem in the field, nor were we able to come up with a reproducer.
We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release The problem is we have completely different code paths that look at the different states of a class (loaded, linked, initialized, in-error) and those actions use different locks. This issue was, I thought, only about the lock-free fast-paths checking the "is initialized" state not anything else. These extra barriers could be completely redundant for "is loaded" or "is linked" or "is in error" checks. 
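
[Archive note] The release/acquire pairing debated in this thread can be sketched in portable C++. This is only a minimal illustration under stated assumptions: it uses `std::atomic` instead of HotSpot's `Atomic::` API, and `FakeKlass`, `initialize`, `is_initialized` and `publish_and_read` are hypothetical names that do not appear in the patch. It shows why a plain release store paired with an acquire load is enough for a lock-free "is initialized" check to see the data written during initialization.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a class whose init state is polled lock-free.
enum class InitState : int { allocated, being_initialized, fully_initialized };

struct FakeKlass {
  int static_field = 0;  // plain data written during "initialization"
  std::atomic<InitState> init_state{InitState::allocated};

  // Initializing thread: write the data first, then publish the state with
  // release semantics so the data write cannot be reordered after the store.
  void initialize(int value) {
    static_field = value;
    init_state.store(InitState::fully_initialized, std::memory_order_release);
  }

  // Lock-free fast path: an acquire load that observes fully_initialized
  // also makes every write made before the release store visible.
  bool is_initialized() const {
    return init_state.load(std::memory_order_acquire) ==
           InitState::fully_initialized;
  }
};

// Runs one initializing thread and one polling thread; returns the value of
// static_field that the polling thread saw after the acquire load succeeded.
int publish_and_read(int value) {
  FakeKlass k;
  int seen = -1;
  std::thread reader([&] {
    while (!k.is_initialized()) {
      std::this_thread::yield();
    }
    seen = k.static_field;  // ordered after initialize() by release/acquire
  });
  std::thread writer([&] { k.initialize(value); });
  writer.join();
  reader.join();
  return seen;
}
```

A sequentially consistent store would also be correct here; the sketch only shows that the weaker pairing already gives the ordering the fast path relies on. Checks for other states taken under a lock would not need the acquire load, which is the redundancy concern raised above.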
------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2367663009 From dholmes at openjdk.org Mon Sep 23 09:31:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 23 Sep 2024 09:31:35 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 09:16:09 GMT, Oli Gillespie wrote: > Do you mean what is the overhead of creating the threads themselves, compared to the time-to-safepoint issue? Yes. By adding the additional layer of throttling you've addressed your TTSP issue but at the expense of slowing down the overall rate of thread creation. So what is the likely overhead for applications that create lots of threads (at a relatively fast rate) but which don't see a TTSP issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2367674769 From duke at openjdk.org Mon Sep 23 09:40:19 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 23 Sep 2024 09:40:19 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. 
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                        Baseline            This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score   Error     Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060     1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028     4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051    26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352  2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062     1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002     5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048    23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374  2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005     1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002     5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016    24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336  2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001     1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: - Add asm tests for Neon Vector - Scalar insts - fixup: restrict Vm to V0-V15 for mulvs when esize is H ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/a824a742..132baf86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=09-10 Stats: 680 lines in 3 files changed: 87 ins; 0 del; 593 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From duke at openjdk.org Mon Sep 23 09:40:19 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 23 Sep 2024 09:40:19 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: <58jnT00LJ-V7_N-pFrR8duCccBUHxZiq2cFuj-uS9ww=.3468c4b9-0b10-4909-96e9-7d40a0cffa62@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <58jnT00LJ-V7_N-pFrR8duCccBUHxZiq2cFuj-uS9ww=.3468c4b9-0b10-4909-96e9-7d40a0cffa62@github.com> Message-ID: On Fri, 20 Sep 2024 09:33:56 GMT, Andrew Haley wrote: >> It was renamed to `mulvs` by https://github.com/openjdk/jdk/pull/18487/commits/419f39473b53099b7bd42c33380a6ccb3917ab16 > > It's certainly possible that we are missing them. There was a period when `Assembler` changes weren't being fully tested, but I've reviewed PRs more strictly since I realized. > In this case, though, there is a bug which will be revealed by testing. Fixed and tested by https://github.com/openjdk/jdk/pull/18487/commits/3d7af279cd33b842ea332404005bbb54e2cd1d0b and https://github.com/openjdk/jdk/pull/18487/commits/132baf86e4c2418ba4e9f337612f6a38e37da777 respectively, please check.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771066627 From eosterlund at openjdk.org Mon Sep 23 10:14:50 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 23 Sep 2024 10:14:50 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 18:29:23 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update two, after the review This looks generally good to me. Found some weirdness, but I think we can fix it after this goes in. Some nit too but I don't need to see the updated patch for that. Thanks for fixing this! src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 222: > 220: // Check if the entry lists are empty. 
> 221: ldr(rscratch1, Address(tmp, ObjectMonitor::EntryList_offset()));
> 222: ldr(tmpReg, Address(tmp, ObjectMonitor::cxq_offset()));

What strikes me a bit is that the EntryList vs cxq loads have inconsistent ordering in the runtime and the different ports. Notably, the x86 port loads cxq first while here, for example, we load EntryList first. This adds some cognitive overhead when reasoning about the implications, at least for me. In particular, with respect to the ordering we see here where EntryList is loaded first, the following race is possible (which is not possible in the x86 intrinsic):

1. Thread 1 enters the monitor.
2. Thread 2 tries to enter the monitor, fails, adds itself to cxq, tries again, and eventually parks.
3. Thread 1 starts exiting the monitor and runs all the instructions up to and including the EntryList load above on line 221, which yields an empty EntryList.
4. Thread 3 enters the monitor since it is no longer owned.
5. Thread 3 exits the monitor, and moves cxq to EntryList in the process while still holding the lock.
6. Thread 1 runs the next instruction loading cxq on line 222, resulting in an empty list.
7. Thread 1 draws the conclusion that there is no thread waiting for the monitor, even though Thread 2 has waited for the monitor since before Thread 1 released it.

Even though this choice of ordering between reading EntryList and cxq allows a waiter to be completely unnoticed throughout the monitorexit procedure due to unfortunate interleavings, which is weird, the protocol deals with this just fine. Since Thread 3 managed to enter the monitor, the responsibility of ensuring liveness of the monitor becomes the responsibility of Thread 3, which will make Thread 2 the successor and unpark Thread 2.

I'm not proposing we have the "other" more intuitive ordering of loading cxq first which removes this race so we don't have to think about it. Enforcing that ordering requires an extra loadload fence, which isn't free.
I think what I would prefer to see, is that we at least use a consistent ordering so we don't have to think about why some platforms use one ordering while others use another one and what different races only affect some platforms. And I'd prefer if this less obvious ordering of reading EntryList before cxq is the championed ordering, with a comment in the shared code explaining why it's okay that waiters slip undetected between the two loads. Because without explicit fencing, that's an order that has to be valid anyway, unless we jam in some loadload fences. Anyway, I think this could be a follow-up cleanup, but doesn't really need to be fixed in this patch. src/hotspot/share/runtime/objectMonitor.cpp line 335: > 333: // ObjectMonitor::deflate_monitor() will decrement contentions > 334: // after it recognizes that the async deflation was cancelled. > 335: contention_mark.extend(); This is a bit scary. Previously the locking_thread would already have 1 stake in the _contentions, and recognizes that after cancelling deflation, that stake is asynchronously released by the deflation thread. This means that in practice, we have 0 stakes left in the contention counter after the CAS that swings the owner to the locking_thread succeeds. Yet the ObjectMonitorContentionMark RAII object passed in from the caller looks like it guarantees there is a stake in the _contentions throughout its scope. By explicitly adding to contentions, the locking_thread reclaimed its stake in the _contentions counter while holding the lock, guaranteeing that deflation is indeed impossible until the end of the scope. The new extend() mechanism seems to consider it equivalent to not increment here and also not decrement later. +1 -1 == 0 right? However, that is not equivalent. HotSpot math works in mysterious ways. The old mechanism guaranteed the linearization point for deflation is blocked until you get out of scope. The new mechanism does not. 
Instead, it's up to the user to reason about for how long deflation is blocked out. It's blocked out as long as the monitor is held naturally, but if it is released and the scope is still active, there is no stake in the _contentions counter and deflation would succeed if it tries again. Things like the _waiters counter might tell the heuristics of deflation to not try again. An absence of a safepoint poll might also prevent deflation from trying again in a timely fashion. But the point is that the linearization point for deflation is no longer blocked, and the abstraction looks safer than it is. In practice, I don't know of any bug because of this. Seems like deflation is in other ways blocked out in practice. But I would really prefer if extend() would add to contentions and the destructor always decrements. This way, the contract is stronger and it's easier to convince ourselves that we have not messed up. The scope would on its own prevent deflation, regardless of how it is used, which cannot be guaranteed any longer with the current extend() implementation. Again, this might be better suited for a follow-up RFE. src/hotspot/share/runtime/objectMonitor.hpp line 226: > 224: static ByteSize succ_offset() { return byte_offset_of(ObjectMonitor, _succ); } > 225: static ByteSize EntryList_offset() { return byte_offset_of(ObjectMonitor, _EntryList); } > 226: static ByteSize contentions_offset() { return byte_offset_of(ObjectMonitor, _contentions); } Looks like a leftover from the previous approach that tried to deal with deflation races in assembly code. It should probably be removed. ------------- Marked as reviewed by eosterlund (Reviewer). 
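
[Archive note] The `extend()` concern raised in this review can be modelled with two small RAII classes. This is a hedged sketch, not the ObjectMonitor code: `FakeMonitor`, `WeakMark`, `StrongMark` and `stake_inside_scope_after_cancel` are hypothetical names, and the deflater's asynchronous decrement is simulated inline on one thread. `WeakMark` follows the behaviour the review attributes to the patch (no extra increment, no decrement later); `StrongMark` follows the reviewer's preferred contract (extend adds a stake, the destructor always decrements).

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of a monitor's contention counter.
struct FakeMonitor {
  std::atomic<int> contentions{0};
};

// Variant A: extend() only records that the stake was handed over, so the
// destructor skips its decrement ("+1 -1 == 0").
class WeakMark {
  FakeMonitor& _m;
  bool _extended = false;
 public:
  explicit WeakMark(FakeMonitor& m) : _m(m) { _m.contentions.fetch_add(1); }
  void extend() { _extended = true; }
  ~WeakMark() { if (!_extended) _m.contentions.fetch_sub(1); }
};

// Variant B: extend() takes an additional stake and the destructor always
// decrements, so the scope itself keeps the counter above zero.
class StrongMark {
  FakeMonitor& _m;
 public:
  explicit StrongMark(FakeMonitor& m) : _m(m) { _m.contentions.fetch_add(1); }
  void extend() { _m.contentions.fetch_add(1); }
  ~StrongMark() { _m.contentions.fetch_sub(1); }
};

// Simulates: mark taken, deflation cancelled (extend), then the deflater
// drops the original stake. Returns the counter value while the mark is
// still in scope; *after_scope receives the value once the mark is gone.
template <typename Mark>
int stake_inside_scope_after_cancel(int* after_scope) {
  FakeMonitor m;
  int inside = 0;
  {
    Mark mark(m);
    mark.extend();
    m.contentions.fetch_sub(1);  // deflater's asynchronous decrement, inlined
    inside = m.contentions.load();
  }
  *after_scope = m.contentions.load();
  return inside;
}
```

Both variants end balanced at zero, but only the second holds a nonzero count for the whole scope, which is the stronger contract argued for above. Whether the weaker form matters in practice depends on what else blocks deflation, as the review itself notes.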
PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2292382088 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1771021064 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1771096278 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1751894926 From eosterlund at openjdk.org Mon Sep 23 10:14:51 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 23 Sep 2024 10:14:51 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> References: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> Message-ID: On Wed, 18 Sep 2024 20:53:58 GMT, Axel Boldt-Christmas wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 396: > >> 394: // to use ObjectMonitor::try_enter() as a public way of doing TryLock(). >> 395: // Used this way in SharedRuntime::monitor_exit_helper(). >> 396: if (check_owner) { > > Probably preference but an early return here is easier for me to parse. I agree. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1771102939 From ogillespie at openjdk.org Mon Sep 23 10:16:36 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 23 Sep 2024 10:16:36 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v2] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 15:56:08 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. 
>> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix build and address comments Oh I see what you mean now, sorry - what penalty does the fix add to the normal case? 
On my x86 host:

    @Benchmark
    public Thread startThread() {
        Thread t = new Thread(() -> {});
        t.start();
        return t;
    }

With fix: 16285 ± 189 ops/s.
Without fix: 16375 ± 158 ops/s

So possibly within the noise, or perhaps somewhere around ~0.5% overhead. It's the cost of acquiring one extra lock, which intuitively is negligible compared to the significant cost of starting a thread with SMR. Starting threads is already slow; adding a little to that to avoid hard-to-debug VM-wide delays seems sensible to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2367781829

From aph at openjdk.org Mon Sep 23 10:23:41 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 23 Sep 2024 10:23:41 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11]
In-Reply-To:
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

On Mon, 23 Sep 2024 09:40:19 GMT, Mikhail Ablakatov wrote:

>> Hello,
>>
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>> Version                                        Baseline            This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                 (size)  Mode  Cnt      Score   Error     Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060     1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028     4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051    26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352  2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062     1.246 ±   0.054  ns/op
>> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002     5.344 ±   0.003  ns/op
>> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048    23.023 ±   0.142  ns/op
>> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374  2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005     1.187 ±   0.001  ns/op
>> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002     5.676 ±   0.001  ns/op
>> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016    24.378 ±   0.006  ns/op
>> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336  2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001     1.037 ±   0.001  ...
>
> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Add asm tests for Neon Vector - Scalar insts
>  - fixup: restrict Vm to V0-V15 for mulvs when esize is H

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 290:

> 288: }
> 289:
> 290: //<0-15>reg

Suggestion:

    //<0-15>reg: As `rf(FloatRegister)`, but only the lower 16 FloatRegisters are allowed.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771131379

From duke at openjdk.org Mon Sep 23 10:29:55 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Mon, 23 Sep 2024 10:29:55 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v12]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                        Baseline            This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score   Error     Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060     1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028     4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051    26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352  2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062     1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002     5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048    23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374  2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005     1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002     5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016    24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336  2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001     1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:

  cleanup: extend the description of lrf()

  Co-authored-by: Andrew Haley

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18487/files
  - new: https://git.openjdk.org/jdk/pull/18487/files/132baf86..66b07903

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=10-11

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18487.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487

PR: https://git.openjdk.org/jdk/pull/18487

From thartmann at openjdk.org Mon Sep 23 10:42:12 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 23 Sep 2024 10:42:12 GMT
Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v5]
In-Reply-To:
References:
Message-ID: <4kU6soGY-6o17vEgi-DTdazsMjmc67vUvsJIUuLlW78=.61d38ee9-350b-42de-9e67-bc64a4d050bb@github.com>

> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information.
It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
> > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Added a comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21037/files - new: https://git.openjdk.org/jdk/pull/21037/files/691af16c..83830d88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21037&range=03-04 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21037.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21037/head:pull/21037 PR: https://git.openjdk.org/jdk/pull/21037 From thartmann at openjdk.org Mon Sep 23 10:42:12 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Sep 2024 10:42:12 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v4] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 12:48:53 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. 
>> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > More reviewer comments Thanks for the review Christian! I added a comment to `Parse::stress_trap`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21037#issuecomment-2367833616 From aph at openjdk.org Mon Sep 23 10:55:40 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 23 Sep 2024 10:55:40 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Mon, 23 Sep 2024 09:40:19 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
>> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... 
> > Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: > > - Add asm tests for Neon Vector - Scalar insts > - fixup: restrict Vm to V0-V15 for mulvs when esize is H src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2887: > 2885: f(0b10, 23, 22), f(index & 1, 21), rf(Vm, 16), f(op2, 15, 12), f(index >> 1, 11); \ > 2886: } \ > 2887: f(0, 10), rf(Vn, 5), rf(Vd, 0); \ Suggestion: #define INSN(NAME, op1, op2) \ void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, int index) { \ starti; \ assert(T == T4H || T == T8H || T == T2S || T == T4S, "invalid arrangement"); \ assert(index >= 0 && \ ((T == T2S && index <= 1) || (T != T2S && index <= 3) || (T == T8H && index <= 7)), \ "invalid index"); \ assert((T != T4H && T != T8H) || Vm->encoding() < 16, "invalid source SIMD&FP register"); \ f(0, 31), f((int)T & 1, 30), f(op1, 29), f(0b01111, 28, 24), f(0b01, 23, 22); \ if (T == T4H || T == T8H) { \ f(index & 0b11, 21, 20), lrf(Vm, 16); \ } else { \ f(index & 1, 21), rf(Vm, 16); \ } \ f(op2, 15, 12), f(index >> 1, 11), f(0, 10), rf(Vn, 5), rf(Vd, 0); \ I think it's a bit easier to see what's going on here if we lose the duplicated code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771185298 From aph at openjdk.org Mon Sep 23 11:00:46 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 23 Sep 2024 11:00:46 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v9] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <58jnT00LJ-V7_N-pFrR8duCccBUHxZiq2cFuj-uS9ww=.3468c4b9-0b10-4909-96e9-7d40a0cffa62@github.com> Message-ID: <1kDcOmDbVeuvcuh29ZXDBvbedgYwOI5YuvoPRh2IrxA=.f8bb9772-9e55-40b0-87af-6aaee28edd29@github.com> On Mon, 23 Sep 2024 09:37:17 GMT, Mikhail Ablakatov wrote: >> It's certainly possible that we are missing them. 
There was a period when `Assembler` changes weren't being fully tested, but I've reviewed PRs more strictly since I realized.
>> In this case, though, there is a bug which will be revealed by testing.
>
> Fixed and tested by https://github.com/openjdk/jdk/pull/18487/commits/3d7af279cd33b842ea332404005bbb54e2cd1d0b and https://github.com/openjdk/jdk/pull/18487/commits/132baf86e4c2418ba4e9f337612f6a38e37da777 accordingly, please check.

> Fixed and tested by [3d7af27](https://github.com/openjdk/jdk/commit/3d7af279cd33b842ea332404005bbb54e2cd1d0b) and [132baf8](https://github.com/openjdk/jdk/commit/132baf86e4c2418ba4e9f337612f6a38e37da777) accordingly, please check.

See my minor style suggestion, but otherwise this looks fine.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771196781

From aph at openjdk.org Mon Sep 23 11:22:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 23 Sep 2024 11:22:36 GMT
Subject: RFR: 8318127: align_up has potential overflow
In-Reply-To: References: Message-ID:

On Sun, 22 Sep 2024 11:02:57 GMT, Kim Barrett wrote:

> There's no "perhaps" about the intended meaning in the JBS issue. I wrote that issue; I remember what I meant.

Sorry, I didn't mean to suggest otherwise. I was quibbling about the "mathematical result", but it's not important.

> :) I suppose I could have been more precise.
>
> So I disagree. I think align_up has an implied post-condition that the result is not less than the value being aligned. That's certainly how it's used, in every occurrence I've looked at. (I admit I didn't look at all ~450 uses though.)

It seems we have a genuine difference of opinion about what the user can reasonably expect. I'd expect modular arithmetic, because C++ says so. However, I'll withdraw my objection, if only for the sake of not spending too much time discussing this issue.
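To make the two expectations in this exchange concrete, here is a minimal sketch. It is not HotSpot's `align_up`: the names `align_up_wrapping` and `align_up_checked` are made up for illustration, and the sketch assumes an unsigned 64-bit value and a power-of-two alignment. The first function follows plain modular arithmetic and can return a result smaller than its input; the second enforces the post-condition "result is not less than the value being aligned" by refusing inputs for which no aligned result is representable.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not the HotSpot function): rounds up using plain
// unsigned arithmetic. Near the top of the range the addition wraps, so the
// result can be smaller than 'value'.
inline uint64_t align_up_wrapping(uint64_t value, uint64_t alignment) {
  const uint64_t mask = alignment - 1;  // assumes alignment is a power of two
  return (value + mask) & ~mask;
}

// Hypothetical checked variant: returns false when no aligned value that is
// greater than or equal to 'value' fits in 64 bits; otherwise stores the
// aligned value in '*result' and returns true.
inline bool align_up_checked(uint64_t value, uint64_t alignment, uint64_t* result) {
  const uint64_t mask = alignment - 1;  // assumes alignment is a power of two
  if (value > std::numeric_limits<uint64_t>::max() - mask) {
    return false;  // value + mask would wrap around
  }
  *result = (value + mask) & ~mask;
  return true;
}
```

With an alignment of 8, `align_up_wrapping(13, 8)` gives 16 as expected, while `align_up_wrapping(UINT64_MAX - 2, 8)` wraps to 0, which is well defined in C++ but violates the implied post-condition; `align_up_checked` rejects that second input instead.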
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20808#discussion_r1771222538 From chagedorn at openjdk.org Mon Sep 23 11:24:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 23 Sep 2024 11:24:39 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v5] In-Reply-To: <4kU6soGY-6o17vEgi-DTdazsMjmc67vUvsJIUuLlW78=.61d38ee9-350b-42de-9e67-bc64a4d050bb@github.com> References: <4kU6soGY-6o17vEgi-DTdazsMjmc67vUvsJIUuLlW78=.61d38ee9-350b-42de-9e67-bc64a4d050bb@github.com> Message-ID: On Mon, 23 Sep 2024 10:42:12 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added a comment Thanks for adding the comment! Great to have this new stress mode :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21037#pullrequestreview-2321895279 From ogillespie at openjdk.org Mon Sep 23 11:28:10 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 23 Sep 2024 11:28:10 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: > Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. > This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. > > Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. > > Before (ThreadStartTtsp.java is shared in JDK-8340547): > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 1291591 ns > Reaching safepoint: 59962 ns > Reaching safepoint: 1958065 ns > Reaching safepoint: 14456666258 ns <-- 14 seconds! > ... > > > After: > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 214269 ns > Reaching safepoint: 60253 ns > Reaching safepoint: 2040680 ns > Reaching safepoint: 3089284 ns > Reaching safepoint: 2998303 ns > Reaching safepoint: 4433713 ns <-- 4.4ms > Reaching safepoint: 3368436 ns > Reaching safepoint: 2986519 ns > Reaching safepoint: 3269102 ns > ... > > > > **Alternatives** > > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? 
Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. > I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Fix lock ranking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21111/files - new: https://git.openjdk.org/jdk/pull/21111/files/b9550f68..fc48bbe4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=01-02 Stats: 4 lines in 1 file changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21111/head:pull/21111 PR: https://git.openjdk.org/jdk/pull/21111 From shade at openjdk.org Mon Sep 23 11:43:44 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 11:43:44 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. 
>> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking I used to worry about scalability impact of this change, i.e. contending on a new lock, but quickly realized we are likely contending on `ThreadLock` in that scenario. I suspect this just shifts the contention from `ThreadLock` to `StartThreadLock` when multiple threads are starting up, so it makes sense if we do not see much of the impact on `Thread.start`. You need to also test perf with multiple threads doing `Thread.start`, @olivergillespie -- so that these locks be contended. 
Your reproducer already has the kernel for it, maybe just measure the thread starting speed (in threads/sec) before and after this patch?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2367961929

From coleenp at openjdk.org Mon Sep 23 12:02:42 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 23 Sep 2024 12:02:42 GMT
Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3]
In-Reply-To: References: Message-ID:

On Mon, 23 Sep 2024 09:05:58 GMT, Erik Österlund wrote:

>> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update two, after the review
>
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 222:
>
>> 220: // Check if the entry lists are empty.
>> 221: ldr(rscratch1, Address(tmp, ObjectMonitor::EntryList_offset()));
>> 222: ldr(tmpReg, Address(tmp, ObjectMonitor::cxq_offset()));
>
> What strikes me a bit is that the EntryList vs cxq loads have inconsistent ordering in the runtime and the different ports. Notably, the x86 port loads EntryList first while here for example we load cxq first. This adds some cognitive overhead reasoning about the implications, at least for me. In particular, with respect to the ordering we see here where EntryList is loaded first, the following race is possible (which is not possible in the x86 intrinsic):
> 1. Thread 1 enters the monitor.
> 2. Thread 2 tries to enter the monitor, fails, adds itself to cxq, tries again, and eventually parks.
> 3. Thread 1 starts exiting the monitor and runs all the instructions up to and including the EntryList load above on line 221, which yields an empty EntryList.
> 4. Thread 3 enters the monitor since it is no longer owned.
> 5. Thread 3 exits the monitor, and moves cxq to EntryList in the process while still holding the lock.
> 6. Thread 1 runs the next instruction loading cxq on line 222, resulting in an empty list.
> 7.
Thread 1 draws the conclusion that there is no thread waiting for the monitor, even though Thread 2 has waited for the monitor since before Thread 2 released it. > > Even though this choice of ordering between reading EntryList and cxq allows a waiter to be completely unnoticed throughout the monitorexit procedure due to unfortunate interleavings, which is weird, the protocol deals with this just fine. Since Thread 2 managed to enter the monitor, the responsibility of ensuring liveness of the monitor becomes the responsibility of Thread 2 which will make Thread 1 the successor and unpark Thread 1. > > I'm not proposing we have the "other" more intuitive ordering of loading cxq first which removes this race so we don't have to think about it. Enforcing that ordering requires an extra loadload fence, which isn't free. I think what I would prefer to see, is that we at least use a consistent ordering so we don't have to think about why some platforms use one ordering while others use another one and what different races only affect some platforms. And I'd prefer if this less obvious ordering of reading EntryList before cxq is the championed ordering, with a comment in the shared code explaining why it's okay that waiters slip... This should be in a follow-up RFE with this description. Thanks for the description. 
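The interleaving in steps 1 to 7 can be replayed deterministically in a small single-threaded sketch. Everything here is hypothetical and only for illustration: `ToyMonitorLists`, `drain_cxq_to_entry_list` and the two `exiter_notices_waiter_*` functions are not HotSpot code, the lists are plain vectors of thread ids, and the real race depends on memory ordering between the two loads, which a sequential replay cannot model. It only shows why the order of the two emptiness checks matters for this particular interleaving.

```cpp
#include <vector>

// Hypothetical stand-in for the two waiter lists of a monitor.
struct ToyMonitorLists {
  std::vector<int> entry_list;
  std::vector<int> cxq;
};

// Models another lock holder (Thread 3 in the description) moving all
// waiters from cxq to EntryList while it exits the monitor.
inline void drain_cxq_to_entry_list(ToyMonitorLists& m) {
  m.entry_list.insert(m.entry_list.end(), m.cxq.begin(), m.cxq.end());
  m.cxq.clear();
}

// Exiting thread reads EntryList first, then cxq, with the drain in between.
// Returns true if the exiting thread sees at least one waiter.
inline bool exiter_notices_waiter_entry_list_first() {
  ToyMonitorLists m;
  m.cxq.push_back(2);                                  // Thread 2 queues on cxq and parks
  const bool entry_list_empty = m.entry_list.empty();  // Thread 1 loads EntryList
  drain_cxq_to_entry_list(m);                          // Thread 3 enters and exits
  const bool cxq_empty = m.cxq.empty();                // Thread 1 loads cxq
  return !(entry_list_empty && cxq_empty);
}

// Same interleaving, but the exiting thread reads cxq first.
inline bool exiter_notices_waiter_cxq_first() {
  ToyMonitorLists m;
  m.cxq.push_back(2);                                  // Thread 2 queues on cxq and parks
  const bool cxq_empty = m.cxq.empty();                // Thread 1 loads cxq
  drain_cxq_to_entry_list(m);                          // Thread 3 enters and exits
  const bool entry_list_empty = m.entry_list.empty();  // Thread 1 loads EntryList
  return !(entry_list_empty && cxq_empty);
}
```

In this replay the EntryList-first order misses the waiter and the cxq-first order does not, matching the description above. As the review comment notes, missing the waiter is tolerated by the protocol, because the thread that entered in between takes over responsibility for waking it.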
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1771273296

From coleenp at openjdk.org Mon Sep 23 12:07:47 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 23 Sep 2024 12:07:47 GMT
Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3]
In-Reply-To: References: Message-ID:

On Mon, 23 Sep 2024 09:55:20 GMT, Erik Österlund wrote:

>> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update two, after the review
>
> src/hotspot/share/runtime/objectMonitor.cpp line 335:
>
>> 333: // ObjectMonitor::deflate_monitor() will decrement contentions
>> 334: // after it recognizes that the async deflation was cancelled.
>> 335: contention_mark.extend();
>
> This is a bit scary. Previously the locking_thread would already have 1 stake in the _contentions, and recognizes that after cancelling deflation, that stake is asynchronously released by the deflation thread. This means that in practice, we have 0 stakes left in the contention counter after the CAS that swings the owner to the locking_thread succeeds. Yet the ObjectMonitorContentionMark RAII object passed in from the caller looks like it guarantees there is a stake in the _contentions throughout its scope. By explicitly adding to contentions, the locking_thread reclaimed its stake in the _contentions counter while holding the lock, guaranteeing that deflation is indeed impossible until the end of the scope.
> The new extend() mechanism seems to consider it equivalent to not increment here and also not decrement later. +1 -1 == 0 right? However, that is not equivalent. HotSpot math works in mysterious ways. The old mechanism guaranteed the linearization point for deflation is blocked until you get out of scope. The new mechanism does not. Instead, it's up to the user to reason about for how long deflation is blocked out.
It's blocked out as long as the monitor is held naturally, but if it is released and the scope is still active, there is no stake in the _contentions counter and deflation would succeed if it tries again. Things like the _waiters counter might tell the heuristics of deflation to not try again. An absence of a safepoint poll might also prevent deflation from trying again in a timely fashion. But the point is that the linearization point for deflation is no longer blocked, and the abstraction looks safer than it is. > > In practice, I don't know of any bug because of this. Seems like deflation is in other ways blocked out in practice. But I would really prefer if extend() would add to contentions and the destructor always decrements. This way, the contract is stronger and it's easier to convince ourselves that we have not messed up. The scope would on its own prevent deflation, regardless of how it is used, which cannot be guaranteed any longer with the current extend() implementation. > > Again, this might be better suited for a follow-up RFE. I think extends() should add to contentions in this PR since extends() is part of this PR, and initially I expected it to be sort of a refcount (with a case or two for extending the scope of the refcount). 
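A rough sketch of the refcount-style behaviour asked for in this thread, where `extend()` adds a stake and the destructor always drops one. The names `ToyMonitor`, `ContentionMarkSketch` and `deflater_releases_stake` are hypothetical and this is not the HotSpot `ObjectMonitorContentionMark`; the sketch assumes the deflation thread decrements the counter exactly once after it notices the cancelled deflation, as the quoted comment in `objectMonitor.cpp` says.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the monitor; only the counter matters here.
struct ToyMonitor {
  std::atomic<int> contentions{0};
};

// RAII mark in the style suggested in the review: one stake is taken on
// construction, extend() takes one more, and the destructor always
// releases exactly one.
class ContentionMarkSketch {
  ToyMonitor* _monitor;

 public:
  explicit ContentionMarkSketch(ToyMonitor* m) : _monitor(m) {
    _monitor->contentions.fetch_add(1);
  }
  ContentionMarkSketch(const ContentionMarkSketch&) = delete;
  ContentionMarkSketch& operator=(const ContentionMarkSketch&) = delete;

  // Called after deflation was cancelled. The extra stake compensates for
  // the decrement the deflation thread performs later on its own.
  void extend() { _monitor->contentions.fetch_add(1); }

  ~ContentionMarkSketch() {
    const int previous = _monitor->contentions.fetch_sub(1);
    assert(previous > 0 && "the scope must still own a stake when it ends");
    (void)previous;  // silence unused warnings when asserts are disabled
  }
};

// Models the deflation thread noticing the cancellation and dropping a stake.
inline void deflater_releases_stake(ToyMonitor* m) {
  m->contentions.fetch_sub(1);
}

// Counter value seen inside the scope after the deflater has already run.
inline int stake_while_in_scope() {
  ToyMonitor m;
  ContentionMarkSketch mark(&m);
  mark.extend();
  deflater_releases_stake(&m);
  return m.contentions.load();
}

// Counter value once the scope has ended.
inline int stake_after_scope() {
  ToyMonitor m;
  {
    ContentionMarkSketch mark(&m);
    mark.extend();
    deflater_releases_stake(&m);
  }
  return m.contentions.load();
}
```

In this variant the counter stays at 1 for the whole scope even after the deflater has dropped its stake, and returns to 0 once the scope ends. A variant where `extend()` only sets a flag that suppresses the destructor's decrement would reach the same final value but would leave the counter at 0 inside the scope, which is the weaker contract the review comment worries about.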
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1771279039

From duke at openjdk.org Mon Sep 23 12:25:39 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Mon, 23 Sep 2024 12:25:39 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11]
In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID:

On Mon, 23 Sep 2024 10:49:40 GMT, Andrew Haley wrote:

>> Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Add asm tests for Neon Vector - Scalar insts
>> - fixup: restrict Vm to V0-V15 for mulvs when esize is H
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2887:
>
>> 2885: f(0b10, 23, 22), f(index & 1, 21), rf(Vm, 16), f(op2, 15, 12), f(index >> 1, 11); \
>> 2886: } \
>> 2887: f(0, 10), rf(Vn, 5), rf(Vd, 0); \
>
> Suggestion:
>
> #define INSN(NAME, op1, op2) \
>   void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, int index) { \
>     starti; \
>     assert(T == T4H || T == T8H || T == T2S || T == T4S, "invalid arrangement"); \
>     assert(index >= 0 && \
>            ((T == T2S && index <= 1) || (T != T2S && index <= 3) || (T == T8H && index <= 7)), \
>            "invalid index"); \
>     assert((T != T4H && T != T8H) || Vm->encoding() < 16, "invalid source SIMD&FP register"); \
>     f(0, 31), f((int)T & 1, 30), f(op1, 29), f(0b01111, 28, 24), f(0b01, 23, 22); \
>     if (T == T4H || T == T8H) { \
>       f(index & 0b11, 21, 20), lrf(Vm, 16); \
>     } else { \
>       f(index & 1, 21), rf(Vm, 16); \
>     } \
>     f(op2, 15, 12), f(index >> 1, 11), f(0, 10), rf(Vn, 5), rf(Vd, 0); \
>
> I think it's a bit easier to see what's going on here if we lose the duplicated code.

Looks like that's incorrect: bits 22-23 and bit 11 differ between the two cases.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771302511 From thartmann at openjdk.org Mon Sep 23 12:32:44 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Sep 2024 12:32:44 GMT Subject: RFR: 8335334: Stress mode to randomly execute unstable if traps [v5] In-Reply-To: <4kU6soGY-6o17vEgi-DTdazsMjmc67vUvsJIUuLlW78=.61d38ee9-350b-42de-9e67-bc64a4d050bb@github.com> References: <4kU6soGY-6o17vEgi-DTdazsMjmc67vUvsJIUuLlW78=.61d38ee9-350b-42de-9e67-bc64a4d050bb@github.com> Message-ID: On Mon, 23 Sep 2024 10:42:12 GMT, Tobias Hartmann wrote: >> Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. >> >> This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). >> >> I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. >> >> It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. >> >> Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. 
>> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added a comment Thanks Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21037#issuecomment-2368073452 From thartmann at openjdk.org Mon Sep 23 12:32:44 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 23 Sep 2024 12:32:44 GMT Subject: Integrated: 8335334: Stress mode to randomly execute unstable if traps In-Reply-To: References: Message-ID: <3ZDB_bKjYtZnCo-B6S5cR6usLGEwjFcVqLRUIoXuRPY=.9bbd5efa-a1f3-428b-bca0-160ebb55d6d9@github.com> On Tue, 17 Sep 2024 11:01:27 GMT, Tobias Hartmann wrote: > Unstable if traps are supposed to be taken rarely. This patch introduces a `StressUnstableIfTraps` flag that forces unstable if traps to be taken randomly and thus potentially triggering intermittent bugs such as incorrect debug information. It works by adding another if before the unstable if that checks a "random" condition at runtime (a simple shared counter) and then either takes the trap or executes the original, unstable if. > > This stress option also has the nice side effect of triggering re-compilation of methods that would otherwise not be re-compiled (see [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843)). > > I had to adjust a few tests that rely on methods being compiled / deoptimized because with the stress option enabled, deoptimization might unexpectedly (not) happen. > > It reliably triggers [JDK-8335977](https://bugs.openjdk.org/browse/JDK-8335977), [JDK-8320308](https://bugs.openjdk.org/browse/JDK-8320308) and [JDK-8335843](https://bugs.openjdk.org/browse/JDK-8335843) in our testing. > > Tested with multiple runs up to tier6. I'll integrate it into our (Oracle internal) testing and CTW [(JDK-8340302)](https://bugs.openjdk.org/browse/JDK-8340302) once all the bugs that it currently triggers are fixed. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 63e611cd Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/63e611cd5d7eb4fc6ea6633ff9123e4bee5f5993 Stats: 116 lines in 15 files changed: 106 ins; 6 del; 4 mod 8335334: Stress mode to randomly execute unstable if traps Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21037 From ogillespie at openjdk.org Mon Sep 23 12:37:36 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 23 Sep 2024 12:37:36 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. 
For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking

With a slightly modified version of my reproducer (removing the GC triggers):

```
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadLocalRandom;

public class ThreadStartTtsp {
    static int NUM_THREADS = 1_000;

    public static void main(String[] args) throws InterruptedException {
        Semaphore s = new Semaphore(NUM_THREADS);
        // Start lots of threads at the same time to cause contention on freelist
        for (int i = 0; i < NUM_THREADS; i++) {
            new Thread(() -> {
                while (true) {
                    try {
                        s.acquire();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                    new Thread().start();
                }
            }).start();
        }
        // Periodically 'trigger' the thread creation
        for (int i = 0; i < 100; i++) {
            s.release(NUM_THREADS);
            long start = System.nanoTime();
            while (s.availablePermits() != 0) { }
            System.out.printf("Started %d threads in %.3fms%n", NUM_THREADS, (System.nanoTime() - start)/1_000_000.0);
        }
    }
}
```

I see higher throughput with the new lock enabled compared to disabled.

```
java -XX:+UnlockDiagnosticVMOptions -XX:-UseThreadStartLock ThreadStartTtsp.java
...
Started 1000 threads in 134.874ms
Started 1000 threads in 134.799ms
Started 1000 threads in 132.662ms
...

./build/release/images/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+UseThreadStartLock ThreadStartTtsp.java
...
Started 1000 threads in 89.469ms
Started 1000 threads in 90.030ms
Started 1000 threads in 89.855ms
...
```

I think that's because of reduced contention with `JavaThread::exit`, not sure if it's a fair comparison. Probably fairer is this:

```
import java.util.concurrent.atomic.*;

public class ThreadStartTtsp {
    static int NUM_THREADS = 1_00;
    static AtomicInteger started = new AtomicInteger(0);
    static long start = System.nanoTime();

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < NUM_THREADS; i++) {
            new Thread(() -> {
                while (true) {
                    new Thread(() -> {
                        if (started.incrementAndGet() % 10_000 == 0) {
                            System.out.printf("Started %d threads in %.3fms%n", 10_000, (System.nanoTime() - start)/1_000_000.0);
                            start = System.nanoTime();
                        }
                    }).start();
                }
            }).start();
        }
    }
}
```

For which I still see about 4% improvement from adding the new lock, and the gap seems to widen as concurrency increases.

------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2368086158 From coleenp at openjdk.org Mon Sep 23 12:43:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 23 Sep 2024 12:43:36 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path.
>> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I like this patch. src/hotspot/share/oops/instanceKlass.hpp line 517: > 515: bool is_in_error_state() const { return init_state() == initialization_error; } > 516: bool is_reentrant_initialization(Thread *thread) { return thread == _init_thread; } > 517: ClassState init_state() const { return Atomic::load_acquire(&_init_state); } This is the code that I want the most with this patch. If we're reading this field outside a lock, we need the acquire. Let's not make it more complicated than that. ------------- Marked as reviewed by coleenp (Reviewer). 
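The release/acquire pairing discussed in this review can be illustrated with a small Java analogue. This is only a sketch under stated assumptions, not HotSpot code: the class `InitStatePublication`, its `payload` and `initState` fields and the two state constants are hypothetical stand-ins for class metadata and `InstanceKlass::_init_state`. `VarHandle.setRelease` plays the role of the release store on the initializing side, and `VarHandle.getAcquire` the role of the `Atomic::load_acquire` in `init_state()`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical model: a plain payload published through a state word.
class InitStatePublication {
    static final int ALLOCATED = 0;          // illustrative state values only
    static final int FULLY_INITIALIZED = 1;

    private static final VarHandle INIT_STATE;
    static {
        try {
            INIT_STATE = MethodHandles.lookup()
                    .findVarHandle(InitStatePublication.class, "initState", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private int payload;                     // plain field, written before publication
    private int initState = ALLOCATED;       // the "witness" field

    // Writer side: fill in the state, then publish with release semantics.
    void initialize(int value) {
        payload = value;
        INIT_STATE.setRelease(this, FULLY_INITIALIZED);
    }

    // Reader side: a lock-free check needs an acquire load of the witness.
    boolean isInitialized() {
        return (int) INIT_STATE.getAcquire(this) == FULLY_INITIALIZED;
    }

    // Once the acquire load observes FULLY_INITIALIZED, the payload write is visible.
    int awaitPayload() {
        while (!isInitialized()) {
            Thread.onSpinWait();
        }
        return payload;
    }

    public static void main(String[] args) throws InterruptedException {
        InitStatePublication klass = new InitStatePublication();
        Thread reader = new Thread(
                () -> System.out.println("payload = " + klass.awaitPayload()));
        reader.start();
        klass.initialize(42);
        reader.join();
    }
}
```

The reader never takes a lock, which is the case the review comment singles out: a field read outside a lock needs the acquire.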
PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2322066095 PR Review Comment: https://git.openjdk.org/jdk/pull/21110#discussion_r1771328239 From qamai at openjdk.org Mon Sep 23 13:08:40 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 23 Sep 2024 13:08:40 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? 
Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. Maybe you have already considered it but an alternative is to use a priority lock. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2368168989 From shade at openjdk.org Mon Sep 23 13:14:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 23 Sep 2024 13:14:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 09:23:29 GMT, David Holmes wrote: > The problem is we have completely different code paths that look at the different states of a class (loaded, linked, initialized, in-error) and those actions use different locks. This issue was, I thought, only about the lock-free fast-paths checking the "is initialized" state not anything else. 
These extra barriers could be completely redundant for "is loaded" or "is linked" or "is in error" checks.

Right. I chose this code shape to make sure we cover _all_ paths that poll `init_state`, for extra safety. We could, in principle, protect only the `is_initialized()` path with the acquire. But I think we then start to depend on downstream code not doing "smart" things bypassing that check, for example polling `is_loaded() { _init_state >= loaded }` to implicitly (and too optimistically) check for `init_state >= fully_initialized`, or even doing `init_state() > being_initialized` somewhere. I would not discount the possibility that something somewhere would depend on pre-fully-initialized states to publish the intermediate class state. Looking around, I see some interesting uses in `InstanceKlass::methods_do`, `ClassLoaderData::methods_do`, `ClassLoaderData::loaded_classes_do`, `LoaderConstraintTable::find_constrained_klass`, ... It feels much safer to be extra paranoid here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2368197595 From erikj at openjdk.org Mon Sep 23 13:14:39 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 23 Sep 2024 13:14:39 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:30:59 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions.
>> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > modify cflags style Build changes look ok. ------------- Marked as reviewed by erikj (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21083#pullrequestreview-2322170469 From ogillespie at openjdk.org Mon Sep 23 14:00:39 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 23 Sep 2024 14:00:39 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 13:05:35 GMT, Quan Anh Mai wrote: > Maybe you have already considered it but an alternative is to use a priority lock. Thanks. Is there an implementation available in hotspot? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2368352150 From fbredberg at openjdk.org Mon Sep 23 14:00:45 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Mon, 23 Sep 2024 14:00:45 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: <_sp3TMOUO71tQAoMpW2QwB2i5Krvw2dyNWSt-Ex5nrs=.3ea2088d-592c-4e77-9ff5-324c1bccde27@github.com> On Mon, 23 Sep 2024 12:00:27 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 222: >> >>> 220: // Check if the entry lists are empty. >>> 221: ldr(rscratch1, Address(tmp, ObjectMonitor::EntryList_offset())); >>> 222: ldr(tmpReg, Address(tmp, ObjectMonitor::cxq_offset())); >> >> What strikes me a bit is that the EntryList vs cxq loads have inconsistent ordering in the runtime and the different ports. Notably, the x86 port loads EntryList first while here for example we load cxq first. This adds some cognitive overhead reasoning about the implications, at least for me. In particular, with respect to the ordering we see here where EntryList is loaded first, the following race is possible (which is not possible in the x86 intrinsic): >> 1. Thread 1 enters the monitor. >> 2. Thread 2 tries to enter the monitor, fails, adds itself to cxq, tries again, and eventually parks. >> 3. 
Thread 1 starts exiting the monitor and runs all the instructions up to and including the EntryList load above on line 221, which yields an empty EntryList. >> 4. Thread 3 enters the monitor since it is no longer owned. >> 5. Thread 3 exits the monitor, and moves cxq to EntryList in the process while still holding the lock. >> 6. Thread 1 runs the next instruction loading cxq on line 222, resulting in an empty list. >> 7. Thread 1 draws the conclusion that there is no thread waiting for the monitor, even though Thread 2 has waited for the monitor since before Thread 1 released it. >> >> Even though this choice of ordering between reading EntryList and cxq allows a waiter to be completely unnoticed throughout the monitorexit procedure due to unfortunate interleavings, which is weird, the protocol deals with this just fine. Since Thread 3 managed to enter the monitor, ensuring liveness of the monitor becomes the responsibility of Thread 3, which will make Thread 2 the successor and unpark Thread 2. >> >> I'm not proposing we have the "other" more intuitive ordering of loading cxq first which removes this race so we don't have to think about it. Enforcing that ordering requires an extra loadload fence, which isn't free. I think what I would prefer to see, is that we at least use a consistent ordering so we don't have to think about why some platforms use one ordering while others use another one and what different races only affect some platforms. And I'd prefer if this less obvious ordering of reading EntryList before cxq is the championed ordering, with a comment in the shared code explainin... > > This should be in a follow-up RFE with this description. Thanks for the description. I'll create a new follow-up RFE.
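The interleaving in the quoted steps can be replayed deterministically in a small single-threaded Java model. This is a sketch under stated assumptions, not HotSpot code: `ToyMonitor`, `drainCxqToEntryList` and `exiterSeesWaiter` are hypothetical names, the two deques merely stand in for the ObjectMonitor `cxq` and `EntryList`, and only this one interleaving is replayed. It shows that an exiting thread which loads EntryList first and cxq second can see both as empty, while the opposite load order notices the waiter in the same interleaving.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy replay of the monitorexit interleaving; names are illustrative only.
class MonitorExitRaceReplay {
    static final class ToyMonitor {
        final Deque<String> cxq = new ArrayDeque<>();
        final Deque<String> entryList = new ArrayDeque<>();

        // Models step 5: the exiting owner moves every cxq entry to EntryList.
        void drainCxqToEntryList() {
            while (!cxq.isEmpty()) {
                entryList.addLast(cxq.pollFirst());
            }
        }
    }

    // Returns true if the exiting thread (Thread 1) observes a waiter.
    static boolean exiterSeesWaiter(boolean loadEntryListFirst) {
        ToyMonitor m = new ToyMonitor();

        // Step 2: the waiter fails to enter, enqueues itself on cxq and parks.
        m.cxq.addLast("Thread 2");

        // Step 3: Thread 1 performs its first load while exiting.
        boolean firstEmpty = loadEntryListFirst
                ? m.entryList.isEmpty()
                : m.cxq.isEmpty();

        // Steps 4-5: Thread 3 enters and exits between the two loads.
        m.drainCxqToEntryList();

        // Step 6: Thread 1 performs its second load.
        boolean secondEmpty = loadEntryListFirst
                ? m.cxq.isEmpty()
                : m.entryList.isEmpty();

        // Step 7: both loads empty means Thread 1 concludes nobody is waiting.
        return !(firstEmpty && secondEmpty);
    }

    public static void main(String[] args) {
        System.out.println("EntryList then cxq: waiter seen = " + exiterSeesWaiter(true));
        System.out.println("cxq then EntryList: waiter seen = " + exiterSeesWaiter(false));
    }
}
```

The model ignores memory ordering entirely; on real hardware the cxq-first order additionally needs the loadload fence mentioned in the quoted text to stay in that order.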
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1771500290 From duke at openjdk.org Mon Sep 23 15:55:00 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 23 Sep 2024 15:55:00 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v13] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > >
> --------------------------------------------------------------------------------------------
> Version                                            Baseline                This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error       Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060       1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028       4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051      26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352    2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062       1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002       5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048      23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374    2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005       1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002       5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016      24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336    2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001       1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: - Add Assembler::load() and Assembler::store() methods - fixup: make Windows happy there's no potentially lossfull conversion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/66b07903..091eecc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=11-12 Stats: 62 lines in 3 files changed: 10 ins; 46 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From duke at openjdk.org Mon Sep 23 15:55:01 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 23 Sep 2024 15:55:01 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v10] In-Reply-To: <007crTrGW5cVq0iWhpvI2J0J6lI5CKC_xdVVwu4aSt8=.cd62acd6-f164-4487-bc0f-8bc661eccc85@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <007crTrGW5cVq0iWhpvI2J0J6lI5CKC_xdVVwu4aSt8=.cd62acd6-f164-4487-bc0f-8bc661eccc85@github.com> Message-ID: On Fri, 20 Sep 2024 09:50:36 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has
updated the pull request incrementally with four additional commits since the last revision: >> >> - cleanup: use switch-case instead of if-else statements and ternary operators >> - Don't try align basic blocks as it brings no measurable performance benefits >> - fixup: rename the newly added Vector-Scalar mulv to mulvs >> - fixup: fix Windows build by not using RELATIVE as an identifier > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 140: > >> 138: } >> 139: >> 140: void C2_MacroAssembler::arrays_hashcode_elload(Register dst, Address src, BasicType eltype) { > > This method has nothing to do with either arrays nor hashcode. Looking at class `Assembler`, it would make sense to have a general-purpose load that takes a `BasicType`. > > We already have this `Assembler` function: > > ` void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0)` > > where two bits of `op` gives us store and three versions of sign extension. > > Please use `ld_st2` to define a general-purpose `MacroAssembler::load` function with this set of arguments. > It was perhaps a historic mistake of ours not to use an assembler function with the size and the signedness of an operand as an argument to ldr/str. Added by https://github.com/openjdk/jdk/pull/18487/commits/091eecc5c2fd809ded7484baf9fea319732d9408 . Basically, I just used your older suggestion from here with a minor adjustment for `T_INT`: https://github.com/mikabl-arm/jdk/commit/3a52c7f89c293b79559201149f3159d5a8c831b6#r145057214 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1771711060 From mli at openjdk.org Mon Sep 23 16:25:02 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 16:25:02 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: Message-ID: <43BqlxIZJ5r6flQcExtgr2OhOc3OPQtsGWeha0G8DeU=.f6ac2d19-50e5-4426-af45-5af367782c8a@github.com> > Hi, > Can you help to review this patch? 
> Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. > > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 
567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/26a68071..f879fa2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=01-02 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From mli at openjdk.org Mon Sep 23 16:25:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 16:25:03 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v2] In-Reply-To: References: Message-ID: <3qDkVnAvugamrnixutLNJPW25dtfZjcCqhnWS33lSVE=.3277081f-189d-4ff6-8bcd-456e8d4eb5af@github.com> On Mon, 23 Sep 2024 07:30:59 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
>> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > modify cflags style > Build changes look ok. > > /reviewers 2 Thanks for your reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21083#issuecomment-2368771856 From mli at openjdk.org Mon Sep 23 16:25:03 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 16:25:03 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v2] In-Reply-To: <24D2-W-fmlFZ4Ke2wLc-FPKnpskxNIa4aB7NL5ArI8U=.f0e5bdbd-697e-4aea-99ac-1472d14136f7@github.com> References: <24D2-W-fmlFZ4Ke2wLc-FPKnpskxNIa4aB7NL5ArI8U=.f0e5bdbd-697e-4aea-99ac-1472d14136f7@github.com> Message-ID: On Mon, 23 Sep 2024 09:09:46 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> modify cflags style > > src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 24: > >> 22: */ >> 23: >> 24: #ifdef __riscv_v_intrinsic > > It would be worth adding a comment on which version of the compiler would be affected, and what would then be the behavior Added some comments, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1771751510 From duke at openjdk.org Mon Sep 23 16:26:45 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 23 Sep 2024 16:26:45 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix is_intrinsic_supported to work properly Hello Vladimir (@vnkozlov), Could you please run the tests for this PR and let us know? We're hoping to integrate this PR soon. 
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2368777178 From luhenry at openjdk.org Mon Sep 23 16:44:14 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 23 Sep 2024 16:44:14 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v3] In-Reply-To: <43BqlxIZJ5r6flQcExtgr2OhOc3OPQtsGWeha0G8DeU=.f6ac2d19-50e5-4426-af45-5af367782c8a@github.com> References: <43BqlxIZJ5r6flQcExtgr2OhOc3OPQtsGWeha0G8DeU=.f6ac2d19-50e5-4426-af45-5af367782c8a@github.com> Message-ID: On Mon, 23 Sep 2024 16:25:02 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
>> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... 
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > comment src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 31: > 29: // bridge functions built in the library, otherwise no such fuctions in the library. > 30: // At runtime, if the library is found and bridge fuctions are found in the library, > 31: // then the java vector API will call into bridge functions and sleef, otherwise not. Suggestion: // At compile-time, if the current compiler does support vector intrinsics, bridge // functions will be built in the library. In case the current compiler doesn't support // vector intrinsics (gcc < 14), then the bridge functions won't be compiled. // At run-time, if the library is found and the bridge functions are available in the // library, then the java vector API will call into the bridge functions and sleef. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1771775908 From iklam at openjdk.org Mon Sep 23 17:07:56 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 23 Sep 2024 17:07:56 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v12] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. 
For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`.
> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`.
>   - The boot classes are loaded as part of `vmClasses::resolve_all()`
>   - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`).
> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`.
>
> **All-or-nothing Loading**
>
> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures.
> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example:
>   - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions.
>   - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM.
> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`.
> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe...
Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - @dholmes-ora comments - @dholmes-ora comments - Fixed ZERO build - minor comment fix - @ashu-mehra comment: move code outside of call_initPhase2(); also renamed BOOT/BOOT2 to BOOT1/BOOT2 and refactored code related to AOTLinkedClassCategory - @ashu-mehra reviews - @ashu-mehra comments - @adinn comments - @dholmes-ora comments: logging indents - ... and 5 more: https://git.openjdk.org/jdk/compare/49dbfa6a...6029b35f ------------- Changes: https://git.openjdk.org/jdk/pull/20843/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=11 Stats: 1787 lines in 47 files changed: 1630 ins; 57 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From iklam at openjdk.org Mon Sep 23 17:38:42 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 23 Sep 2024 17:38:42 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v11] In-Reply-To: <_L_sO08BUKwfLyb7gI92xLcxfkk--1rnb01Zqonavro=.fa2cfb32-3b5d-4b4f-8175-e5d51b7e1bb1@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> <_L_sO08BUKwfLyb7gI92xLcxfkk--1rnb01Zqonavro=.fa2cfb32-3b5d-4b4f-8175-e5d51b7e1bb1@github.com> Message-ID: On Thu, 19 Sep 2024 05:37:17 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> @dholmes-ora comments > > src/hotspot/share/cds/aotClassLinker.cpp line 122: > >> 120: assert(CDSConfig::is_dumping_aot_linked_classes(), "sanity"); >> 121: >> 122: if (!SystemDictionaryShared::is_builtin(ik)) { > > What does this actually mean by "built-in"? 
Boot/Platform/App loaders. The meaning of "built-in" is documented at the top of systemDictionaryShared.hpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1771839512 From iklam at openjdk.org Mon Sep 23 17:44:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 23 Sep 2024 17:44:41 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v9] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 18:35:22 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed typo > > src/hotspot/share/cds/aotClassInitializer.cpp line 41: > >> 39: } else if (ik->name()->equals("jdk/internal/constant/PrimitiveClassDescImpl") || >> 40: ik->name()->equals("jdk/internal/constant/ReferenceClassDescImpl") || >> 41: ik->name()->equals("java/lang/constant/ConstantDescs")) { > > Why not intern these strings as Symbols so you can test for == ? In subsequent PRs, more classes are added here in a somewhat ad-hoc fashion (in order to match the behavior of the Java code in the core library). Yes, this is not ideal but we don't have a clean solution yet. Hopefully we will come up with a better design later in Project Leyden. For the time being, as this list may need to be updated from time to time (as core lib is being updated), and some of the classes are used only in here, I am hesitant to add their names to vmSymbols.hpp. This function is called only when the AOT cache is dumped, and called once per class, so performance shouldn't matter. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1771845584 From sviswanathan at openjdk.org Mon Sep 23 18:27:45 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 23 Sep 2024 18:27:45 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v12] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Wed, 18 Sep 2024 07:21:52 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                      (size)   Mode  Cnt      Score  Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector      1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector      2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector       1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector       2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector      1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector      2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector     1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector     2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector         1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector         2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector          1024  thrpt    2  10756.804         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Incorporating review and documentation suggestions.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 2600:

> 2598: assert ((vlen & (vlen -1)) == 0);
> 2599: int twoVectorLenMask = (vlen << 1) - 1;
> 2600: ByteVector wrapped_indexes = this.lanewise(VectorOperators.AND, twoVectorLenMask);

This assert and the following AND forcing power of two vector length seems out of place in Java code. You could move the wrapping within the selectFromTwoVectorOp on similar lines as the PR #20634.
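
For readers following the index-wrapping discussion, here is a minimal scalar sketch of what the quoted snippet appears to do: mask each index into the range [0, 2*VLEN) and then pick from the first or second table. It uses plain `int[]` arrays rather than the Vector API, and the class name `SelectFromSketch` and method `selectFrom` are hypothetical names chosen for illustration only. It models the wrapping variant from the reviewed lines, not the exception-throwing behavior mentioned in the PR description, and it is not the actual `ByteVector` implementation.

```java
import java.util.Arrays;

public class SelectFromSketch {

    // Scalar model of a two-table select: indexes are wrapped into
    // [0, 2 * vlen) with a bit mask, which only works when vlen is a
    // power of two (the condition the quoted assert checks).
    public static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = v1.length;
        if (vlen == 0 || v2.length != vlen || indexes.length != vlen) {
            throw new IllegalArgumentException("all inputs must share one non-zero length");
        }
        if ((vlen & (vlen - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        int twoVectorLenMask = (vlen << 1) - 1;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = indexes[i] & twoVectorLenMask;
            // Indexes below vlen select from v1, the rest select from v2.
            result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 3, -1}; // -1 wraps to 7 under the mask
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));
        // prints [10, 21, 13, 23]
    }
}
```

The mask trick is why the power-of-two assumption matters: for other lengths, `index & mask` would not be equivalent to a modulo by `2 * vlen`.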
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1771898190 From iklam at openjdk.org Mon Sep 23 18:54:12 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 23 Sep 2024 18:54:12 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v5] In-Reply-To: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: > This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` > > These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. > > --- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. 
Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: @coleenp comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20959/files - new: https://git.openjdk.org/jdk/pull/20959/files/988f101c..66157452 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=03-04 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20959/head:pull/20959 PR: https://git.openjdk.org/jdk/pull/20959 From iklam at openjdk.org Mon Sep 23 18:54:14 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 23 Sep 2024 18:54:14 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v4] In-Reply-To: References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: On Thu, 19 Sep 2024 19:22:33 GMT, Coleen Phillimore wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - @vnkozlov comment - added NOT_CDS_RETURN >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - some clean up >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics >> - ... and 1 more: https://git.openjdk.org/jdk/compare/4d11c6b5...988f101c > > src/hotspot/share/classfile/systemDictionary.cpp line 2095: > >> 2093: } >> 2094: } >> 2095: #endif > > Can you add // INCLUDE_CDS > > This is called at startup time before anything so it doesn't need the locking? I added the INCLUDE_CDS. I also added locking (or assert safepoint). They are probably not needed now, but will make the code safer in case someone tries to move or refactor it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20959#discussion_r1771928718 From ccheung at openjdk.org Mon Sep 23 19:06:42 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 23 Sep 2024 19:06:42 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v12] In-Reply-To: References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: On Mon, 23 Sep 2024 17:07:56 GMT, Ioi Lam wrote: >> This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> **Overview** >> >> - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. >> - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. >> - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. >> - The boot classes are loaded as part of `vmClasses::resolve_all()` >> - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). >> - Since all classes in a `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. 
>>
>> **All-or-nothing Loading**
>>
>> - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures.
>> - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example:
>>   - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions.
>>   - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM.
>> - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`.
>> - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exa...
>
> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>
>  - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking
>  - @dholmes-ora comments
>  - @dholmes-ora comments
>  - Fixed ZERO build
>  - minor comment fix
>  - @ashu-mehra comment: move code outside of call_initPhase2(); also renamed BOOT/BOOT2 to BOOT1/BOOT2 and refactored code related to AOTLinkedClassCategory
>  - @ashu-mehra reviews
>  - @ashu-mehra comments
>  - @adinn comments
>  - @dholmes-ora comments: logging indents
>  - ... and 5 more: https://git.openjdk.org/jdk/compare/49dbfa6a...6029b35f

Spotted a few nits.
src/hotspot/share/cds/metaspaceShared.cpp line 1536: > 1534: if (lsh.is_enabled()) { > 1535: lsh.print("Using AOT-linked classes: %s (static archive: %s aot-linked classes", > 1536: CDSConfig::is_using_aot_linked_classes() ? "true" : "false", Suggestion: `BOOL_TO_STR(CDSConfig::is_using_aot_linked_classes()),` test/hotspot/jtreg/runtime/cds/appcds/jvmti/ClassFileLoadHookTest.java line 100: > 98: TestCommon.checkExec(out); > 99: > 100: // JEP 483: if dumped with -XX:+AOTClassLinking, cannot use archive when CFLH Suggestion: `..., cannot use archive when CFLH is enabled` test/lib/jdk/test/lib/cds/CDSAppTester.java line 50: > 48: public CDSAppTester(String name) { > 49: if (CDSTestUtils.DYNAMIC_DUMP) { > 50: throw new jtreg.SkippedException("Tests based on CDSAppTester should be excluded when -Dtest.dynamic.cds.archive is specified"); You could omit the `jtreg.` in the above throw statement. ------------- PR Review: https://git.openjdk.org/jdk/pull/20843#pullrequestreview-2322962760 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1771865540 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1771878531 PR Review Comment: https://git.openjdk.org/jdk/pull/20843#discussion_r1771871838 From kvn at openjdk.org Mon Sep 23 19:16:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 23 Sep 2024 19:16:39 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Thu, 19 Sep 2024 21:15:11 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > fix is_intrinsic_supported to work properly Looks good. I have only one nitpick. I will start testing. 
src/hotspot/share/c1/c1_Compiler.cpp line 170: > 168: case vmIntrinsics::_dcos: > 169: case vmIntrinsics::_dtan: > 170: #if defined(X86) Use `#ifdef AMD64` for x64 only ------------- PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2323102058 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1771949759 From duke at openjdk.org Mon Sep 23 19:24:51 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 23 Sep 2024 19:24:51 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: change ifdef from x86 to AMD64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20657/files - new: https://git.openjdk.org/jdk/pull/20657/files/5da2754a..4dc2e36a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20657&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20657/head:pull/20657 PR: https://git.openjdk.org/jdk/pull/20657 From duke at openjdk.org Mon Sep 23 19:24:51 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 23 Sep 2024 19:24:51 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v12] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:14:10 GMT, Vladimir Kozlov wrote: > Looks good. I have only one nitpick. I will start testing. Thank you Vladimir! 
> src/hotspot/share/c1/c1_Compiler.cpp line 170: > >> 168: case vmIntrinsics::_dcos: >> 169: case vmIntrinsics::_dtan: >> 170: #if defined(X86) > > Use `#ifdef AMD64` for x64 only Thanks Vladimir! Please see the code updated with `#ifdef AMD64`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2369168165 PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1771961469 From mli at openjdk.org Mon Sep 23 19:32:10 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 19:32:10 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
>
> ### Test
> test/jdk/jdk/incubator/vector
>
> ### Performance
> data on bananapi
>
> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855
> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001
> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074
> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62
> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166
> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974
> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167
> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219
> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377
> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357
> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398
> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449
> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743
> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743
> Double128Vector....
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/f879fa2c..32eb54d5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=02-03 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From mli at openjdk.org Mon Sep 23 19:32:11 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 23 Sep 2024 19:32:11 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v3] In-Reply-To: References: <43BqlxIZJ5r6flQcExtgr2OhOc3OPQtsGWeha0G8DeU=.f6ac2d19-50e5-4426-af45-5af367782c8a@github.com> Message-ID: On Mon, 23 Sep 2024 16:40:55 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> comment > > src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 31: > >> 29: // bridge functions built in the library, otherwise no such fuctions in the library. >> 30: // At runtime, if the library is found and bridge fuctions are found in the library, >> 31: // then the java vector API will call into bridge functions and sleef, otherwise not. > > Suggestion: > > // At compile-time, if the current compiler does support vector intrinsics, bridge > // functions will be built in the library. In case the current compiler doesn't support > // vector intrinsics (gcc < 14), then the bridge functions won't be compiled. > // At run-time, if the library is found and the bridge functions are available in the > // library, then the java vector API will call into the bridge functions and sleef. 
Thanks for the suggestion, it's much better! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1771970020 From gcao at openjdk.org Tue Sep 24 00:15:43 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 24 Sep 2024 00:15:43 GMT Subject: RFR: 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines Message-ID: Hi, please help review that, small refactoring for sub/subw macro-assembler routines. ### Testing - [x] Run tier1 tests on SOPHON SG2042 (release) ------------- Commit messages: - Polishing - Polishing - 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines Changes: https://git.openjdk.org/jdk/pull/21135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340643 Stats: 14 lines in 1 file changed: 0 ins; 12 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21135/head:pull/21135 PR: https://git.openjdk.org/jdk/pull/21135 From dholmes at openjdk.org Tue Sep 24 00:20:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 00:20:50 GMT Subject: RFR: 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683 Message-ID: Please review Thanks ------------- Commit messages: - 8340707: ProblemList applications/ctw/modules/java_base.java Changes: https://git.openjdk.org/jdk/pull/21146/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21146&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340707 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21146.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21146/head:pull/21146 PR: https://git.openjdk.org/jdk/pull/21146 From darcy at openjdk.org Tue Sep 24 00:26:34 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 24 Sep 2024 00:26:34 GMT Subject: RFR: 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683 
In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:11:52 GMT, David Holmes wrote: > Please review > > Thanks Marked as reviewed by darcy (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21146#pullrequestreview-2323735258 From dholmes at openjdk.org Tue Sep 24 00:40:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 00:40:39 GMT Subject: RFR: 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:23:42 GMT, Joe Darcy wrote: >> Please review >> >> Thanks > > Marked as reviewed by darcy (Reviewer). Thanks @jddarcy ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21146#issuecomment-2369864598 From dholmes at openjdk.org Tue Sep 24 00:40:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 00:40:40 GMT Subject: Integrated: 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683 In-Reply-To: References: Message-ID: <8wU_xUrTmiVNzMpOuUGReBGIvzW1Vv_xJekD7h7K7EQ=.adcd8646-8bae-40e4-b9c1-c8274323a280@github.com> On Tue, 24 Sep 2024 00:11:52 GMT, David Holmes wrote: > Please review > > Thanks This pull request has now been integrated. 
Changeset: c8ae8480 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/c8ae8480496d56a8e51b9f5a6df50c70a429672f Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8340707: ProblemList applications/ctw/modules/java_base.java due to JDK-8340683 Reviewed-by: darcy ------------- PR: https://git.openjdk.org/jdk/pull/21146 From dlong at openjdk.org Tue Sep 24 00:50:42 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 24 Sep 2024 00:50:42 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: <0Dr860QgmZaGHq1QGgz5bqKLpiwVSZL-lDOV1JNjkdk=.1c09c464-e9cd-4f66-88c1-2b97e3a9f7ce@github.com> On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... 
>> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking If JVM_StartThread is only called by Thread.start0, then how about putting the new lock in Java instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2369876543 From iklam at openjdk.org Tue Sep 24 00:52:52 2024 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 24 Sep 2024 00:52:52 GMT Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v6] In-Reply-To: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com> Message-ID: > This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()` > > These `Method` objects are created within the JVM. They do not belong to any actual Java classes. 
We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache. > > --- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 294 additional commits since the last revision: - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'master' into jep-483-step-01-8338017-add-aot-command-line-aliases - 8339895: Open source several AWT focus tests - series 3 Reviewed-by: prr - 8339192: Native annotation parsing code of deprecated annotations causes crash Reviewed-by: jrose, mgronlun - 8340480: Bad copyright notices in changes from JDK-8339902 Reviewed-by: kcr, bpb, kizune - 8340353: Remove CompressedOops::ptrs_base Reviewed-by: stefank, coleenp, shade, mli - 8339902: Open source couple TextField related tests Reviewed-by: honkar - ... 
and 284 more: https://git.openjdk.org/jdk/compare/d354ee52...59dd8879 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20959/files - new: https://git.openjdk.org/jdk/pull/20959/files/66157452..59dd8879 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20959&range=04-05 Stats: 174327 lines in 1399 files changed: 158717 ins; 8139 del; 7471 mod Patch: https://git.openjdk.org/jdk/pull/20959.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20959/head:pull/20959 PR: https://git.openjdk.org/jdk/pull/20959 From kvn at openjdk.org Tue Sep 24 01:04:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 24 Sep 2024 01:04:38 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change ifdef from x86 to AMD64 My testing passed. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2323769774 From fyang at openjdk.org Tue Sep 24 02:13:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 24 Sep 2024 02:13:35 GMT Subject: RFR: 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 12:39:07 GMT, Gui Cao wrote: > Hi, please help review that, small refactoring for sub/subw macro-assembler routines. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Good to see duplicate code removed. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21135#pullrequestreview-2323853990 From duke at openjdk.org Tue Sep 24 03:39:36 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 24 Sep 2024 03:39:36 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 01:01:54 GMT, Vladimir Kozlov wrote: > My testing passed. Thank You Vladimir! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2370044599 From duke at openjdk.org Tue Sep 24 03:39:37 2024 From: duke at openjdk.org (duke) Date: Tue, 24 Sep 2024 03:39:37 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change ifdef from x86 to AMD64 @vamsi-parasa Your change (at version 4dc2e36a8a2897d0a39d30a5580b18fbd9e5baf5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20657#issuecomment-2370047322 From dholmes at openjdk.org Tue Sep 24 05:09:35 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 05:09:35 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3] In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 18:16:57 GMT, Calvin Cheung wrote: >> Prior to this patch, if `--module-path` is specified in the command line: >> during CDS dump time, full module graph will not be included in the CDS archive; >> during run time, full module graph will not be used. 
>> >> With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. >> >> The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. >> E.g. the following is considered a match: >> dump time runtime >> m1,m2 m2,m1 >> m1,m2 m1,m2,m2 >> >> I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > trailing whitespace src/java.base/share/classes/jdk/internal/module/ModuleBootstrap.java line 481: > 479: cf, > 480: clf, > 481: mainModule); This was correctly aligned before, now it isn't. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1772609469 From rehn at openjdk.org Tue Sep 24 05:28:36 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 24 Sep 2024 05:28:36 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 03:45:32 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comment > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 51: > >> 49: n_int_register_parameters_c = 8, // x10, x11, ... x17 (c_rarg0, c_rarg1, ...) >> 50: n_float_register_parameters_c = 8, // f10, f11, ... f17 (c_farg0, c_farg1, ... ) >> 51: n_vector_register_parameters_c = 8, // v8, v9, ... 
v15 > > I know vector registers are not used for passing arguments or return values by the RISCV ABI for now. But I guess it might be better and consistent to align with the numbering of integer and floating-point argument registers (x10 - x17, f10 - f17)? That is v10 - v17. Note in the RISC-V ELF psABI there is a convention variant for v-regs. If you add function attribute riscv_vector_cc it should be used for C/C++. (I never tested it) v0 = first vector mask argument v8-v23 = args/rets v1-v7/v24-v31 = caller saved https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1772630985 From aboldtch at openjdk.org Tue Sep 24 05:51:06 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 24 Sep 2024 05:51:06 GMT Subject: RFR: 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages [v2] In-Reply-To: <7DtNgJ7IORWdKXdZwQsKuWLDG8uZJmGLAQaoFbGcg9I=.95486530-8384-4f65-b0a4-8793139078dd@github.com> References: <7DtNgJ7IORWdKXdZwQsKuWLDG8uZJmGLAQaoFbGcg9I=.95486530-8384-4f65-b0a4-8793139078dd@github.com> Message-ID: > Similarly to [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 does not work well with transparent hugepages. > > Because a machine may be configured in such a way that UseTransparentHugePages option gets ignored, the test driver must also check if it will be. As such I extracted the `test/hotspot/jtreg/runtime/os/HugePageConfiguration.java` utility into shared test library. > > On non-Linux machines the `vm.opt.final.UseTransparentHugePages` will be null, but the test checks `os.family == "linux"` first. I have not observed an issue with the JTREG filter on non-Linux machines. But will double check that it does not cause an issue. Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains two commits: - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8340422 - 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages ------------- Changes: https://git.openjdk.org/jdk/pull/21129/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21129&range=01 Stats: 44 lines in 7 files changed: 41 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21129/head:pull/21129 PR: https://git.openjdk.org/jdk/pull/21129 From dholmes at openjdk.org Tue Sep 24 05:54:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 05:54:41 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: <2oh0M5OXUp35ibGjTLXkJFyDEjF1zt3816WdKpB98tQ=.feabfe80-c6b8-430d-a0b9-30cb4149680a@github.com> On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). 
>> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Well I don't like "paranoid" code when it comes to concurrency for the reason I already gave. I think part of the problem here is that so many different locks are involved in the different stages of class loading, linking and initialization, that it can be unclear when you've zoomed in exactly which lock should be part of the code path you're dealing with (e.g the loader constraint table code is protected by the SD lock so the checking of the `is_loaded` state is not lock-free). But this code is functionally correct so the only potential harm here (other than complicating code understanding) is to performance, which we will just have to keep an eye on. FYI I'm away for the next couple of days. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2324129700 From rehn at openjdk.org Tue Sep 24 06:00:37 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 24 Sep 2024 06:00:37 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 03:50:29 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comment > > src/hotspot/cpu/riscv/riscv.ad line 10078: > >> 10076: match(CallLeafVector); >> 10077: >> 10078: effect(USE meth); > > It's possible for the runtime call to clobber rFlagsReg `cr` (aka `t1`). So safer to add `KILL cr` to the effect. Good find, it will definitely be clobbered. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1772674338 From fyang at openjdk.org Tue Sep 24 07:02:39 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 24 Sep 2024 07:02:39 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 05:25:38 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 51: >> >>> 49: n_int_register_parameters_c = 8, // x10, x11, ... x17 (c_rarg0, c_rarg1, ...) >>> 50: n_float_register_parameters_c = 8, // f10, f11, ... f17 (c_farg0, c_farg1, ... ) >>> 51: n_vector_register_parameters_c = 8, // v8, v9, ... v15 >> >> I know vector registers are not used for passing arguments or return values by the RISCV ABI for now. But I guess it might be better and consistent to align with the numbering of integer and floating-point argument registers (x10 - x17, f10 - f17)? That is v10 - v17. > > Note in the RISC-V ELF psABI there is a convention variant for v-regs. > If you add function attribute riscv_vector_cc it should be used for C/C++.
(I never tested it) > v0 = first vector mask argument > v8-v23 = args/rets > v1-v7/v24-v31 = caller saved > > https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc Ah, I think I missed that. I was reading psABI spec 1.0 release. Thanks for this info. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1772742406 From jbhateja at openjdk.org Tue Sep 24 07:10:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 24 Sep 2024 07:10:24 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/31a58642..42ca80c5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=11-12 Stats: 225 lines in 41 files changed: 25 ins; 82 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From stefank at openjdk.org Tue Sep 24 07:17:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 24 Sep 2024 07:17:37 GMT Subject: RFR: 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages [v2] In-Reply-To: References: <7DtNgJ7IORWdKXdZwQsKuWLDG8uZJmGLAQaoFbGcg9I=.95486530-8384-4f65-b0a4-8793139078dd@github.com> Message-ID: On Tue, 24 Sep 2024 05:51:06 GMT, Axel Boldt-Christmas wrote: >> Similarly to [JDK-8340146](https://bugs.openjdk.org/browse/JDK-8340146) / #21127 does not work well with transparent hugepages. >> >> Because a machine may be configured in such a way that UseTransparentHugePages option gets ignored, the test driver must also check if it will be. As such I extracted the `test/hotspot/jtreg/runtime/os/HugePageConfiguration.java` utility into shared test library. >> >> On non-Linux machines the `vm.opt.final.UseTransparentHugePages` will be null, but the test checks `os.family == "linux"` first. I have not observed an issue with the JTREG filter on non-Linux machines. But will double check that it does not cause an issue. > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8340422 > - 8340422: ZGC: TestAllocateHeapAt.java should not run with transparent hugepages Marked as reviewed by stefank (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21129#pullrequestreview-2324287949 From duke at openjdk.org Tue Sep 24 07:21:37 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 07:21:37 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Thu, 19 Sep 2024 01:33:53 GMT, David Holmes wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks!
> > This affects all hotspot developers using UL so extending coverage: @dholmes-ora in addition to the socialising comments that Roberto and Johan have already responded, I'll try to clarify the PR a little (I've updated the title as well to make it clearer): - `Xlog:jit+inlining` is the same as `Xlog:inlining+jit`, and will get the same treatment - Wildcards have been chosen to be ignored as you could potentially match too many defaults. In the end, these defaults only attempt to offer a "help" to developers using the same specific `-Xlog` option and that don't want to specify the tagset they are interested on every time - I am not planning on changing the design idea of "decorators associated with output device". This PR enables having defaults for `-Xlog`-selected tagsets, but once the output device is configured we will end up with the same decorators throughout it (does not matter if it is stdout or a real file) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370395774 From duke at openjdk.org Tue Sep 24 07:27:36 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 07:27:36 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Thu, 19 Sep 2024 07:08:56 GMT, Roberto Castañeda Lozano wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL.
These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. >> >> Please consider this PR, and thanks! > > Nice proposal, Antón! This will make it possible to migrate lots of debug/trace-level ad-hoc logging in the compiler code to the UL while preserving its current format (e.g. time decorators are hardly needed when examining the output of `-XX:+TraceLoopOpts`). > > Having said this, I find the following behavior unintuitive. If I run: > > > -Xlog:jit*=debug > > > I get the global default decorators, i.e. `uptime,level,tags`, which is what I expected. But if I run: > > > java -Xlog:jit+compilation=debug,jit+inlining=debug,jit+thread=debug > > > I would expect to get the same decorators, but instead I get the default decorators for `jit+inlining`, i.e. none. Is this intentional? > > In general, as a HotSpot developer the behavior I would find most natural is to select the union of all decorators for all chosen tags (regardless of whether the decorators for a tag have been chosen actively by the user, specified as default for the tag, or "inherited" from the global default), as in the first option (`-Xlog:jit*=debug`).
@robcasloz Originally I had thought of these defaults "taking over" if there were no defaults for the rest, but I know what you mean and in a way the uptime-level-tags are some implicit defaults that should also be applied. Merging all default specification as suggested by @jdksjolen would automatically enable this behaviour ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370406970 From mli at openjdk.org Tue Sep 24 07:27:48 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 07:27:48 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version Message-ID: Hi, Can you help to review this patch? As discussed in https://github.com/openjdk/jdk/pull/20910#discussion_r1755150447, it's helpful to refactor the existing scalar version of crc32 intrinsic. Several refactoring are done in this pr, 1. Simplify the `len` usage, now it only decreases (i.e. change in one direction) 2. Simplify the code paths 3. Remove one instruction in `L_by4_loop` 4. Remove unnecessary code 5. Other misc Thanks! ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21150/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21150&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340732 Stats: 57 lines in 2 files changed: 9 ins; 26 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/21150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21150/head:pull/21150 PR: https://git.openjdk.org/jdk/pull/21150 From dholmes at openjdk.org Tue Sep 24 07:46:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 07:46:39 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Tue, 24 Sep 2024 07:19:21 GMT, Antón Seoane wrote: > I am not planning on changing the design idea of "decorators associated with output device".
This PR enables having defaults for -Xlog-selected tagsets, but once the output device is configured we will end up with the same decorators throughout Sorry can you please clarify exactly how these compose. If I set one set of defaults for tags A+B and another set for C+D, then what happens if I specify `-Xlog:A+B,C+D`? And what happens if I configure decorators for say stdout and in addition enable A+B on stdout - what is the resulting set of decorators? Thanks ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370444955 From duke at openjdk.org Tue Sep 24 08:38:37 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 08:38:37 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! These defaults are not meant to target a specific selected output, so nothing different would occur. With respect to the first question, right now we would not get any defaults applied as there is a "collision" between A+B and C+D. That was my original idea, where I assumed it might be unwanted to apply all the possible defaults one over another. However, now (a) I don't think we will have that many defaults to drive this to chaos, and (b) as per @robcasloz feedback I believe it would be useful to apply all. Going back to your question: right now we do not apply any defaults upon that `-Xlog:A+B,C+D`, but I am working on some changes I will push soon that will change this behaviour to "merge" the default decorators for A+B and C+D ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2370628280 From rcastanedalo at openjdk.org Tue Sep 24 09:01:53 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 24 Sep 2024 09:01:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: <6UJOrZqmfsJj6pRzMjPdlYt191QgBV6fIv1qJAYsv60=.15284272-464f-4321-b76c-3412dafc6c63@github.com> Message-ID: On Fri, 20 Sep 2024 15:26:36 GMT, Roman Kennke wrote: >> I tried to reproduce for a few hours now using a custom testcase, with no success.
>> I am pretty sure that this can happen, that is why I added this code. Originally I had an assert there asserting that index is not used. I do remember that this happens very rarely, and I don't remember the exact condition. Looking at the possible operands in opclass memory, I think this can only happen when we load an nKlass from an address of the form [rX, rY], i.e. the address in rX indexed by rY. This is an odd thing to happen for loadNKlass, I think, because rY should always be klass_offset_in_bytes. Maybe this is possible when we get odd address merges where we get a PhiNode as the offset/index? I don't know. >> I agree, this *might* lead to surprising problems with implicit null-checking, if it is expected that the first instruction in loadNKlass provokes the SIGSEGV. A way around this would be to declare an opclass that is a subset of 'memory' that excludes all operands with index, and match on that. I think this would force the lea as a separate instruction and ensure that we never see such a thing in loadNKlass. However, I would not feel very confident to do that without a reproducer. Let me dig a little further. >> >> For reference, here is my unsuccessful reproducer: https://gist.github.com/rkennke/8a57610d74fcde07a9390f268ec35738 > > Something like this is what I have in mind. It seems to pass tier1 tests. I still haven't managed to reproduce the path that requires an index register, though. > https://github.com/rkennke/jdk/commit/2c4a7877e4ef94017c8155578d8cfc9342441656 Thanks for the update! If there is a path requiring an index register, I would agree on limiting the memory opclass to exclude indices as you suggest. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1772945253 From ogillespie at openjdk.org Tue Sep 24 09:29:38 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 24 Sep 2024 09:29:38 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: <0Dr860QgmZaGHq1QGgz5bqKLpiwVSZL-lDOV1JNjkdk=.1c09c464-e9cd-4f66-88c1-2b97e3a9f7ce@github.com> References: <0Dr860QgmZaGHq1QGgz5bqKLpiwVSZL-lDOV1JNjkdk=.1c09c464-e9cd-4f66-88c1-2b97e3a9f7ce@github.com> Message-ID: On Tue, 24 Sep 2024 00:47:52 GMT, Dean Long wrote: > If JVM_StartThread is only called by Thread.start0, then how about putting the new lock in Java instead? What benefit do you see of that? One downside is that the lock will be coarser than necessary. I'd rather keep the lock as tightly scoped as possible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2370752972 From fbredberg at openjdk.org Tue Sep 24 09:51:40 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 24 Sep 2024 09:51:40 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <_sp3TMOUO71tQAoMpW2QwB2i5Krvw2dyNWSt-Ex5nrs=.3ea2088d-592c-4e77-9ff5-324c1bccde27@github.com> References: <_sp3TMOUO71tQAoMpW2QwB2i5Krvw2dyNWSt-Ex5nrs=.3ea2088d-592c-4e77-9ff5-324c1bccde27@github.com> Message-ID: On Mon, 23 Sep 2024 13:58:26 GMT, Fredrik Bredberg wrote: >> This should be in a follow-up RFE with this description. Thanks for the description. > > I'll create a new follow-up RFE. 
[JDK-8340796](https://bugs.openjdk.org/browse/JDK-8340796) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1773021466 From adinn at openjdk.org Tue Sep 24 09:53:44 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 09:53:44 GMT Subject: RFR: 8337987: Relocate jfr and throw_exception stubs from StubGenerator to SharedRuntime [v3] In-Reply-To: References: <8skBH3HkEp_YKU16db-RAUNwZ2o9uPIClIm5JZOe42s=.dd09269a-abe9-4397-9813-086172ffa418@github.com> <07DqhAfjMD9qfeno10HOAuNBeiIul86acqTMpE6YtaY=.2569accb-c0ab-470f-b348-5894831be5d5@github.com> Message-ID: On Fri, 20 Sep 2024 14:22:12 GMT, Hao Sun wrote: >> @RealFYang Thanks! > > Hi @adinn , I encountered Client build failure on AArch64 after this commit. Could you help take a look at it when you have spare time? Thanks. > > Here shows the configuration > > ==================================================== > The existing configuration has been successfully updated in > /tmp/test123/build-release > using configure arguments '--with-debug-level=release --with-version-opt=git-fe80618bf3f --with-jvm-variants=client'. 
>
> Configuration summary:
> * Name: /tmp/test123/build-release
> * Debug level: release
> * HS debug level: product
> * JVM variants: client
> * JVM features: client: 'cds compiler1 epsilongc g1gc jfr jni-check jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
> * OpenJDK target: OS: linux, CPU architecture: aarch64, address length: 64
> * Version string: 24-internal-git-fe80618bf3f (24-internal)
> * Source date: 1726841146 (2024-09-20T14:05:46Z)
>
> Tools summary:
> * Boot JDK: openjdk version "22.0.2" 2024-07-16 OpenJDK Runtime Environment (build 22.0.2+9-70) OpenJDK 64-Bit Server VM (build 22.0.2+9-70, mixed mode, sharing) (at /usr/lib/jvm/jdk-22.0.2)
> * Toolchain: gcc (GNU Compiler Collection)
> * C Compiler: Version 13.2.0 (at /usr/bin/gcc)
> * C++ Compiler: Version 13.2.0 (at /usr/bin/g++)
>
> Build performance summary:
> * Build jobs: 72
> * Memory limit: 587068 MB
>
>
> And here is the error message:
>
> === Output from failing command(s) repeated here ===
> * For target hotspot_variant-client_libjvm_objs_sharedRuntime_aarch64.o:
> /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp: In static member function ‘static RuntimeStub* SharedRuntime::generate_throw_exception(const char*, address)’:
> /tmp/test123/jdk-src/src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp:2809:3: error: ‘TraceTime’ was not declared in this scope; did you mean ‘traceid’?
>  2809 |   TraceTime timer(timer_msg, TRACETIME_LOG(Info, startuptime));
>       |   ^~~~~~~~~
>       |   traceid
>
> * All command lines available in /tmp/test123/build-release/make-support/failure-logs.
> === End of repeated output ===

@shqking Thanks for reporting this problem. I reproduced the same failure on aarch64 and noticed that the problem also arises on arm. The file `sharedRuntime_aarch64.cpp` needs to include the header that declares class TraceTime, as does `sharedRuntime_arm.cpp`. I raised JDK-8340793 and will push a patch to fix it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20566#issuecomment-2370806327 From aph at openjdk.org Tue Sep 24 10:09:40 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 24 Sep 2024 10:09:40 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Mon, 23 Sep 2024 12:22:55 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2887: >> >>> 2885: f(0b10, 23, 22), f(index & 1, 21), rf(Vm, 16), f(op2, 15, 12), f(index >> 1, 11); \ >>> 2886: } \ >>> 2887: f(0, 10), rf(Vn, 5), rf(Vd, 0); \ >> >> Suggestion: >> >> #define INSN(NAME, op1, op2) \ >> void NAME(FloatRegister Vd, SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, int index) { \ >> starti; \ >> assert(T == T4H || T == T8H || T == T2S || T == T4S, "invalid arrangement"); \ >> assert(index >= 0 && \ >> ((T == T2S && index <= 1) || (T != T2S && index <= 3) || (T == T8H && index <= 7)), \ >> "invalid index"); \ >> assert((T != T4H && T != T8H) || Vm->encoding() < 16, "invalid source SIMD&FP register"); \ >> f(0, 31), f((int)T & 1, 30), f(op1, 29), f(0b01111, 28, 24), f(0b01, 23, 22); \ >> if (T == T4H || T == T8H) { \ >> f(index & 0b11, 21, 20), lrf(Vm, 16); \ >> } else { \ >> f(index & 1, 21), rf(Vm, 16); \ >> } \ >> f(op2, 15, 12), f(index >> 1, 11), f(0, 10), rf(Vn, 5), rf(Vd, 0); \ >> >> I think it's a bit easier to see what's going on here if we lose the duplicated code. > > Looks like that's incorrect: the 22th-23th bits and 11th bits differ. It's untested. What I'm trying to say is that we shouldn't duplicate stuff. Perhaps I should have been clearer. Separate the fields into what is the same, and what is different. Put the different inside the if. Put the common outside the if. 
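The advice above (emit the shared fields once, and keep only the differing fields inside the `if`) can be illustrated with a small standalone sketch. Everything here is hypothetical: the `Insn` helper only mimics the spirit of the `f(value, msb, lsb)` calls in the quoted macro, and the field layout is invented for the example, not taken from the AArch64 encoding under review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mini-encoder, NOT the HotSpot Assembler: places `value`
// into bits [msb:lsb] of a 32-bit instruction word.
struct Insn {
  uint32_t bits = 0;
  void f(uint32_t value, int msb, int lsb) {
    const int width = msb - lsb + 1;
    const uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    assert((value & ~mask) == 0 && "value does not fit in field");
    assert(((bits >> lsb) & mask) == 0 && "field already set");
    bits |= value << lsb;
  }
};

// Duplicated style: the common opcode field is repeated in both branches.
inline uint32_t encode_duplicated(bool narrow, uint32_t opcode, uint32_t index, uint32_t reg) {
  Insn i;
  if (narrow) {
    i.f(opcode, 31, 24); i.f(0b01, 23, 22); i.f(index & 0b11, 21, 20); i.f(reg, 19, 16);
  } else {
    i.f(opcode, 31, 24); i.f(0b10, 23, 22); i.f(index & 1, 21, 21); i.f(reg, 20, 16);
  }
  return i.bits;
}

// Hoisted style: the common field is emitted once, outside the if;
// only the fields that really differ stay inside it.
inline uint32_t encode_hoisted(bool narrow, uint32_t opcode, uint32_t index, uint32_t reg) {
  Insn i;
  i.f(opcode, 31, 24);  // same in both forms
  if (narrow) {
    i.f(0b01, 23, 22); i.f(index & 0b11, 21, 20); i.f(reg, 19, 16);
  } else {
    i.f(0b10, 23, 22); i.f(index & 1, 21, 21); i.f(reg, 20, 16);
  }
  return i.bits;
}
```

Both functions produce the same word for the same inputs; the second one simply makes it visible at a glance which fields depend on the arrangement.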
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773049080

From duke at openjdk.org Tue Sep 24 10:17:18 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Tue, 24 Sep 2024 10:17:18 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID: <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com>

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type, and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing an SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                            Baseline              This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:

  cleanup: fix a comment typo

  Co-authored-by: Andrew Haley

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18487/files
  - new: https://git.openjdk.org/jdk/pull/18487/files/091eecc5..b56be377

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=12-13

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18487.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487

PR: https://git.openjdk.org/jdk/pull/18487

From aph at openjdk.org Tue Sep 24 10:17:19 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 24 Sep 2024 10:17:19 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v13]
In-Reply-To:
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

On Mon, 23 Sep 2024 15:55:00 GMT, Mikhail Ablakatov wrote:

>> Hello,
>>
>> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770).
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... 
> > Mikhail Ablakatov has updated the pull request incrementally with two additional commits since the last revision: > > - Add Assembler::load() and Assembler::store() methods > - fixup: make Windows happy there's no potentially lossfull conversion src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 92: > 90: > 91: // large_arrays_hashcode(T_INT) performs worse than the scalar loop below when the Neon loop > 92: // implemented by the stub executes just once. Call the stub only if at least two iteration will Suggestion: // implemented by the stub executes just once. Call the stub only if at least two iterations will ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773058819 From aph at openjdk.org Tue Sep 24 10:22:42 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 24 Sep 2024 10:22:42 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 22 Aug 2024 13:48:57 GMT, Andrew Dinn wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: use a constexpr function for intpow instead of a templated class > > Oh, and I should have said: very nice work! It's all looking good to me. Ping @adinn for a second review, and we're good to go. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2370866520 From duke at openjdk.org Tue Sep 24 10:31:45 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 24 Sep 2024 10:31:45 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 24 Sep 2024 10:06:31 GMT, Andrew Haley wrote: >> Looks like that's incorrect: the 22th-23th bits and 11th bits differ. 
> > It's untested. What I'm trying to say is that we shouldn't duplicate stuff. Perhaps I should have been clearer. > > Separate the fields into what is the same, and what is different. Put the different inside the if. Put the common outside the if. The only matching field out of the sequence of four under the if/else is `f(op2, 15, 12)`, but OK, I can hoist it if this makes things clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773078306 From adinn at openjdk.org Tue Sep 24 11:07:44 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 11:07:44 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v5] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Mon, 16 Sep 2024 17:50:19 GMT, Mikhail Ablakatov wrote: >> This is what I'm seeing now. Scorching fast with large blocks, poor with smaller ones. >> >> Benchmark (size) Mode Cnt Score Error Units >> ArraysHashCode.bytes 1 avgt 5 0.532 ? 0.036 ns/op >> ArraysHashCode.bytes 2 avgt 5 0.812 ? 0.011 ns/op >> ArraysHashCode.bytes 4 avgt 5 1.104 ? 0.020 ns/op >> ArraysHashCode.bytes 8 avgt 5 2.136 ? 0.032 ns/op >> ArraysHashCode.bytes 12 avgt 5 3.596 ? 0.061 ns/op >> ArraysHashCode.bytes 16 avgt 5 5.278 ? 0.240 ns/op >> ArraysHashCode.bytes 20 avgt 5 7.390 ? 0.043 ns/op >> ArraysHashCode.bytes 24 avgt 5 9.606 ? 0.059 ns/op >> ArraysHashCode.bytes 28 avgt 5 12.144 ? 0.064 ns/op >> ArraysHashCode.bytes 32 avgt 5 3.898 ? 0.096 ns/op >> ArraysHashCode.bytes 36 avgt 5 4.468 ? 0.113 ns/op >> ArraysHashCode.bytes 40 avgt 5 4.481 ? 0.082 ns/op >> ArraysHashCode.bytes 44 avgt 5 5.143 ? 0.060 ns/op >> ArraysHashCode.bytes 48 avgt 5 6.727 ? 0.103 ns/op >> ArraysHashCode.bytes 52 avgt 5 8.844 ? 0.029 ns/op >> ArraysHashCode.bytes 56 avgt 5 11.108 ? 0.108 ns/op >> ArraysHashCode.bytes 60 avgt 5 13.864 ? 0.071 ns/op >> ArraysHashCode.bytes 64 avgt 5 5.796 ? 
0.146 ns/op > > Hi @theRealAph , > > I've updated the implementation so that arrays with 8 or more elements are now handled by the Neon stub. You can find a performance comparison below. There are significant performance improvements for relatively short arrays, from 16 elements long and above. To keep the change concise, I chose not to introduce new stubs for handling special cases like arrays that are 8-15 elements long. Adding the code you referenced in the quote below to the inlined intrinsic would significantly increase code size of the inlined portion so it was kept as is. > >> - Maybe replace the serial tail-handling iteration with the 4-wide vectorized version which you presented earlier. > > While I was at it, I also noticed that we can handle `short`/`char` arrays using `T8H` arrangement instead of `T4H`. During development, I found that this further improves the performance for these types. > > Below are the benchmark results for different data types collected on a Neoverse-V2 CPU. The graphs use GB/s as a metric, so higher values indicate better performance. For detailed JMH outputs, please see the attached files. bfa9369 represents the current state of this PR, and 31dc328 represents its previous state. > > Thank you for your suggestions! I look forward to your feedback on these updates. > > ![bytes](https://github.com/user-attachments/assets/1f58f6db-be82-4a7c-95fc-5c190381c9c2) > ![shorts](https://github.com/user-attachments/assets/71f26f55-c9b1-4009-b1af-15db904b4f87) > ![ints](https://github.com/user-attachments/assets/5e6651f9-0a0f-419d-ae10-9c7cdd2e3254) > > [ArraysHashCode-v2-31dc328.txt](https://github.com/user-attachments/files/17017053/ArraysHashCode-v2-31dc328.txt) > [ArraysHashCode-v2-bfa9369.txt](https://github.com/user-attachments/files/17017054/ArraysHashCode-v2-bfa9369.txt) @mikabl-arm I'm re-reviewing this now. I will let you know asap whether anything more needs doing before pushing. We also need to see that the tests pass. 
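For readers following this thread, the general idea of splitting the polynomial hash into a wide main loop plus a scalar tail can be sketched in portable C++. This is only an illustration under stated assumptions: it uses four plain accumulators in place of Neon lanes, handles only `int` elements, starts from 1 as `java.util.Arrays.hashCode` does, and makes no claim to match the order of operations or the stub structure in the PR. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 31^n with wrap-around, mirroring Java's int overflow behaviour.
constexpr uint32_t pow31(unsigned n) {
  uint32_t r = 1;
  for (unsigned i = 0; i < n; ++i) r *= 31u;
  return r;
}

// Reference: h = 31 * h + a[i], starting from 1.
inline uint32_t hash_scalar(const std::vector<int32_t>& a) {
  uint32_t h = 1;
  for (int32_t v : a) h = 31u * h + static_cast<uint32_t>(v);
  return h;
}

// Blocked variant: four independent accumulators (stand-ins for vector lanes),
// each scaled by 31^4 per iteration, combined at the end; a scalar loop
// finishes the remaining 0..3 elements.
inline uint32_t hash_blocked(const std::vector<int32_t>& a) {
  constexpr std::size_t kLanes = 4;
  constexpr uint32_t kStep = pow31(kLanes);
  const std::size_t nvec = a.size() - a.size() % kLanes;
  assert(nvec % kLanes == 0);

  uint32_t acc[kLanes] = {0, 0, 0, 0};
  uint32_t scale = 1;  // becomes 31^nvec, applied to the initial value 1
  for (std::size_t i = 0; i < nvec; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; ++j) {
      acc[j] = acc[j] * kStep + static_cast<uint32_t>(a[i + j]);
    }
    scale *= kStep;
  }

  uint32_t h = scale;  // initial value 1 times 31^nvec
  for (std::size_t j = 0; j < kLanes; ++j) {
    h += acc[j] * pow31(static_cast<unsigned>(kLanes - 1 - j));
  }
  for (std::size_t i = nvec; i < a.size(); ++i) {
    h = 31u * h + static_cast<uint32_t>(a[i]);  // scalar tail
  }
  return h;
}
```

Because each lane only depends on its own previous value, the four multiply-adds per iteration are independent, which is what makes this shape amenable to SIMD; the per-call setup cost is also why short arrays can end up slower than the plain scalar loop, as the numbers quoted above suggest.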
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2370950137 From shade at openjdk.org Tue Sep 24 11:19:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 24 Sep 2024 11:19:09 GMT Subject: RFR: 8340181: Shenandoah: Cleanup ShenandoahRuntime stubs Message-ID: Noticed this while working on Leyden, which has to enumerate Shenandoah stubs for code archival to work. `ShenandoahRuntime::shenandoah_clone_barrier` is an excessive name. `ShenandoahRuntime::arraycopy_barrier_oop_entry` and friends are not covered by `JRT_LEAF`. This change hopefully homogenizes the naming of the stubs. Additional testing: - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` ------------- Commit messages: - More build fixes - Fix Changes: https://git.openjdk.org/jdk/pull/21152/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21152&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340181 Stats: 71 lines in 9 files changed: 7 ins; 11 del; 53 mod Patch: https://git.openjdk.org/jdk/pull/21152.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21152/head:pull/21152 PR: https://git.openjdk.org/jdk/pull/21152 From ogillespie at openjdk.org Tue Sep 24 11:24:39 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 24 Sep 2024 11:24:39 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread.
>> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking The same kind of issue happens on thread exit, via `JavaThread::post_run -> JavaThread::exit -> Threads::remove -> Threads_lock.lock`, so I will update the implementation to cover both with the same approach. 
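The "extra lock in front of the contended lock" pattern described in this thread can be sketched with standard C++ mutexes. This is a toy model only: `ThreadRegistry`, `start_gate_` and `list_lock_` are hypothetical stand-ins for the thread list, `ThreadStart_lock` and `Threads_lock`, and nothing here reflects HotSpot's actual Mutex classes, lock ranking, or safepoint protocol.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the VM's thread list; not HotSpot code.
class ThreadRegistry {
 public:
  // "Start thread" path: pass the gate first, then take the contended lock.
  void add(int id) {
    std::lock_guard<std::mutex> gate(start_gate_);   // role of ThreadStart_lock
    const int now = ++starters_past_gate_;
    assert(now == 1);                                // the gate admits one starter at a time
    if (now > max_starters_past_gate_) max_starters_past_gate_ = now;
    {
      std::lock_guard<std::mutex> list(list_lock_);  // role of Threads_lock
      ids_.push_back(id);
    }
    --starters_past_gate_;
  }

  // "Safepoint" path: takes only the list lock, so at any moment it
  // competes with at most one starter instead of all of them.
  std::size_t size() {
    std::lock_guard<std::mutex> list(list_lock_);
    return ids_.size();
  }

  int max_starters_past_gate() const { return max_starters_past_gate_.load(); }

 private:
  std::mutex start_gate_;
  std::mutex list_lock_;
  std::vector<int> ids_;
  std::atomic<int> starters_past_gate_{0};
  std::atomic<int> max_starters_past_gate_{0};
};

// Runs `threads` workers that each register `per_thread` ids.
inline void run_starters(ThreadRegistry& reg, int threads, int per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&reg, t, per_thread] {
      for (int i = 0; i < per_thread; ++i) reg.add(t * per_thread + i);
    });
  }
  for (auto& w : workers) w.join();
}
```

The cost is the one mentioned in the PR description: every `add` pays for an extra lock/unlock pair. The benefit is that a caller of `size()` never queues behind more than one starter on `list_lock_`, however many starters are waiting at the gate.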
------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2370980518 From rkennke at openjdk.org Tue Sep 24 11:42:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 24 Sep 2024 11:42:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v24] In-Reply-To: References: Message-ID: <7N9vxRKxAK2GCBNlnU5E0Bj0sGV6_T-2QX9fKCCxlWg=.bdee038b-cee3-4c52-825c-d381d3616092@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
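Two of the numbers in the description above lend themselves to a quick arithmetic check: a class id kept in the upper 22 bits of a 64-bit mark word, and 40 forwarding bits covering 8TB of heap. The sketch below is illustrative only. The constant and function names are invented, the rest of the mark-word layout is deliberately left out, and the 8-byte granule is an assumption chosen because it makes 40 bits come out at 8 TiB; the description itself only says the encoding is "similar to compressed-oops".

```cpp
#include <cassert>
#include <cstdint>

// "Upper 22 bits" per the PR description; all names here are hypothetical.
constexpr int      kKlassBits  = 22;
constexpr int      kKlassShift = 64 - kKlassBits;              // 42
constexpr uint64_t kLowMask    = (uint64_t{1} << kKlassShift) - 1;

// Stores a 22-bit class id in the top bits, preserving the low 42 bits.
inline uint64_t with_narrow_klass(uint64_t mark, uint32_t nklass) {
  assert(nklass < (uint32_t{1} << kKlassBits) && "class id must fit in 22 bits");
  return (mark & kLowMask) | (static_cast<uint64_t>(nklass) << kKlassShift);
}

// Reads the class id back with a single shift.
inline uint32_t narrow_klass_of(uint64_t mark) {
  return static_cast<uint32_t>(mark >> kKlassShift);
}

// Forwarding capacity: 40 bits of offsets.
// ASSUMPTION: offsets count 8-byte units, which is what yields 8 TiB.
constexpr int      kForwardingBits   = 40;
constexpr uint64_t kAssumedGranule   = 8;
constexpr uint64_t kMaxEncodableHeap = (uint64_t{1} << kForwardingBits) * kAssumedGranule;
static_assert(kMaxEncodableHeap == (uint64_t{8} << 40), "2^40 units x 8 bytes = 8 TiB");
```

This also shows why forwarding has to be careful with the header: anything that overwrites the top 22 bits loses the class id, which is the motivation given above for the self-forwarding bit and the compact full-GC encoding.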
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Improve matching of loadNKlassCompactHeaders on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/0d8a9236..2c4a7877 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=22-23 Stats: 17 lines in 3 files changed: 5 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From dnsimon at openjdk.org Tue Sep 24 11:55:44 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:44 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent Message-ID: This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. 
------------- Commit messages: - Merge branch 'master' into JDK-8340576 - fix incorrect code in oopMap.inline.hpp - since UseJVMCICompiler implies EnableJVMCI, remove the latter from conjunctive tests of both - disentangle EnableJVMCI and UseJVMCICompiler Changes: https://git.openjdk.org/jdk/pull/21120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21120&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340576 Stats: 15 lines in 8 files changed: 2 ins; 4 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21120/head:pull/21120 PR: https://git.openjdk.org/jdk/pull/21120 From duke at openjdk.org Tue Sep 24 11:55:44 2024 From: duke at openjdk.org (=?UTF-8?B?VG9tw6HFoQ==?= Zezula) Date: Tue, 24 Sep 2024 11:55:44 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. Marked as reviewed by tzezula at github.com (no known OpenJDK username). Looks good to me. 
------------- PR Review: https://git.openjdk.org/jdk/pull/21120#pullrequestreview-2321351214 PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2367447737 From dnsimon at openjdk.org Tue Sep 24 11:55:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Sun, 22 Sep 2024 14:28:46 GMT, Yudi Zheng wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > src/hotspot/share/jvmci/jvmci_globals.cpp line 83: > >> 81: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >> 82: >> 83: if ((UseJVMCICompiler || EnableJVMCI) && > > Doesn't `UseJVMCICompiler` require `EnableJVMCI`? No. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770577425 From dnsimon at openjdk.org Tue Sep 24 11:55:46 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Mon, 23 Sep 2024 07:34:03 GMT, Tomáš Zezula wrote: >> No. > > This will conflict with my change https://github.com/openjdk/jdk/pull/21069/commits/b75504633bc0f4fcecc1c08552e556b26d9ffbb9. No problem - I'll resolve it once your PR is merged.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770898600 From yzheng at openjdk.org Tue Sep 24 11:55:46 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. src/hotspot/share/jvmci/jvmci_globals.cpp line 83: > 81: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) > 82: > 83: if ((UseJVMCICompiler || EnableJVMCI) && Doesn't `UseJVMCICompiler` require `EnableJVMCI`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770567413 From dnsimon at openjdk.org Tue Sep 24 11:55:45 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 24 Sep 2024 11:55:45 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 17:05:21 GMT, Tom Rodriguez wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > src/hotspot/share/compiler/oopMap.inline.hpp line 69: > >> 67: >> 68: #ifndef COMPILER2 >> 69: COMPILER1_PRESENT(ShouldNotReachHere();) > > As I noted in a private comment, I think this logic is simply wrong for JVMCI but since we never build JVMCI without C2 we never encounter it. Derived oops should only be encountered if C2 in available or if EnableJVMCI is true. I don't really understand the `COMPILER1_PRESENT` guard here either. It seems like it should be more like: > > #ifndef COMPILER2 > #if INCLUDE_JVMCI > if (!EnableJVMCI) > #endif > ShouldNotReachHere(); > #endif // !COMPILER2 Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1772737069 From duke at openjdk.org Tue Sep 24 11:55:46 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Tue, 24 Sep 2024 11:55:46 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Sun, 22 Sep 2024 15:20:22 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmci_globals.cpp line 83: >> >>> 81: CHECK_NOT_SET(LibJVMCICompilerThreadHidden, UseJVMCICompiler) >>> 82: >>> 83: if ((UseJVMCICompiler || EnableJVMCI) && >> >> Doesn't `UseJVMCICompiler` require `EnableJVMCI`? > > No. This will conflict with my change https://github.com/openjdk/jdk/pull/21069/commits/b75504633bc0f4fcecc1c08552e556b26d9ffbb9. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1770895396 From never at openjdk.org Tue Sep 24 11:55:45 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 24 Sep 2024 11:55:45 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. src/hotspot/share/compiler/oopMap.inline.hpp line 69: > 67: > 68: #ifndef COMPILER2 > 69: COMPILER1_PRESENT(ShouldNotReachHere();) As I noted in a private comment, I think this logic is simply wrong for JVMCI but since we never build JVMCI without C2 we never encounter it. Derived oops should only be encountered if C2 is available or if EnableJVMCI is true. I don't really understand the `COMPILER1_PRESENT` guard here either.
It seems like it should be more like: #ifndef COMPILER2 #if INCLUDE_JVMCI if (!EnableJVMCI) #endif ShouldNotReachHere(); #endif // !COMPILER2 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1771805501 From luhenry at openjdk.org Tue Sep 24 11:59:35 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 24 Sep 2024 11:59:35 GMT Subject: RFR: 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 12:39:07 GMT, Gui Cao wrote: > Hi, please help review that, small refactoring for sub/subw macro-assembler routines. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21135#pullrequestreview-2325005150 From duke at openjdk.org Tue Sep 24 12:19:40 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Tue, 24 Sep 2024 12:19:40 GMT Subject: RFR: 8324124: RISC-V: implement _vectorizedMismatch intrinsic In-Reply-To: References: Message-ID: <6WYfRw-VEN5QJf7HWBmC0_DG77qJuvJMlYRz600tvXo=.bce9de2a-d48b-4342-8b0d-b6d02a60ead8@github.com> On Wed, 7 Feb 2024 14:35:55 GMT, Yuri Gaevsky wrote: > Hello All, > > Please review these changes to enable the __vectorizedMismatch_ intrinsic on RISC-V platform with RVV instructions supported. > > Thank you, > -Yuri Gaevsky > > **Correctness checks:** > hotspot/jtreg/compiler/{intrinsic/c1/c2}/ under QEMU-8.1 with RVV v1.0.0 and -XX:TieredStopAtLevel=1/2/3/4. . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17750#issuecomment-2371107494 From dholmes at openjdk.org Tue Sep 24 12:48:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 12:48:42 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: On Tue, 24 Sep 2024 08:36:12 GMT, Antón Seoane wrote: > These defaults are not meant to target a specific selected output, so nothing different would occur. Sorry but that doesn't make sense. When you set the tagset defaults they are output agnostic, but once you set A+B on the command-line then that is associated with a specific output and so the decorators apply to that output. > I will push soon that will change this behaviour to "merge" the default decorators for A+B and C+D What does "merge" mean? union? intersection? I can't see how you can come up with rules that will universally make sense here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2371173732 From dholmes at openjdk.org Tue Sep 24 12:53:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 24 Sep 2024 12:53:37 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`.
>> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking If you do this at exit too then my concerns have doubled. I'd want to see some broader benchmarking of this change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2371186313 From duke at openjdk.org Tue Sep 24 12:54:24 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 24 Sep 2024 12:54:24 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v15] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > >
> --------------------------------------------------------------------------------------------
> Version                                          Baseline           This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode Cnt      Score   Error     Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt  15      1.249 ± 0.060     1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt  15      8.754 ± 0.028     4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt  15     98.596 ± 0.051    26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt  15  10150.578 ± 1.352  2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt  15      1.286 ± 0.062     1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt  15      8.731 ± 0.002     5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt  15     98.632 ± 0.048    23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt  15  10150.658 ± 3.374  2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt  15      1.189 ± 0.005     1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt  15      8.730 ± 0.002     5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt  15     98.559 ± 0.016    24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt  15  10148.752 ± 1.336  2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt  15      1.037 ± 0.001     1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt  15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: hoist a common statement out of the if/else block ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/b56be377..a28bbcd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=13-14 Stats: 16 lines in 1 file changed: 0 ins; 1 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From duke at openjdk.org Tue Sep 24 12:54:25 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Tue, 24 Sep 2024 12:54:25 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v11] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Tue, 24 Sep 2024 10:28:49 GMT, Mikhail Ablakatov wrote: >> It's untested. What I'm trying to say is that we shouldn't duplicate stuff. Perhaps I should have been clearer. >> >> Separate the fields into what is the same, and what is different. Put the different inside the if. Put the common outside the if.
> > The only matching field out of the sequence of four under the if/else is `f(op2, 15, 12)`, but OK, I can hoist it if this makes things clearer. Fixed by https://github.com/openjdk/jdk/pull/18487/commits/a28bbcd3bbe7555e715b107b71c18efaf72a7ce7. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773284185 From ogillespie at openjdk.org Tue Sep 24 13:10:36 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 24 Sep 2024 13:10:36 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: <9YzNUVSFLXVc9hAuELoaEcUiQXn8H_vEOmTAbzwD69o=.e9404763-0548-4fd3-875e-a71c7d5e5ad7@github.com> On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... 
>> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking My targeted benchmarks show either significant improvements or negligible slowdown on a (typically, and arguably always if performance is key) non-critical path, which matches the theory since we're not adding any contention, only adding one extra monitor acquisition on a slow path. Hopefully that's enough to indicate reasonable performance, so we can finalize the implementation and then perhaps do some final benchmarking before integration? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2371225840 From ogillespie at openjdk.org Tue Sep 24 14:06:17 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 24 Sep 2024 14:06:17 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v4] In-Reply-To: References: Message-ID: > Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. > This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. 
> > Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. > > Before (ThreadStartTtsp.java is shared in JDK-8340547): > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 1291591 ns > Reaching safepoint: 59962 ns > Reaching safepoint: 1958065 ns > Reaching safepoint: 14456666258 ns <-- 14 seconds! > ... > > > After: > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 214269 ns > Reaching safepoint: 60253 ns > Reaching safepoint: 2040680 ns > Reaching safepoint: 3089284 ns > Reaching safepoint: 2998303 ns > Reaching safepoint: 4433713 ns <-- 4.4ms > Reaching safepoint: 3368436 ns > Reaching safepoint: 2986519 ns > Reaching safepoint: 3269102 ns > ... > > > > **Alternatives** > > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. > I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. 
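The gating idea in the description above can be sketched outside the VM as two nested mutexes. This is only an illustration under stated assumptions, not the patch itself: `threads_lock` and `thread_start_lock` are plain `std::mutex` stand-ins for HotSpot's `Threads_lock` and the proposed `ThreadStart_lock`, and the function names and the counter are hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Stand-ins for HotSpot's Threads_lock and the proposed ThreadStart_lock (assumed names).
static std::mutex threads_lock;
static std::mutex thread_start_lock;
static int registered_threads = 0;

// Models the thread-start path: starters serialize on the outer gate first,
// so at most one of them is ever queued on threads_lock.
void register_new_thread() {
    std::lock_guard<std::mutex> gate(thread_start_lock);
    std::lock_guard<std::mutex> guard(threads_lock);
    ++registered_threads;
}

// Models the VM thread beginning a safepoint: it takes only threads_lock,
// so it competes with at most one starter instead of all of them.
int vm_thread_count_threads() {
    std::lock_guard<std::mutex> guard(threads_lock);
    return registered_threads;
}
```

The cost is the one mentioned in the description: every start pays one extra uncontended lock/unlock pair, in exchange for bounding the number of waiters the VM thread has to race against.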
Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Also address Thread::exit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21111/files - new: https://git.openjdk.org/jdk/pull/21111/files/fc48bbe4..42909653 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=02-03 Stats: 16 lines in 5 files changed: 3 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21111/head:pull/21111 PR: https://git.openjdk.org/jdk/pull/21111 From ogillespie at openjdk.org Tue Sep 24 14:06:18 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Tue, 24 Sep 2024 14:06:18 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:28:10 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... 
>> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix lock ranking Added `ThreadStopTtsp.java` reproducer to JDK-8340547 for the thread exit case, and updated the implementation here to cover both cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2371385659 From mli at openjdk.org Tue Sep 24 14:11:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 14:11:05 GMT Subject: RFR: 8340808: RISC-V: Client build fails after JDK-8339738 Message-ID: <2SNxwljy44SGeKeDA1yBPgr8UB6oNq41o4ScsDbr1j4=.8a0bc0de-a62c-47cc-be78-8864fb4d0187@github.com> Hi, Can you help to review this simple patch? Previously, the crc32 intrinsic (scalar version) was added for both c1/c2, then the vector version was added but depends on a global flag MaxVectorSize which is only valid in c2. 
This pr is to put all vector crc32 related code under COMPILER2 macro protection. I tested with/without --with-jvm-variants=client. Thanks! ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21159/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21159&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340808 Stats: 10 lines in 2 files changed: 8 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21159.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21159/head:pull/21159 PR: https://git.openjdk.org/jdk/pull/21159 From mli at openjdk.org Tue Sep 24 14:56:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 14:56:28 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v5] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi >
> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855
> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001
> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074
> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62
> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166
> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974
> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167
> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219
> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377
> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357
> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398
> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449
> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743
> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743
> Double128Vector....
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: misc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/32eb54d5..f190709a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=03-04 Stats: 97 lines in 4 files changed: 49 ins; 44 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From mli at openjdk.org Tue Sep 24 14:56:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 14:56:29 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 06:59:58 GMT, Fei Yang wrote: >> Note in the RISC-V ELF psABI there is a convention variant for v-regs. >> If you add function attribute riscv_vector_cc it should be used for C/C++. (I never tested it) >> v0 = first vector mask argument >> v8-v23 = args/rets >> v1-v7/v24-v31 = caller saved >> >> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc > > Ah, I think I missed that. I was reading psABI spec 1.0 release. Thanks for this info. Thanks Robbin for helping explaining! 
minor correction: v1-v7/v24-v31 = callee saved ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1773530570 From mli at openjdk.org Tue Sep 24 14:56:30 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 14:56:30 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 05:57:08 GMT, Robbin Ehn wrote: >> src/hotspot/cpu/riscv/riscv.ad line 10078: >> >>> 10076: match(CallLeafVector); >>> 10077: >>> 10078: effect(USE meth); >> >> It's possible for the runtime call to clobber rFlagsReg `cr` (aka `t1`). So safer to add `KILL cr` to the effect. > > Good find, it will definitely be clobbered. Thanks for catching! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1773531542 From mli at openjdk.org Tue Sep 24 14:56:32 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 14:56:32 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 03:47:41 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> refine comment > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6063: > >> 6061: >> 6062: void generate_vector_math_stubs() { >> 6063: if (UseRVV) { > > Seems to me cleaner to do this: > > if (UseRVV) { > generate_vector_math_stubs(); > } As there are several log output in the method, I think it might be better to put them together in this method. But I made some modification so the indentation looks better. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1773531189 From rehn at openjdk.org Tue Sep 24 15:00:40 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 24 Sep 2024 15:00:40 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 14:52:39 GMT, Hamlin Li wrote: >> Ah, I think I missed that. I was reading psABI spec 1.0 release. Thanks for this info. > > Thanks Robbin for helping explaining! > > minor correction: v1-v7/v24-v31 = callee saved Yes, sorry what I meant, updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1773539918 From jbhateja at openjdk.org Tue Sep 24 15:09:41 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 24 Sep 2024 15:09:41 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > change ifdef from x86 to AMD64 Marked as reviewed by jbhateja (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20657#pullrequestreview-2325597060 From duke at openjdk.org Tue Sep 24 15:14:40 2024 From: duke at openjdk.org (Antón Seoane) Date: Tue, 24 Sep 2024 15:14:40 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <0B6RQxjbSpVqb_VL-B_GFQUkwhIP5KmhgW2FP5DfBL4=.296cd768-d198-446d-8c06-d6e33a415e6f@github.com> On Fri, 13 Sep 2024 09:03:55 GMT, Antón Seoane wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
> > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! Oh, I think I misunderstood you there. In case we have some A+B defaults set, and we run something like `java -Xlog:A+B::uptime`, the "runtime"-specified decorators prevail and the defaults are not to be triggered (e.g. in this case we'd have only uptime decorators, as it has been set explicitly). By "merge" I mean the union. It's what @robcasloz suggested above and what most people I've talked to find more logical. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2371590845 From duke at openjdk.org Tue Sep 24 15:14:47 2024 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 24 Sep 2024 15:14:47 GMT Subject: Integrated: 8338694: x86_64 intrinsic for tanh using libm In-Reply-To: References: Message-ID: On Wed, 21 Aug 2024 00:25:03 GMT, Srinivas Vamsi Parasa wrote: > The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm > > Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup > -- | -- | -- | -- > MathBench.tanhDouble | 70900 | 95618 | 1.35x This pull request has now been integrated.
Changeset: 212e3293 Author: vamsi-parasa URL: https://git.openjdk.org/jdk/commit/212e32931cafe446d94219d6c3ffd92261984dff Stats: 980 lines in 26 files changed: 970 ins; 0 del; 10 mod 8338694: x86_64 intrinsic for tanh using libm Reviewed-by: kvn, jbhateja, sgibbons, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/20657 From fyang at openjdk.org Tue Sep 24 15:16:38 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 24 Sep 2024 15:16:38 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: Message-ID: <-QJaY5cvW3qGmd5Nj9XT4ibBFFrexIfDKzF3JgEKbtg=.b1ce48dc-fcc5-496b-98bc-58dc7ac13308@github.com> On Tue, 24 Sep 2024 14:57:48 GMT, Robbin Ehn wrote: >> Thanks Robbin for helping explaining! >> >> minor correction: v1-v7/v24-v31 = callee saved > > Yes, sorry what I meant, updated. Then why would we put a constraint on the number of supported argument vector registers here (v8-v15 instead of v8-v23)? Could we just support all of them, i.e., v8-v23 to comply with the RISC-V psABI? 
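For reference, the register roles discussed in this sub-thread (v0 as the first mask argument, v8-v23 for arguments and return values, v1-v7 and v24-v31 callee saved, after the correction above) can be written down as a small lookup. This is a sketch of the convention as summarized in these messages, not code from the patch; the enum and the function name `vreg_role` are hypothetical, and the psABI document linked earlier remains the authority.

```cpp
#include <cassert>

// Roles of RISC-V vector registers under the psABI calling-convention variant,
// as summarized in this thread (names are assumptions for illustration).
enum class VRegRole { MaskArgument, ArgumentOrReturn, CalleeSaved };

// Classifies vector register v<n>; the caller must pass n in the range 0..31.
constexpr VRegRole vreg_role(int n) {
    return n == 0              ? VRegRole::MaskArgument      // v0: first vector mask argument
         : (n >= 8 && n <= 23) ? VRegRole::ArgumentOrReturn  // v8-v23: arguments and return values
                               : VRegRole::CalleeSaved;      // v1-v7 and v24-v31
}
```

Seen this way, the question above is whether the stub's argument handling should cover the full `ArgumentOrReturn` range rather than only v8-v15.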
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1773564555 From coleenp at openjdk.org Tue Sep 24 15:40:55 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 24 Sep 2024 15:40:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: On Fri, 20 Sep 2024 18:11:43 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 >> - review feedback > > src/hotspot/share/memory/metaspace/metablock.hpp line 74: > >> 72: #define METABLOCKFORMATARGS(__block__) p2i((__block__).base()), (__block__).word_size() >> 73: >> 74: } // namespace metaspace > > I am wondering if some of these metaspace changes, that is, the addition of MetaBlock could be upstreamed ahead of the CompactObjectHeaders. Some is refactoring so that you can use the wastage to allocate into class-arena but a lot of this seems neutral to compact object headers, and would reduce this patch and allow different people to focus on just this. For the record, I am fine with these metaspace changes going in with this PR if the timing for that is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1773607587 From roland at openjdk.org Tue Sep 24 15:58:08 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 24 Sep 2024 15:58:08 GMT Subject: RFR: 8340824: C2: Memory for TypeInterfaces not reclaimed by hashcons() Message-ID: The list of interfaces for a TypeInterfaces is contained in a GrowableArray that's allocated in the type arena. 
When hashcons() deletes a TypeInterfaces object because an identical one exists, it can't reclaim memory for the object because it can only free the last thing that was allocated and that's the backing store for the GrowableArray, not the TypeInterfaces object. With this patch, when the destructor of a GrowableArray is called, an attempt is made to free memory for the backing store when it's allocated in an arena. In the case of TypeInterfaces, for the GrowableArray destructor to be called, I had to add a virtual destructor to the base class (Type). I noticed this while working on a fix for a separate issue that causes more Type objects to be created. With that prototype fix, TestScalarReplacementMaxLiveNodes uses 1.4GB of memory at peak. With the patch I propose here on top, memory usage goes down to 150-200 MB which is in line with the peak memory usage for TestScalarReplacementMaxLiveNodes when run with current master. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21163/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21163&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340824 Stats: 7 lines in 2 files changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21163.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21163/head:pull/21163 PR: https://git.openjdk.org/jdk/pull/21163 From adinn at openjdk.org Tue Sep 24 16:07:46 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 24 Sep 2024 16:07:46 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> Message-ID: On Tue, 24 Sep 2024 10:17:18 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for 
[JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>>   Version                                             Baseline              This patch
>> --------------------------------------------------------------------------------------------
>>   Benchmark                 (size)  Mode  Cnt       Score     Error       Score     Error  Units
>> --------------------------------------------------------------------------------------------
>>   ArraysHashCode.bytes           1  avgt   15       1.249 ±   0.060       1.247 ±   0.062  ns/op
>>   ArraysHashCode.bytes          10  avgt   15       8.754 ±   0.028       4.387 ±   0.015  ns/op
>>   ArraysHashCode.bytes         100  avgt   15      98.596 ±   0.051      26.655 ±   0.097  ns/op
>>   ArraysHashCode.bytes       10000  avgt   15   10150.578 ±   1.352    2649.962 ± 216.744  ns/op
>>   ArraysHashCode.chars           1  avgt   15       1.286 ±   0.062       1.246 ±   0.054  ns/op
>>   ArraysHashCode.chars          10  avgt   15       8.731 ±   0.002       5.344 ±   0.003  ns/op
>>   ArraysHashCode.chars         100  avgt   15      98.632 ±   0.048      23.023 ±   0.142  ns/op
>>   ArraysHashCode.chars       10000  avgt   15   10150.658 ±   3.374    2410.504 ±   8.872  ns/op
>>   ArraysHashCode.ints            1  avgt   15       1.189 ±   0.005       1.187 ±   0.001  ns/op
>>   ArraysHashCode.ints           10  avgt   15       8.730 ±   0.002       5.676 ±   0.001  ns/op
>>   ArraysHashCode.ints          100  avgt   15      98.559 ±   0.016      24.378 ±   0.006  ns/op
>>   ArraysHashCode.ints        10000  avgt   15   10148.752 ±   1.336    2419.015 ±   0.492  ns/op
>>   ArraysHashCode.multibytes      1  avgt   15       1.037 ±   0.001       1.037 ±   0.001 ...
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
>   cleanup: fix a comment typo
>
>   Co-authored-by: Andrew Haley

A few small nits to address but otherwise good.

src/hotspot/cpu/aarch64/aarch64.ad line 16608:

> 16606: %}
> 16607:
> 16608: instruct arrays_hashcode(iRegP_R1 ary, iRegI_R2 cnt, iRegI_R0 result, immI basic_type,

I'm not sure why `arrays_hashcode` uses the plural, ditto for the macroassembler method name and stub name/stub generator method name. Other instructions and stubs use the singular e.g. `instruction array_equal_NNN`, `generate_arraycopy_stubs` etc. It would be better to follow that by systematically renaming all occurrences of `arrays_hashcode` to `array_hashcode`.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 39:

> 37: // Helper functions for arrays_hashcode.
> 38: void arrays_hashcode_elload(Register dst, Address src, BasicType eltype);
> 39: int arrays_hashcode_elsize(BasicType eltype);

The above two methods don't seem to exist any more?

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5439:

> 5437: case Assembler::T8B:
> 5438: case Assembler::T4H:
> 5439: case Assembler::T8H:

With the current code we should never see T4H as a value for load_arrangement. Is there a reason why you are folding it into the same case handling block as T8B and T8H rather than into the default block? If not then best to remove that case here and below.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5477:

> 5475:
> 5476: assert(is_power_of_2(vf), "can't use this value to calculate the jump target PC");
> 5477: __ andr(rscratch2, cnt, vf - 1);

It would probably be helpful to include here a repeat of the comment you added to the macroassembler method explaining how this deals with the correct number of leftover elements modulo `vf`

-------------

Marked as reviewed by adinn (Reviewer).
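As an aside on the `andr(rscratch2, cnt, vf - 1)` remark in the review above: when `vf` is a power of two, masking the element count with `vf - 1` yields the number of leftover elements modulo `vf`. The following is a minimal Java sketch of that split, not the AArch64 stub itself. The class and method names are hypothetical, and the "bulk first, scalar tail last" ordering is an assumption made for illustration.

```java
final class HashTailSketch {
    /**
     * Hypothetical illustration: computes the same polynomial hash as
     * Arrays.hashCode(int[]), consuming vf elements per "vector" step and
     * the remaining (length & (vf - 1)) elements in a scalar tail loop.
     */
    static int hash(int[] a, int vf) {
        if (vf <= 0 || Integer.bitCount(vf) != 1) {
            throw new IllegalArgumentException("vf must be a power of two");
        }
        // pow[k] = 31^k in wrapping 32-bit arithmetic.
        int[] pow = new int[vf + 1];
        pow[0] = 1;
        for (int k = 1; k <= vf; k++) {
            pow[k] = 31 * pow[k - 1];
        }

        int tail = a.length & (vf - 1); // equals a.length % vf for power-of-two vf
        int bulk = a.length - tail;     // a multiple of vf

        int result = 1;
        // "Vector" part: fold vf elements into the hash per iteration.
        for (int i = 0; i < bulk; i += vf) {
            int block = 0;
            for (int j = 0; j < vf; j++) {
                block += a[i + j] * pow[vf - 1 - j];
            }
            result = result * pow[vf] + block;
        }
        // Scalar tail: at most vf - 1 leftover elements.
        for (int i = bulk; i < a.length; i++) {
            result = 31 * result + a[i];
        }
        return result;
    }
}
```

For any power-of-two `vf`, `HashTailSketch.hash(data, vf)` should agree with `java.util.Arrays.hashCode(data)`, because both evaluate the same polynomial in wrapping `int` arithmetic.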
PR Review: https://git.openjdk.org/jdk/pull/18487#pullrequestreview-2325580547
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773570214
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773545405
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773592005
PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1773627876

From duke at openjdk.org Tue Sep 24 16:21:02 2024
From: duke at openjdk.org (Antón Seoane)
Date: Tue, 24 Sep 2024 16:21:02 GMT
Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v2]
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

> Hi all,
>
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
>
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
> - Additionally, defaults may target a specific log level.
>
> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>
> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>
> Please consider this PR, and thanks!

Antón Seoane has updated the pull request incrementally with two additional commits since the last revision:

 - Test adaptations to new focus
 - Grouping all defaults together

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20988/files
  - new: https://git.openjdk.org/jdk/pull/20988/files/5e0b45bc..af0b27be

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=00-01

  Stats: 44 lines in 7 files changed: 7 ins; 9 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/20988.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20988/head:pull/20988

PR: https://git.openjdk.org/jdk/pull/20988

From duke at openjdk.org Tue Sep 24 16:37:17 2024
From: duke at openjdk.org (Antón Seoane)
Date: Tue, 24 Sep 2024 16:37:17 GMT
Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v3]
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID: <-A9Xv_OvbHZLre0zN7Lsf_1pZVMvfnfruSo1XT-AGtA=.b67ca356-833e-47a4-8912-e141fc99afb3@github.com>

> Hi all,
>
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs.
For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient. > > To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: > > - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. > - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied. > - Additionally, defaults may target a specific log level. > > Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. > > In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults. > > Please consider this PR, and thanks! 
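The matching rules quoted above (user input wins, inclusion, specificity, union on equal specificity) can be sketched as follows. This is a hypothetical Java model, not the HotSpot implementation: the class, the `TagDefault` type, and the decorator sets in the usage note are all invented for illustration. It also leaves out level-specific defaults and the rule that conflicting `LogSelection`s on one output cancel all defaults.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class DefaultDecoratorSketch {
    /** Hypothetical default: a tag set plus the decorators to use when it matches. */
    static final class TagDefault {
        final Set<String> tags;
        final Set<String> decorators;

        TagDefault(Set<String> tags, Set<String> decorators) {
            this.tags = tags;
            this.decorators = decorators;
        }
    }

    /** The existing global default named in the PR description. */
    static final Set<String> GLOBAL_DEFAULT = Set.of("uptime", "level", "tags");

    /**
     * Resolves decorators for one selection.
     * userDecorators == null models "no decorators supplied through -Xlog".
     */
    static Set<String> resolve(Set<String> selectionTags,
                               Set<String> userDecorators,
                               List<TagDefault> defaults) {
        if (userDecorators != null) {
            return userDecorators; // defaults never override user input
        }
        int best = 0;
        Set<String> chosen = new TreeSet<>();
        for (TagDefault d : defaults) {
            // Inclusion: every tag of the default must appear in the selection.
            if (d.tags.isEmpty() || !selectionTags.containsAll(d.tags)) {
                continue;
            }
            int specificity = d.tags.size();
            if (specificity > best) {   // a more specific default replaces earlier ones
                best = specificity;
                chosen.clear();
            }
            if (specificity == best) {  // equal specificity: apply both
                chosen.addAll(d.decorators);
            }
        }
        return best == 0 ? GLOBAL_DEFAULT : chosen;
    }
}
```

With made-up defaults `jit -> {level}` and `jit+compilation -> {}`, a selection of `jit+compilation` resolves to no decorators, `jit+inlining` resolves to `{level}`, and an unrelated selection such as `gc` falls back to the global uptime, level, tags set.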
Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:

  Initialization of _decorators field in logDecorators

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20988/files
  - new: https://git.openjdk.org/jdk/pull/20988/files/af0b27be..ee24f637

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20988.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20988/head:pull/20988

PR: https://git.openjdk.org/jdk/pull/20988

From duke at openjdk.org Tue Sep 24 16:43:57 2024
From: duke at openjdk.org (Antón Seoane)
Date: Tue, 24 Sep 2024 16:43:57 GMT
Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4]
In-Reply-To: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

> Hi all,
>
> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs. For example, C2 developers rarely need decorations, and having to manually specify this every time results inconvenient.
>
> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
>
> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. Upon equal specificity cases, both defaults will be applied.
> - Additionally, defaults may target a specific log level.
>
> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>
> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>
> Please consider this PR, and thanks!

Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:

  Removed whitespace

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20988/files
  - new: https://git.openjdk.org/jdk/pull/20988/files/ee24f637..aa47a627

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20988&range=02-03

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20988.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20988/head:pull/20988

PR: https://git.openjdk.org/jdk/pull/20988

From lmesnik at openjdk.org Tue Sep 24 18:43:44 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Tue, 24 Sep 2024 18:43:44 GMT
Subject: RFR: 8340826: Should not send unload notification for scratch classes
Message-ID:

JVMTI class redefinition creates temporary scratch classes for its own purposes. These classes are added to the corresponding class loaders and might be unloaded. In this case the JVMTI/JFR and log events are generated twice: once for the original class and once for its scratch class.
The bug can be reproduced with the JFR test jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java run with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp'.

The test log (modified slightly) shows

[167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated
[167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized
[167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
[167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
[167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms
Unloaded count: 2

instead of the expected

[159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized
[159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
[160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
[160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms

The test hangs because it gets 2 events while waiting for one. The "allocated" version is the scratch class generated by the JVMTI JFR agent that redefines classes.

The fix is to not send the notification for scratch classes. Scratch classes shouldn't have dependencies, so I added an assertion. Also, we don't expect any other not-loaded classes during unloading.

Thanks to Coleen for the details about scratch classes.

Tested with tier1-5 and with :jdk_jfr with Xcomp and C1.
------------- Commit messages: - 8340826: should not send unload notification for scratch classes Changes: https://git.openjdk.org/jdk/pull/21166/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21166&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340826 Stats: 17 lines in 4 files changed: 15 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21166/head:pull/21166 PR: https://git.openjdk.org/jdk/pull/21166 From mli at openjdk.org Tue Sep 24 19:12:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 19:12:53 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: use all arg v regs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/f190709a..7719b5cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=04-05 Stats: 4 lines in 2 files changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From mli at openjdk.org Tue Sep 24 19:12:53 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 24 Sep 2024 19:12:53 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: <-QJaY5cvW3qGmd5Nj9XT4ibBFFrexIfDKzF3JgEKbtg=.b1ce48dc-fcc5-496b-98bc-58dc7ac13308@github.com> References: <-QJaY5cvW3qGmd5Nj9XT4ibBFFrexIfDKzF3JgEKbtg=.b1ce48dc-fcc5-496b-98bc-58dc7ac13308@github.com> Message-ID: On Tue, 24 Sep 2024 15:12:29 GMT, Fei Yang wrote: >> Yes, sorry what I meant, updated. > > Then why would we put a constraint on the number of supported argument vector registers here (v8-v15 instead of v8-v23)? Could we just support all of them, i.e., v8-v23 to comply with the RISC-V psABI? There is no strong reason, just it's sufficient for current implementation. Maybe it's better to use them all in case in the future some other code touch the limit unnecessarily. I'll change it to use all arg v regs. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1773897781

From kvn at openjdk.org Tue Sep 24 20:00:47 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 24 Sep 2024 20:00:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v25]
In-Reply-To:
References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com>
Message-ID:

On Mon, 23 Sep 2024 07:54:39 GMT, Roberto Castañeda Lozano wrote:

>> src/hotspot/share/opto/matcher.cpp line 1821:
>>
>>> 1819: if( rule >= _END_INST_CHAIN_RULE || rule < _BEGIN_INST_CHAIN_RULE ) {
>>> 1820: assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf),
>>> 1821: "duplicating node that's already been matched");
>>
>> Why it was removed?
>
> The assertion was failing due to it being too strict in several cases where the matcher would generate valid code anyway. One of them is when `is_encode_and_store_pattern(n, m)` returns true but `m -> n` cannot be matched by a single `g1EncodePAndStoreN` instruction. Commit 9ad158b6 removes this case by ensuring that `is_encode_and_store_pattern(n, m)` holds only if `m -> n` can indeed be matched.
> There are other cases (all of them harmless as far as I can see) in which this assertion can fail. I am investigating whether they can be avoided so that the assertion can be restored, and what would be the impact on the "redundant decompression removal" (`g1EncodePAndStoreN`) optimization.

I think you can do that as a follow-up change in a separate RFE. I am fine with the removal in this JEP code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1773999931 From psandoz at openjdk.org Tue Sep 24 20:08:44 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 24 Sep 2024 20:08:44 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v17] In-Reply-To: <-L7RYBQd-Q6zLkv5GKU0PDM2SZ-jdm1zAk1VRedDgyM=.c712848d-145b-4ecd-af2f-1a811832559d@github.com> References: <-L7RYBQd-Q6zLkv5GKU0PDM2SZ-jdm1zAk1VRedDgyM=.c712848d-145b-4ecd-af2f-1a811832559d@github.com> Message-ID: On Thu, 19 Sep 2024 06:53:15 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. >> >> >> . SUADD : Saturating unsigned addition. >> . SADD : Saturating signed addition. >> . SUSUB : Saturating unsigned subtraction. >> . SSUB : Saturating signed subtraction. >> . UMAX : Unsigned max >> . UMIN : Unsigned min. >> >> >> New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handler underflow, one of them is gradual underflow which transitions the value to subnormal range. Similarly, overflow implicitly saturates the floating-point value to an Infinite value. >> >> As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. >> >> Summary of changes: >> - Java side implementation of new vector operators. >> - Add new scalar saturating APIs for each of the above saturating vector operator in corresponding primitive box classes, fallback implementation of vector operators is based over it. >> - C2 compiler IR and inline expander changes. 
>> - Optimized x86 backend implementation for new vector operators and their predicated counterparts. >> - Extends existing VectorAPI Jtreg test suite to cover new operations. >> >> Kindly review and share your feedback. >> >> Best Regards, >> PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. >> >> [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Tuning extra spaces. I sent a pull request to your branch https://github.com/jatin-bhateja/jdk/pull/5/files that moves the `VectorMath` test to the library area and updates it to be more like a library test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20507#issuecomment-2372268735 From mdoerr at openjdk.org Tue Sep 24 20:24:47 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 24 Sep 2024 20:24:47 GMT Subject: RFR: 8340843: [PPC64] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Message-ID: [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). 
------------- Commit messages: - 8340843: [PPC64] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Changes: https://git.openjdk.org/jdk/pull/21168/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21168&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340843 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21168.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21168/head:pull/21168 PR: https://git.openjdk.org/jdk/pull/21168 From ccheung at openjdk.org Tue Sep 24 21:20:16 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 24 Sep 2024 21:20:16 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v4] In-Reply-To: References: Message-ID: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
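The lenient module-path check described above (order is unimportant, duplicates are ignored) amounts to comparing the two paths as sets. Below is a minimal Java sketch under that reading. The class and method names are hypothetical, the comma-separated notation is borrowed from the `m1,m2` example in the message rather than from the real `--module-path` syntax, and the actual check lives in HotSpot's CDS code, not in Java.

```java
import java.util.SortedSet;
import java.util.TreeSet;

final class ModulePathMatchSketch {
    /** Splits a comma-separated list into a sorted, duplicate-free set of names. */
    static SortedSet<String> normalize(String modules) {
        SortedSet<String> names = new TreeSet<>();
        for (String name : modules.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed); // a set drops duplicates; TreeSet fixes the order
            }
        }
        return names;
    }

    /** True when dump-time and runtime lists name the same modules. */
    static boolean matches(String dumpTime, String runtime) {
        return normalize(dumpTime).equals(normalize(runtime));
    }
}
```

Under this sketch both examples from the message match: `matches("m1,m2", "m2,m1")` and `matches("m1,m2", "m1,m2,m2")` return true, while `matches("m1,m2", "m1,m3")` returns false.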
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21048/files - new: https://git.openjdk.org/jdk/pull/21048/files/61ffd1b2..661615cb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From ccheung at openjdk.org Tue Sep 24 21:20:16 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 24 Sep 2024 21:20:16 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 05:06:50 GMT, David Holmes wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> trailing whitespace > > src/java.base/share/classes/jdk/internal/module/ModuleBootstrap.java line 481: > >> 479: cf, >> 480: clf, >> 481: mainModule); > > This was correctly aligned before, now it isn't. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774104849 From duke at openjdk.org Tue Sep 24 22:02:34 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Tue, 24 Sep 2024 22:02:34 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. 
Studying these recent changes led me back to #14851 which added jtreg properties:

- `jdk.hasLibgraal`: the libgraal shared library file is present
- `vm.libgraal.enabled`: libgraal is used as JIT compiler

The latter now feels misleading, since libgraal can be "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. (I'm here b/c we're assembling a distro doing exactly that.)

Would it make sense to rename the latter, to reduce ambiguity in the tests?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2372462365

From dnsimon at openjdk.org Tue Sep 24 22:12:34 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 24 Sep 2024 22:12:34 GMT
Subject: RFR: 8340576: Some JVMCI flags are inconsistent
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 22:00:02 GMT, Todd V. Jonker wrote:

> Would it make sense to rename the latter, to reduce ambiguity in the tests?

Sounds reasonable to me. Maybe `vm.libgraal.jit`? The good news is that there are no current tests using this predicate as far as I can see.

Want to take the lead on this?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2372474408

From duke at openjdk.org Tue Sep 24 22:21:34 2024
From: duke at openjdk.org (Todd V. Jonker)
Date: Tue, 24 Sep 2024 22:21:34 GMT
Subject: RFR: 8340576: Some JVMCI flags are inconsistent
In-Reply-To:
References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com>
Message-ID:

On Mon, 23 Sep 2024 07:37:01 GMT, Doug Simon wrote:

>> This will conflict with my change https://github.com/openjdk/jdk/pull/21069/commits/b75504633bc0f4fcecc1c08552e556b26d9ffbb9.
>
> No problem - I'll resolve it once your PR is merged.

Based just off the flag docs, I think it's easy to think that `+EnableJVMCI` is a prerequisite to enabling the other JVMCI flags, along the lines of `+UnlockExperimentalVMOptions`. (That was for sure my newbie reading.)
Perhaps its docs ([here](https://github.com/openjdk/jdk/blob/0b8c9f6d2397dcb480dc5ae109607d86f2b15619/src/hotspot/share/jvmci/jvmci_globals.hpp#L47-L48)) should be updated to say "Defaults to true if UseJVMCICompiler is true"? TBH I can't wrap my head around what `+EnableJVMCI` _means_; these options have a few chains of "this enables that" making it hard to grok the interactions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1774158571 From duke at openjdk.org Tue Sep 24 22:25:35 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Tue, 24 Sep 2024 22:25:35 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 22:09:42 GMT, Doug Simon wrote: > > Would it make sense to rename the latter, to reduce ambiguity in the tests? > > Sounds reasonable to me. Maybe `vm.libgraal.jit`? The good news is that there are no current tests using this predicate as far as I can see. > > Want to take the lead on this? I like that alternative, and yes I'll work up the patch. I find myself need to fix several jtreg tests to handle this kind of configuration, so it's relevant to my goals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2372492772 From kbarrett at openjdk.org Tue Sep 24 23:31:44 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 24 Sep 2024 23:31:44 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops Message-ID: Please review this change that fixes -Wzero-as-null-pointer-constant warnings in CompressedOops code. These all relate to CompressedOops::base(). I also added a couple of asserts to verify our assumptions about null pointer constants being representationally zero. That isn't a Standard-conforming assumption, but holds for all platforms we currently support. I considered, and even explored, a couple of different options. 
(1) Continue to have CompressedOops::base() be a pointer, but avoid that assumption, being more careful about how zero-valued pointers are treated. But that adds significant complexity that we can't test, since we don't support any platforms needing that extra work. (2) Change CompressedOops::base() to an integral adjustment. This is probably the correct approach, but is much more intrusive and wide ranging in the changes required. Maybe something for the future. Testing: mach5 tier1-5 GHA testing, verifying builds on some platforms not supported by Oracle. There are some simple changes to s390 and ppc code that I haven't tested, beyond verifying compilation. ------------- Commit messages: - fix setting base to null - fix base comparisons to 0 in shared - fix base comparisons to 0 in s390.ad - fix base comparisons to 0 in ppc.ad Changes: https://git.openjdk.org/jdk/pull/21172/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21172&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340620 Stats: 15 lines in 4 files changed: 6 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21172/head:pull/21172 PR: https://git.openjdk.org/jdk/pull/21172 From iklam at openjdk.org Wed Sep 25 00:40:43 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 25 Sep 2024 00:40:43 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 21:20:16 GMT, Calvin Cheung wrote: >> Prior to this patch, if `--module-path` is specified in the command line: >> during CDS dump time, full module graph will not be included in the CDS archive; >> during run time, full module graph will not be used. >> >> With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. 
During run time, if the same `--module-path` option is specified, the archived module graph will be used. >> >> The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. >> E.g. the following is considered a match: >> dump time runtime >> m1,m2 m2,m1 >> m1,m2 m1,m2,m2 >> >> I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. > > Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: > > fix indentation src/hotspot/share/cds/filemap.cpp line 956: > 954: } > 955: // module paths are stored in sorted order in the CDS archive. > 956: module_paths->sort(ClassLoaderExt::compare_module_path_by_name); I think it's better to put this call inside `ClassLoaderExt::extract_jar_files_from_path` src/hotspot/share/cds/heapShared.cpp line 879: > 877: > 878: ResourceMark rm(THREAD); > 879: if ((strcmp(k->name()->as_C_string(), "jdk/internal/module/ArchivedModuleGraph") == 0) && You can avoid the ResourceMark by if (k->name()->equals("jdk/internal/module/ArchivedModuleGraph") src/hotspot/share/cds/heapShared.cpp line 885: > 883: log_info(cds, heap)("Skip initializing ArchivedModuleGraph subgraph: is_using_optimized_module_handling=%s num_module_paths=%d", > 884: BOOL_TO_STR(CDSConfig::is_using_optimized_module_handling()), ClassLoaderExt::num_module_paths()); > 885: return; I think we can add a comment like: ArchivedModuleGraph was created with a --module-path that's different than the runtime --module-path. Thus, it might contain references to modules that do not exist in runtime. We cannot use it. 
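The lenient `--module-path` match described in the quoted PR description (ordering unimportant, duplicate names ignored) can be sketched roughly as a set comparison. This is only an illustration in Java, not the HotSpot code under review: the class `ModulePathMatch` and the method `sameModules` are hypothetical names, and the sketch compares plain module names rather than the JAR paths that the CDS code handles.

```java
import java.util.List;
import java.util.TreeSet;

public class ModulePathMatch {
    // Hypothetical helper (not part of the JDK): two module lists match when
    // they contain the same names, regardless of order or repetition.
    static boolean sameModules(List<String> dumpTime, List<String> runTime) {
        // A sorted set drops duplicates and makes the comparison order-insensitive.
        return new TreeSet<>(dumpTime).equals(new TreeSet<>(runTime));
    }

    public static void main(String[] args) {
        // The two matching examples from the PR description, plus one mismatch.
        System.out.println(sameModules(List.of("m1", "m2"), List.of("m2", "m1")));
        System.out.println(sameModules(List.of("m1", "m2"), List.of("m1", "m2", "m2")));
        System.out.println(sameModules(List.of("m1"), List.of("m1", "m2")));
    }
}
```

Under this reading, the first two calls report a match and the third does not, which corresponds to the case where the archived module graph cannot be used at run time.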
src/hotspot/share/classfile/classLoader.cpp line 582: > 580: false /*is_boot_append */, false /* from_class_path_attr */); > 581: if (new_entry != nullptr) { > 582: assert(new_entry->is_jar_file(), "module path entry %s is not a jar file", new_entry->name()); How do we guarantee that new_entry is always a JAR file? Do we never come here if --module-path points to an exploded directory? A comment would be helpful. src/hotspot/share/classfile/classLoaderExt.cpp line 152: > 150: DIR* dirp = os::opendir(path); > 151: if (dirp == nullptr && errno == ENOTDIR && has_jar_suffix(path)) { > 152: module_paths->append(path); Does this handle the case where `path` doesn't exist? src/hotspot/share/classfile/classLoaderExt.cpp line 162: > 160: int n = os::snprintf(full_name, full_name_len, "%s%s%s", path, os::file_separator(), file_name); > 161: assert((size_t)n == full_name_len - 1, "Unexpected number of characters in string"); > 162: module_paths->append(full_name); Can this case be handled: --module-path=dir - Dump time : dir contains only mod1.jar - Run time : dir contains only mod1.jar and mod2.jmod src/hotspot/share/runtime/arguments.cpp line 347: > 345: } > 346: } > 347: return false; Can this be simplified to `return (strcmp(key, MODULE_PROPERTY_PREFIX PATH) == 0)`? src/java.base/share/classes/jdk/internal/loader/BuiltinClassLoader.java line 1092: > 1090: void resetArchivedStatesForAppClassLoader() { > 1091: setClassPath(null); > 1092: if (!moduleToReader.isEmpty()) moduleToReader.clear(); Suggestion: if (!moduleToReader.isEmpty()) { moduleToReader.clear(); } Also, do we need to do the same thing for the platform loader as well?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774242056 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774243535 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774245579 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774247339 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774248778 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774249819 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774251713 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1774252333 From dholmes at openjdk.org Wed Sep 25 01:45:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 25 Sep 2024 01:45:37 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v4] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 14:06:17 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... 
>> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Also address Thread::exit Implementation looks good. Let's see what the benchmarking shows. We can reason that for applications that create lots of threads quickly, this additional throttling can actually improve general throughput. But of course any individual start/exit is slowed down by the extra lock acquisition and release.
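The throttling idea discussed above (an extra lock taken before the contended one, so that at most one starting thread competes for it at a time) might be sketched as follows. This is a loose Java analogy under stated assumptions, not the HotSpot change: `ThrottledStart`, `throttleLock`, `threadsLock`, and `register` are hypothetical names, and `ReentrantLock` does not reproduce the semantics of HotSpot's `Mutex`/`Monitor` or of safepoint synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class ThrottledStart {
    // Stand-in for the extra lock proposed in the PR (hypothetical name).
    private final ReentrantLock throttleLock = new ReentrantLock();
    // Stand-in for the contended Threads_lock.
    private final ReentrantLock threadsLock = new ReentrantLock();
    private int registered = 0;

    // Each caller first queues on throttleLock, so only one caller at a time
    // goes on to compete for threadsLock with other users of that lock.
    void register() {
        throttleLock.lock();
        try {
            threadsLock.lock();
            try {
                registered++; // stands in for adding the thread to the threads list
            } finally {
                threadsLock.unlock();
            }
        } finally {
            throttleLock.unlock();
        }
    }

    int registered() {
        threadsLock.lock();
        try {
            return registered;
        } finally {
            threadsLock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledStart s = new ThrottledStart();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(s::register);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(s.registered());
    }
}
```

The cost noted in the review is visible in the shape of `register`: every call pays one additional lock and unlock, even when there is no contention.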
Thanks src/hotspot/share/runtime/globals.hpp line 2003: > 2001: product(bool, UseThreadsLockThrottleLock, true, DIAGNOSTIC, \ > 2002: "Use an extra lock during Thread start and exit to alleviate" \ > 2003: "contention on threads lock.") \ Suggestion: "contention on Threads_lock.") \ src/hotspot/share/runtime/mutexLocker.hpp line 65: > 63: extern Mutex* RetData_lock; // a lock on installation of RetData inside method data > 64: extern Monitor* VMOperation_lock; // a lock on queue of vm_operations waiting to execute > 65: extern Monitor* ThreadsLockThrottle_lock; // used by Thread start/stop to reduce competition for Threads_lock, Suggestion: extern Monitor* ThreadsLockThrottle_lock; // used by Thread start/exit to reduce competition for Threads_lock, ------------- PR Review: https://git.openjdk.org/jdk/pull/21111#pullrequestreview-2326792129 PR Review Comment: https://git.openjdk.org/jdk/pull/21111#discussion_r1774287594 PR Review Comment: https://git.openjdk.org/jdk/pull/21111#discussion_r1774288261 From fyang at openjdk.org Wed Sep 25 02:03:43 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 02:03:43 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v4] In-Reply-To: References: <-QJaY5cvW3qGmd5Nj9XT4ibBFFrexIfDKzF3JgEKbtg=.b1ce48dc-fcc5-496b-98bc-58dc7ac13308@github.com> Message-ID: On Tue, 24 Sep 2024 19:09:29 GMT, Hamlin Li wrote: >> Then why would we put a constraint on the number of supported argument vector registers here (v8-v15 instead of v8-v23)? Could we just support all of them, i.e., v8-v23 to comply with the RISC-V psABI? > > There is no strong reason, just it's sufficient for current implementation. > > Maybe it's better to use them all in case in the future some other code touch the limit unnecessarily. I'll change it to use all arg v regs. Thanks for the update. Glad to see it's now fully compliant with the new calling convention variant of RISC-V psABI. 
I think the code will be easier for others to understand at the same time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1774302338 From gcao at openjdk.org Wed Sep 25 02:27:34 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 25 Sep 2024 02:27:34 GMT Subject: RFR: 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines In-Reply-To: References: Message-ID: <5axQRLNjDQ9UzzVEZ9vNNBlXQ-n66UeYxmb6n3IcfBc=.933ed01a-e3be-4a43-bbfe-0ebf3251530d@github.com> On Mon, 23 Sep 2024 12:39:07 GMT, Gui Cao wrote: > Hi, please help review that, small refactoring for sub/subw macro-assembler routines. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21135#issuecomment-2372757851 From duke at openjdk.org Wed Sep 25 02:27:34 2024 From: duke at openjdk.org (duke) Date: Wed, 25 Sep 2024 02:27:34 GMT Subject: RFR: 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 12:39:07 GMT, Gui Cao wrote: > Hi, please help review that, small refactoring for sub/subw macro-assembler routines. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) @zifeihan Your change (at version 8ca04fb21b0852677835c5de54421a6559ecce45) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21135#issuecomment-2372758243 From fyang at openjdk.org Wed Sep 25 02:31:40 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 02:31:40 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 19:12:53 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). 
We use sleef api to vectorize the math operations in vector api.
>>
>> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions.
>>
>> ### Test
>> test/jdk/jdk/incubator/vector
>>
>> ### Performance
>> data on bananapi
>>
>> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855
>> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001
>> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074
>> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62
>> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166
>> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974
>> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167
>> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219
>> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377
>> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357
>> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398
>> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449
>> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743
>> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3...
> > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > use all arg v regs src/hotspot/cpu/riscv/riscv.ad line 10079: > 10077: match(CallLeafVector); > 10078: > 10079: effect(USE meth, KILL cr); One more question here. I didn't check the details of `CallLeafVector`. Is it safe to assume that `FRM` will be saved and restored before and after the runtime call? Check this: https://bugs.openjdk.org/browse/JDK-8330094 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1774320423 From gcao at openjdk.org Wed Sep 25 02:31:43 2024 From: gcao at openjdk.org (Gui Cao) Date: Wed, 25 Sep 2024 02:31:43 GMT Subject: Integrated: 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines In-Reply-To: References: Message-ID: <2nsDjJVl4mpFjs6WVb1iXwEwx79Zx48F9CyvCzhLxn8=.72dce285-de04-48b5-9730-e60ca99e61b5@github.com> On Mon, 23 Sep 2024 12:39:07 GMT, Gui Cao wrote: > Hi, please help review that, small refactoring for sub/subw macro-assembler routines. > > ### Testing > - [x] Run tier1 tests on SOPHON SG2042 (release) This pull request has now been integrated. 
Changeset: a37bb2e0 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/a37bb2e0372a7c074c88d31824fc418a47f63405 Stats: 14 lines in 1 file changed: 0 ins; 12 del; 2 mod 8340643: RISC-V: Small refactoring for sub/subw macro-assembler routines Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/21135 From fyang at openjdk.org Wed Sep 25 02:34:36 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 02:34:36 GMT Subject: RFR: 8340808: RISC-V: Client build fails after JDK-8339738 In-Reply-To: <2SNxwljy44SGeKeDA1yBPgr8UB6oNq41o4ScsDbr1j4=.8a0bc0de-a62c-47cc-be78-8864fb4d0187@github.com> References: <2SNxwljy44SGeKeDA1yBPgr8UB6oNq41o4ScsDbr1j4=.8a0bc0de-a62c-47cc-be78-8864fb4d0187@github.com> Message-ID: <4aaxD03R84avk_a-OxLbEGzIRlvoe1zE8i5xEnc9EX4=.6792033a-36f5-494b-a62f-d5ac1a536d1e@github.com> On Tue, 24 Sep 2024 14:05:16 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously, the crc32 intrinsic (scalar version) was added for both c1/c2, then the vector version was added but depends on a global flag MaxVectorSize which is only valid in c2. > This pr is to put all vector crc32 related code under COMPILER2 macro protection. > I tested with/without --with-jvm-variants=client. > Thanks! Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21159#pullrequestreview-2326841318 From rcastanedalo at openjdk.org Wed Sep 25 04:22:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:22:25 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v26] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24.
The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. > > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions.
Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... Roberto Castañeda Lozano has updated the pull request incrementally with three additional commits since the last revision: - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/47c982ba..6fb36e50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=24-25 Stats: 104 lines in 5 files changed: 4 ins; 30 del; 70 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From rcastanedalo at openjdk.org Wed Sep 25 04:26:43 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:26:43 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Sat, 21 Sep 2024 06:44:21 GMT, Fei Yang wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Remove redundant comment > > src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: >
>> 255: RegSet::of($res$$Register) /* no_preserve */); >> 256: __ mov($tmp1$$Register, $oldval$$Register); >> 257: __ mov($tmp2$$Register, $newval$$Register); > > Hi, I don't quite understand these two register-register moves here. Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly as `cmpxchg` won't modify them, which would help us save these two moves. Did I miss anything? Thanks. Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary. Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774393587 From jbhateja at openjdk.org Wed Sep 25 04:39:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 25 Sep 2024 04:39:26 GMT Subject: RFR: 8338021: Support new unsigned and saturating vector operators in VectorAPI [v18] In-Reply-To: References: Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new vector operators. > > > . SUADD : Saturating unsigned addition. > . SADD : Saturating signed addition. > . SUSUB : Saturating unsigned subtraction. > . SSUB : Saturating signed subtraction. > . UMAX : Unsigned max > . UMIN : Unsigned min. > > > New vector operators are applicable to only integral types since their values wraparound in over/underflowing scenarios after setting appropriate status flags. For floating point types, as per IEEE 754 specs there are multiple schemes to handle underflow, one of them is gradual underflow which transitions the value to subnormal range.
Similarly, overflow implicitly saturates the floating-point value to an Infinite value. > > As the name suggests, these are saturating operations, i.e. the result of the computation is strictly capped by lower and upper bounds of the result type and is not wrapped around in underflowing or overflowing scenarios. > > Summary of changes: > - Java side implementation of new vector operators. > - Add new scalar saturating APIs for each of the above saturating vector operators in corresponding primitive box classes, fallback implementation of vector operators is based over it. > - C2 compiler IR and inline expander changes. > - Optimized x86 backend implementation for new vector operators and their predicated counterparts. > - Extends existing VectorAPI Jtreg test suite to cover new operations. > > Kindly review and share your feedback. > > Best Regards, > PS: Intrinsification and auto-vectorization of new core-lib API will be addressed separately in a follow-up patch. > > [1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: - Merge pull request #5 from PaulSandoz/JDK-8338201 Move and convert test - Move and convert test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20507/files - new: https://git.openjdk.org/jdk/pull/20507/files/eb2960a9..28b29bc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20507&range=16-17 Stats: 523 lines in 2 files changed: 245 ins; 278 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20507.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20507/head:pull/20507 PR: https://git.openjdk.org/jdk/pull/20507 From rcastanedalo at openjdk.org Wed Sep 25 04:58:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:58:45 GMT Subject: RFR: 8334060: Implementation of Late
Barrier Expansion for G1 [v19] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Tue, 24 Sep 2024 19:57:29 GMT, Vladimir Kozlov wrote: > I think you can do that as follow up changes as separate RFE. I am fine with removal in this JEP code. Thanks, in the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exists at the same time a `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case: ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45) Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared. This means that a few encode-and-store patterns that were matched by a single `g1EncodePAndStoreN` before are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example: ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5) Luckily, this case is very infrequent so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467183 From rcastanedalo at openjdk.org Wed Sep 25 04:58:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 04:58:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v19] In-Reply-To: References: <3JcYCl2Ew0W_-JXOPA7Cc-iWgsLr-cuj8att-_qIHLw=.23ebe003-540b-4e03-ba04-e5bfb142c5cf@github.com> Message-ID: On Wed, 25 Sep 2024 04:55:35 GMT, Roberto Castañeda Lozano wrote: >> I think you can do that as follow up changes as separate RFE. I am fine with removal in this JEP code. > >> I think you can do that as follow up changes as separate RFE. I am fine with removal in this JEP code. > > Thanks, in the meantime I found the remaining case that was causing assertion failures, and managed to handle it and reintroduce the original assertion. The case is where the output of `m` is shared by multiple nodes `N = {n1, n2, ...}` and there exists at the same time a `n` in `N` such that `is_encode_and_store_pattern(n, m)`, and a different `n` in `N` such that `!is_encode_and_store_pattern(n, m)`. Here is an example of such a case: > > ![ideal](https://github.com/user-attachments/assets/2122dfe0-757c-4094-b8f7-451f4380af45) > > Commit f96dfe73 ensures that this case does not trigger the assertion by avoiding cloning `m` in `m -> n` if `m` is shared. This means that a few encode-and-store patterns that were matched by a single `g1EncodePAndStoreN` before are now matched with the less optimized `encodeHeapOop` + `g1StoreN`, e.g. in the above example: > > ![mach-before-after](https://github.com/user-attachments/assets/a6fe5ad2-c0fb-4098-8bed-34593dafbcc5) > > Luckily, this case is very infrequent so we will only miss around 1% of all optimization opportunities. In return, we can reintroduce the original assertion and be sure that the original invariants of the matcher are preserved.
@TheRealMDoerr: since there are now a few corner cases where we match a StoreN node with g1StoreN even though it stores the output of an EncodeP node, I had to remove the assertions in the x64 and ppc g1StoreN definitions, see above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774467652 From amitkumar at openjdk.org Wed Sep 25 06:18:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 25 Sep 2024 06:18:35 GMT Subject: RFR: 8340843: [PPC64] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 20:19:12 GMT, Martin Doerr wrote: > [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). @TheRealMDoerr can you include change for s390x as well, please: diff --git a/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp b/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp index c16e4449045..0f35393a460 100644 --- a/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp +++ b/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp @@ -1224,6 +1224,7 @@ address TemplateInterpreterGenerator::generate_math_entry(AbstractInterpreter::M case Interpreter::java_lang_math_sin : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dsin); break; case Interpreter::java_lang_math_cos : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dcos); break; case Interpreter::java_lang_math_tan : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dtan); break; + case Interpreter::java_lang_math_tanh : /* run interpreted */ break; case Interpreter::java_lang_math_abs : /* run interpreted */ break; case Interpreter::java_lang_math_sqrt : /* runtime_entry = CAST_FROM_FN_PTR(address, 
SharedRuntime::dsqrt); not available */ break; case Interpreter::java_lang_math_log : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dlog); break; ------------- PR Comment: https://git.openjdk.org/jdk/pull/21168#issuecomment-2373122609 From fyang at openjdk.org Wed Sep 25 07:28:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 07:28:37 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version In-Reply-To: References: Message-ID: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> On Tue, 24 Sep 2024 07:22:51 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > As discussed in https://github.com/openjdk/jdk/pull/20910#discussion_r1755150447, it's helpful to refactor the existing scalar version of crc32 intrinsic. > Several refactoring are done in this pr, > 1. Simplify the `len` usage, now it only decreases (i.e. change in one direction) > 2. Simplify the code paths > 3. Remove one instruction in `L_by4_loop` > 4. Remove unnecessary code > 5. Other misc > > Thanks! Generally looks fine. I only have minor comments. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1621: > 1619: > 1620: bind(L_by1_loop); > 1621: blez(len, L_exit); Since `len` will never be lower than zero, maybe `beqz` is more accurate here? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1627: > 1625: andi(tmp2, tmp1, right_8_bits); > 1626: update_byte_crc32(crc, tmp2, table0); > 1627: blez(len, L_exit); Same question here. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1633: > 1631: andi(tmp2, tmp2, right_8_bits); > 1632: update_byte_crc32(crc, tmp2, table0); > 1633: blez(len, L_exit); and here. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6097: > 6095: __ kernel_crc32(crc, buf, len, > 6096: c_rarg3, c_rarg4, c_rarg5, c_rarg6, // tmp's for tables > 6097: c_rarg7, t2, x28, x29, x30, x31); // misc tmps Nit: Can we use alias t3-t6 for x28-x31 instead?
The purpose is to be consistent with `t2` in naming. ------------- PR Review: https://git.openjdk.org/jdk/pull/21150#pullrequestreview-2325373678 PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774679248 PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774679913 PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774680149 PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1773414691 From mli at openjdk.org Wed Sep 25 07:36:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 07:36:38 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version In-Reply-To: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> References: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> Message-ID: On Tue, 24 Sep 2024 13:58:57 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> As discussed in https://github.com/openjdk/jdk/pull/20910#discussion_r1755150447, it's helpful to refactor the existing scalar version of crc32 intrinsic. >> Several refactorings are done in this PR, >> 1. Simplify the `len` usage, now it only decreases (i.e. change in one direction) >> 2. Simplify the code paths >> 3. Remove one instruction in `L_by4_loop` >> 4. Remove unnecessary code >> 5. Other misc >> >> Thanks! > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6097: > >> 6095: __ kernel_crc32(crc, buf, len, >> 6096: c_rarg3, c_rarg4, c_rarg5, c_rarg6, // tmp's for tables >> 6097: c_rarg7, t2, x28, x29, x30, x31); // misc tmps > > Nit: Can we use alias t3-t6 for x28-x31 instead? The purpose is to be consistent with `t2` in naming. How do you think we add t3-t6 alias into assembler_riscv.hpp? I saw several times people had the similar question, and in the code there are several places like this `Register size = x29; // t4`, i.e. try to use t3-t6, but needs to add a comment for that. 
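For illustration, here is a minimal sketch of what such aliases could look like. It assumes a simplified stand-in `Register` type rather than HotSpot's real one, so the type, its `encoding()` accessor, and the declaration style are all hypothetical. Only the mapping of t3-t6 onto x28-x31 is taken from the RISC-V calling convention.

```cpp
// Hypothetical stand-in for a register handle; this is NOT the HotSpot class.
struct Register {
  int enc;
  constexpr int encoding() const { return enc; }
};

// Architectural names of the registers passed as "misc tmps" above.
constexpr Register x28{28}, x29{29}, x30{30}, x31{31};

// ABI mnemonics: t3-t6 name the same registers as x28-x31.
constexpr Register t3 = x28;
constexpr Register t4 = x29;
constexpr Register t5 = x30;
constexpr Register t6 = x31;

// With the aliases in place, a trailing "// t4" comment is no longer needed.
constexpr Register size = t4;

static_assert(t3.encoding() == 28, "t3 must alias x28");
static_assert(t6.encoding() == 31, "t6 must alias x31");
```

With aliases like these, the call site could read `c_rarg7, t2, t3, t4, t5, t6`, which keeps the naming consistent with `t2`.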
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774694494 From fyang at openjdk.org Wed Sep 25 07:36:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 07:36:47 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v24] In-Reply-To: References: <-TKP8SOh4eIFRiOcn-a3aRb_GEgApaWv9vhyEOiKJSw=.d4ade865-3cb1-477d-a7c7-f46449553a64@github.com> Message-ID: On Wed, 25 Sep 2024 04:22:49 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/cpu/aarch64/gc/g1/g1_aarch64.ad line 257: >> >>> 255: RegSet::of($res$$Register) /* no_preserve */); >>> 256: __ mov($tmp1$$Register, $oldval$$Register); >>> 257: __ mov($tmp2$$Register, $newval$$Register); >> >> Hi, I don't quite understand these two register-register moves here. Seems to me that we could pass `oldval` and `newval` to `cmpxchg` directly as `cmpxchg` won't modify them, which help us save these two moves. Did I miss anything? Thanks. > > Hi Fei, good catch, thanks. These moves have been around since the changes were initially prototyped, and are indeed unnecessary. Note that micro-optimization of the barrier code for atomic memory accesses has not been a focus of this JEP since we have not found any performance reason to do it at the macro level. Having said that, the moves are just wasteful and (perhaps more importantly) make the code harder to read and maintain, so I just removed them (commit 2c7f374e). Thanks for the update. It now looks cleaner and easier to understand. BTW: Seems that RISC-V part bears a similar issue. I will discuss with @feilongjiang and hopefully we will come up with a similar fix. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19746#discussion_r1774695093 From mli at openjdk.org Wed Sep 25 07:45:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 07:45:36 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version In-Reply-To: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> References: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> Message-ID: On Wed, 25 Sep 2024 07:24:32 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> As discussed in https://github.com/openjdk/jdk/pull/20910#discussion_r1755150447, it's helpful to refactor the existing scalar version of crc32 intrinsic. >> Several refactorings are done in this PR, >> 1. Simplify the `len` usage, now it only decreases (i.e. change in one direction) >> 2. Simplify the code paths >> 3. Remove one instruction in `L_by4_loop` >> 4. Remove unnecessary code >> 5. Other misc >> >> Thanks! > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1621: > >> 1619: >> 1620: bind(L_by1_loop); >> 1621: blez(len, L_exit); > > Since `len` will never be lower than zero, maybe `beqz` is more accurate here? Yes, will do it here and for the ones below. 
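To make the `beqz` point concrete, here is a generic byte-at-a-time CRC-32 loop sketched in plain C++ rather than RISC-V assembly. This is not the intrinsic from the patch: the function names and the table construction are assumptions for illustration, using the standard reflected polynomial 0xEDB88320. The remaining length only ever decreases and cannot go below zero, so the loop exit is an exact "equals zero" test.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table for the reflected CRC-32 polynomial.
inline std::array<std::uint32_t, 256> make_crc32_table() {
  std::array<std::uint32_t, 256> table{};
  for (std::uint32_t i = 0; i < 256; ++i) {
    std::uint32_t c = i;
    for (int k = 0; k < 8; ++k) {
      c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
    }
    table[i] = c;
  }
  return table;
}

// Hypothetical scalar helper: consume one byte per iteration.
inline std::uint32_t crc32_by1(std::uint32_t crc, const unsigned char* buf,
                               std::size_t len) {
  static const std::array<std::uint32_t, 256> table = make_crc32_table();
  crc = ~crc;
  // len moves in one direction only and never drops below zero, so testing
  // "len == 0" (the beqz analogue) is exact; "len <= 0" adds nothing.
  while (len != 0) {
    crc = table[(crc ^ *buf++) & 0xFFu] ^ (crc >> 8);
    --len;
  }
  return ~crc;
}
```

The per-byte step in the loop body is the usual table-driven update; how the real stub splits the work between the by-4 and by-1 loops is not shown here.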
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774709253 From fyang at openjdk.org Wed Sep 25 07:45:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 07:45:37 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version In-Reply-To: References: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> Message-ID: On Wed, 25 Sep 2024 07:34:03 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6097: >> >>> 6095: __ kernel_crc32(crc, buf, len, >>> 6096: c_rarg3, c_rarg4, c_rarg5, c_rarg6, // tmp's for tables >>> 6097: c_rarg7, t2, x28, x29, x30, x31); // misc tmps >> >> Nit: Can we use alias t3-t6 for x28-x31 instead? The purpose is to be consistent with `t2` in naming. > > How do you think we add t3-t6 alias into assembler_riscv.hpp? > I saw several times people had the similar question, and in the code there are several places like this `Register size = x29; // t4`, i.e. try to use t3-t6, but needs to add a comment for that. Ah, I just realized we only have t0 - t2 defined in file assembler_riscv.hpp. I think it's reasonable and helpful to add more, like t3 - t6. These are temporary register names / ABI mnemonics specified by the RISC-V psABI spec. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774709305 From mli at openjdk.org Wed Sep 25 07:50:35 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 07:50:35 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version In-Reply-To: References: <7uY-5VtgQvN6ChnxiclORhWnBy5J8IlsjT9m57LxTDA=.ee10a573-ace7-4ad9-9eed-94b449b5f26d@github.com> Message-ID: On Wed, 25 Sep 2024 07:42:58 GMT, Fei Yang wrote: >> How do you think we add t3-t6 alias into assembler_riscv.hpp? >> I saw several times people had the similar question, and in the code there are several places like this `Register size = x29; // t4`, i.e. 
try to use t3-t6, but needs to add a comment for that. > > Ah, I just realized we only have t0 - t2 defined in file assemler_riscv.hpp. I think its reasonable and helpful to add more ones like t3 - t6. These are temporary registers names / ABI Mnemonics specified by the RISC-V psABI spec. OK, let me do it in another pr, as there're several other places like `Register size = x29; // t4`, I will change them together, and also include this place. tracked by https://bugs.openjdk.org/browse/JDK-8340880 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21150#discussion_r1774717904 From stefank at openjdk.org Wed Sep 25 07:51:35 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 25 Sep 2024 07:51:35 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 23:26:08 GMT, Kim Barrett wrote: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero. That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. 
> > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. Looks good. One inquiry below. src/hotspot/share/oops/compressedOops.inline.hpp line 53: > 51: // Assume a null base casts to zero. Otherwise we need more complexity that > 52: // we can't test, since this is true for all currently supported platforms. > 53: assert(0 == reinterpret_cast(nullptr), "null pointer value not zero?"); Could this be a static_assert? ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21172#pullrequestreview-2327422676 PR Review Comment: https://git.openjdk.org/jdk/pull/21172#discussion_r1774716887 From rehn at openjdk.org Wed Sep 25 08:13:52 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 25 Sep 2024 08:13:52 GMT Subject: Integrated: 8339771: RISC-V: Reduce icache flushes In-Reply-To: References: Message-ID: <3vEzsrm5B4pmjfFUdP3QW1JTsWJfwJJ-brVdBxBdGgE=.1f5074f8-46ec-46da-bf9b-cc519e3f2674@github.com> On Mon, 9 Sep 2024 12:33:01 GMT, Robbin Ehn wrote: > Hey, please consider, > > All code which is offline (behind a barrier) do not need global icache flushes. > As we can instead in slow path locally (thread and hart) emit fence.i. > But if we were to be context switch do a hart which have not had fence.i emitted we can still fetch stale instructions. > To handle this case new now have kernel support: > https://docs.kernel.org/arch/riscv/cmodx.html > > It's not perfect as we will be emitting fence.i on any context switch for any thread with this patch, even if that thread do not execute on code heap (non attached native thread), and even if there was no changes to code heap. > But this is in many cases much faster as the icache flush global IPI is very intrusive. > Particular cases are running a concurrent gc with small head room. > In such scenario I measured 15% increased throughput on VF2. 
> A large CPU or less head room (faster GC cycles) will yield even more performance boost. > > Note that this requires 6.10 kernel. > > I'm running VF2 with 6.11-rc3 kernel and this passed tier1-3. (With setting on) > > Later we probably want this default on, but as it's hard to test I'll leave default off. This pull request has now been integrated. Changeset: 97a3933f Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/97a3933f1be2cabfc574689bb60618fe6fa3a8a4 Stats: 115 lines in 10 files changed: 111 ins; 0 del; 4 mod 8339771: RISC-V: Reduce icache flushes Reviewed-by: fyang, mli, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/20913 From mli at openjdk.org Wed Sep 25 08:16:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 08:16:39 GMT Subject: RFR: 8340808: RISC-V: Client build fails after JDK-8339738 In-Reply-To: <4aaxD03R84avk_a-OxLbEGzIRlvoe1zE8i5xEnc9EX4=.6792033a-36f5-494b-a62f-d5ac1a536d1e@github.com> References: <2SNxwljy44SGeKeDA1yBPgr8UB6oNq41o4ScsDbr1j4=.8a0bc0de-a62c-47cc-be78-8864fb4d0187@github.com> <4aaxD03R84avk_a-OxLbEGzIRlvoe1zE8i5xEnc9EX4=.6792033a-36f5-494b-a62f-d5ac1a536d1e@github.com> Message-ID: On Wed, 25 Sep 2024 02:31:45 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this simple patch? >> Previously, the crc32 intrinsic (scalar version) was added for both c1/c2, then the vector version was added but depends on a global flag MaxVectorSize which is only valid in c2. >> This pr is to put all vector crc32 related code under COMPILER2 macro protection. >> I tested with/without --with-jvm-variants=client. >> Thanks! > > Marked as reviewed by fyang (Reviewer). Thanks @RealFYang for your reviewing! I'll push this with one reviewer, as it's a trivial change. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21159#issuecomment-2373377100 From mli at openjdk.org Wed Sep 25 08:16:39 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 08:16:39 GMT Subject: Integrated: 8340808: RISC-V: Client build fails after JDK-8339738 In-Reply-To: <2SNxwljy44SGeKeDA1yBPgr8UB6oNq41o4ScsDbr1j4=.8a0bc0de-a62c-47cc-be78-8864fb4d0187@github.com> References: <2SNxwljy44SGeKeDA1yBPgr8UB6oNq41o4ScsDbr1j4=.8a0bc0de-a62c-47cc-be78-8864fb4d0187@github.com> Message-ID: On Tue, 24 Sep 2024 14:05:16 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch? > Previously, the crc32 intrinsic (scalar version) was added for both c1/c2, then the vector version was added but depends on a global flag MaxVectorSize which is only valid in c2. > This pr is to put all vector crc32 related code under COMPILER2 macro protection. > I tested with/without --with-jvm-variants=client. > Thanks! This pull request has now been integrated. Changeset: 9806d213 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/9806d2139cb5994effdee3f7bc6b23eb81858ed3 Stats: 10 lines in 2 files changed: 8 ins; 1 del; 1 mod 8340808: RISC-V: Client build fails after JDK-8339738 Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/21159 From mli at openjdk.org Wed Sep 25 08:35:05 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 08:35:05 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version [v2] In-Reply-To: References: Message-ID: <_Gk-ry03eJvNJv4bcpQLKKoE36Yh-SH853WYXPtdnOo=.e6f7a9f0-f7f8-41f1-aa99-ff33e16b20c2@github.com> > Hi, > Can you help to review this patch? > As discussed in?https://github.com/openjdk/jdk/pull/20910#discussion_r1755150447,?it's helpful to refactor the existing scalar version of crc32 intrinsic. > Several refactoring are done in this pr, > 1. Simplify the `len` usage, now it only decreases (i.e. change in one direction) > 2. Simplify the code paths > 3. 
Remove one instruction in `L_by4_loop` > 4. Remove unnecessary code > 5. Other misc > > Thanks! Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - merge master - bltz -> beqz - Initial commit ------------- Changes: https://git.openjdk.org/jdk/pull/21150/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21150&range=01 Stats: 57 lines in 2 files changed: 9 ins; 26 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/21150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21150/head:pull/21150 PR: https://git.openjdk.org/jdk/pull/21150 From shade at openjdk.org Wed Sep 25 08:37:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 08:37:29 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v4] In-Reply-To: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: > This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). > > There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > > I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. > > @mlchung, you probably want to look at this more closely. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into JDK-8336468-reflection-init-checks - Whitespace and comments - Merge branch 'master' into JDK-8336468-reflection-init-checks - Merge branch 'master' into JDK-8336468-reflection-init-checks - Remove unnecessary handle-izing - Fix - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20192/files - new: https://git.openjdk.org/jdk/pull/20192/files/969cbb9e..95b1091b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=02-03 Stats: 188453 lines in 1842 files changed: 168822 ins; 10044 del; 9587 mod Patch: https://git.openjdk.org/jdk/pull/20192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192 PR: https://git.openjdk.org/jdk/pull/20192 From shade at openjdk.org Wed Sep 25 08:37:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 08:37:31 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v2] In-Reply-To: <4L07gHAQMsFU2gzWpWm16TtzrY_e_nQp8YfklCPYiRc=.614132c5-c0ca-423f-af43-58fd3502a451@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <4L07gHAQMsFU2gzWpWm16TtzrY_e_nQp8YfklCPYiRc=.614132c5-c0ca-423f-af43-58fd3502a451@github.com> Message-ID: On Fri, 16 Aug 2024 12:56:06 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336468-reflection-init-checks >> - Remove unnecessary handle-izing >> - Fix >> - Fix > > This seems fine. Thanks @coleenp! Dusting off this PR... I think I need a second Reviewer, maybe @mlchung still wants to look at it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20192#issuecomment-2373425533 From shade at openjdk.org Wed Sep 25 08:40:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 08:40:35 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 23:26:08 GMT, Kim Barrett wrote: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero. That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. > > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. Looks fine, modulo assert comment: ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21172#pullrequestreview-2327573673 From shade at openjdk.org Wed Sep 25 08:40:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 08:40:36 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 07:47:21 GMT, Stefan Karlsson wrote: >> Please review this change that fixes -Wzero-as-null-pointer-constant warnings >> in CompressedOops code. These all relate to CompressedOops::base(). >> >> I also added a couple of asserts to verify our assumptions about null pointer >> constants being representationally zero. That isn't a Standard-conforming >> assumption, but holds for all platforms we currently support. I considered, >> and even explored, a couple of different options. >> >> (1) Continue to have CompressedOops::base() be a pointer, but avoid that >> assumption, being more careful about how zero-valued pointers are treated. But >> that adds significant complexity that we can't test, since we don't support >> any platforms needing that extra work. >> >> (2) Change CompressedOops::base() to an integral adjustment. This is probably >> the correct approach, but is much more intrusive and wide ranging in the >> changes required. Maybe something for the future. >> >> Testing: mach5 tier1-5 >> GHA testing, verifying builds on some platforms not supported by Oracle. >> >> There are some simple changes to s390 and ppc code that I haven't tested, >> beyond verifying compilation. > > src/hotspot/share/oops/compressedOops.inline.hpp line 53: > >> 51: // Assume a null base casts to zero. Otherwise we need more complexity that >> 52: // we can't test, since this is true for all currently supported platforms. >> 53: assert(0 == reinterpret_cast(nullptr), "null pointer value not zero?"); > > Could this be a static_assert? +1. 
Although I would expect any sane compiler to fold it, maybe it is still not optimized with something like `-O0`. Or maybe just move these asserts to `CompressedOops::initialize`, so whatever happens, happens once. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21172#discussion_r1774809759 From fyang at openjdk.org Wed Sep 25 08:54:36 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 08:54:36 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version [v2] In-Reply-To: <_Gk-ry03eJvNJv4bcpQLKKoE36Yh-SH853WYXPtdnOo=.e6f7a9f0-f7f8-41f1-aa99-ff33e16b20c2@github.com> References: <_Gk-ry03eJvNJv4bcpQLKKoE36Yh-SH853WYXPtdnOo=.e6f7a9f0-f7f8-41f1-aa99-ff33e16b20c2@github.com> Message-ID: <29Die5DNeMk_F3aZVcXSWnX9hPIr4-vmyYZzqMDnJ6g=.49d75dd2-6454-4732-b1f1-34bcb9cb1782@github.com> On Wed, 25 Sep 2024 08:35:05 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> As discussed in?https://github.com/openjdk/jdk/pull/20910#discussion_r1755150447,?it's helpful to refactor the existing scalar version of crc32 intrinsic. >> Several refactoring are done in this pr, >> 1. Simplify the `len` usage, now it only decreases (i.e. change in one direction) >> 2. Simplify the code paths >> 3. Remove one instruction in `L_by4_loop` >> 4. Remove unnecessary code >> 5. Other misc >> >> Thanks! > > Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - merge master > - bltz -> beqz > - Initial commit Updated change looks good to me. Any test performed? Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21150#pullrequestreview-2327611766 From ogillespie at openjdk.org Wed Sep 25 09:00:09 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 25 Sep 2024 09:00:09 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v5] In-Reply-To: References: Message-ID: > Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. > This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. > > Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. > > Before (ThreadStartTtsp.java is shared in JDK-8340547): > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 1291591 ns > Reaching safepoint: 59962 ns > Reaching safepoint: 1958065 ns > Reaching safepoint: 14456666258 ns <-- 14 seconds! > ... > > > After: > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 214269 ns > Reaching safepoint: 60253 ns > Reaching safepoint: 2040680 ns > Reaching safepoint: 3089284 ns > Reaching safepoint: 2998303 ns > Reaching safepoint: 4433713 ns <-- 4.4ms > Reaching safepoint: 3368436 ns > Reaching safepoint: 2986519 ns > Reaching safepoint: 3269102 ns > ... > > > > **Alternatives** > > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. 
> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. Oli Gillespie has updated the pull request incrementally with two additional commits since the last revision: - Improve doc Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Improve comment Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21111/files - new: https://git.openjdk.org/jdk/pull/21111/files/42909653..7b5633ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21111/head:pull/21111 PR: https://git.openjdk.org/jdk/pull/21111 From ogillespie at openjdk.org Wed Sep 25 09:00:10 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Wed, 25 Sep 2024 09:00:10 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 12:51:25 GMT, David Holmes wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix lock ranking > > If you do this at exit too then my concerns have doubled. I'd want to see some broader benchmarking of this change. Thanks @dholmes-ora . Is there any particular benchmarking you recommend me to run? 
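For readers following along, the gating idea from the PR description can be sketched with standard C++ mutexes. This is a minimal sketch under stated assumptions, not the HotSpot change itself: `thread_start_gate`, `threads_lock`, `threads_list`, and both functions are hypothetical stand-ins that only mirror the roles of ThreadStart_lock and Threads_lock.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; the names only mirror the locks discussed above.
std::mutex thread_start_gate;   // plays the role of ThreadStart_lock
std::mutex threads_lock;        // plays the role of Threads_lock
std::vector<int> threads_list;  // protected by threads_lock

// Starter path: take the gate first, so at most one starter at a time
// is ever waiting for (or holding) threads_lock.
void start_thread(int id) {
  std::lock_guard<std::mutex> gate(thread_start_gate);
  std::lock_guard<std::mutex> guard(threads_lock);
  threads_list.push_back(id);
}

// Safepoint-like path: skips the gate, so it competes with at most one
// starter for threads_lock instead of with all of them.
std::size_t count_threads_at_safepoint() {
  std::lock_guard<std::mutex> guard(threads_lock);
  return threads_list.size();
}
```

The cost is the one mentioned in the description: every starter pays one extra lock/unlock. The lock order (gate, then list lock) has to be the same everywhere the gate is used.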
------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2373478786 From mdoerr at openjdk.org Wed Sep 25 09:11:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 25 Sep 2024 09:11:11 GMT Subject: RFR: 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 [v2] In-Reply-To: References: Message-ID: > [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add s390 fix. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21168/files - new: https://git.openjdk.org/jdk/pull/21168/files/676e2700..bb63f682 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21168&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21168&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21168.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21168/head:pull/21168 PR: https://git.openjdk.org/jdk/pull/21168 From mbaesken at openjdk.org Wed Sep 25 09:11:11 2024 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 25 Sep 2024 09:11:11 GMT Subject: RFR: 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 09:08:28 GMT, Martin Doerr wrote: >> [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add s390 fix. Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21168#pullrequestreview-2327646845 From mdoerr at openjdk.org Wed Sep 25 09:11:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 25 Sep 2024 09:11:11 GMT Subject: RFR: 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:16:04 GMT, Amit Kumar wrote: >> [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). > > @TheRealMDoerr can you include change for s390x as well, please: > > diff --git a/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp b/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp > index c16e4449045..0f35393a460 100644 > --- a/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp > +++ b/src/hotspot/cpu/s390/templateInterpreterGenerator_s390.cpp > @@ -1224,6 +1224,7 @@ address TemplateInterpreterGenerator::generate_math_entry(AbstractInterpreter::M > case Interpreter::java_lang_math_sin : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dsin); break; > case Interpreter::java_lang_math_cos : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dcos); break; > case Interpreter::java_lang_math_tan : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dtan); break; > + case Interpreter::java_lang_math_tanh : /* run interpreted */ break; > case Interpreter::java_lang_math_abs : /* run interpreted */ break; > case Interpreter::java_lang_math_sqrt : /* runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dsqrt); not available */ 
break; > case Interpreter::java_lang_math_log : runtime_entry = CAST_FROM_FN_PTR(address, SharedRuntime::dlog); break; @offamitkumar: Please review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21168#issuecomment-2373499340 From amitkumar at openjdk.org Wed Sep 25 09:19:34 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 25 Sep 2024 09:19:34 GMT Subject: RFR: 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 09:11:11 GMT, Martin Doerr wrote: >> [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add s390 fix. Thank you for adding the s390x changes :-) ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/21168#pullrequestreview-2327672083 From jsjolen at openjdk.org Wed Sep 25 09:23:39 2024 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Wed, 25 Sep 2024 09:23:39 GMT Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4] In-Reply-To: References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com> Message-ID: <_gHhzeVW-OT3gTMiLMjzeBEjOmOMMnveEfr79i8rQBs=.88da52a5-7b42-4e3a-8965-3f607e383c82@github.com> On Tue, 24 Sep 2024 16:43:57 GMT, Ant?n Seoane wrote: >> Hi all, >> >> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. 
This can result in cumbersome input whenever a user who relies on particular tags has predefined needs. For example, C2 developers rarely need decorations, and having to specify this manually every time is inconvenient. >> >> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults never override user input -- they only apply when `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following: >> >> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied. >> - Specificity: if, for the above line, there is a more specific default for `jit+compilation` the latter shall be applied. In cases of equal specificity, both defaults will be applied. >> - Additionally, defaults may target a specific log level. >> >> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied. >> >> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" defaults. >> >> Please consider this PR, and thanks! > > Antón Seoane has updated the pull request incrementally with one additional commit since the last revision: > > Removed whitespace The main point of importance here is that anything user-specified (from `-Xlog` or `jcmd`) will take priority. We should probably add some information regarding this on the `-Xlog:help` page.
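The matching rules described in the PR text (inclusion, specificity, user input taking priority) can be illustrated with a small standalone sketch. This is not the UL implementation: `TagSet`, `DefaultDecorators`, `pick_decorators` and the `k...` bit constants are hypothetical names invented for illustration, and the sketch ignores both the per-level targeting and the "different `LogSelection`s trigger defaults, so none is applied" rule.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not HotSpot code: tags are strings, decorators are bits.
using TagSet = std::set<std::string>;

constexpr unsigned kUptime = 1u << 0;
constexpr unsigned kLevel  = 1u << 1;
constexpr unsigned kTags   = 1u << 2;
constexpr unsigned kGlobalDefault = kUptime | kLevel | kTags;

// One tag-specific default: "for selections containing these tags, use this mask".
struct DefaultDecorators {
  TagSet   tags;
  unsigned mask;
};

// Choose the decorators for a single log selection.
unsigned pick_decorators(const TagSet& selection,
                         bool user_gave_decorators,
                         unsigned user_mask,
                         const std::vector<DefaultDecorators>& defaults) {
  // Anything the user specified always wins.
  if (user_gave_decorators) {
    return user_mask;
  }
  bool matched = false;
  std::size_t best = 0;   // size of the most specific matching default so far
  unsigned mask = 0;
  for (const DefaultDecorators& d : defaults) {
    // Inclusion: every tag of the default must appear in the selection.
    // std::set iterates in sorted order, which std::includes requires.
    bool included = !d.tags.empty() &&
        std::includes(selection.begin(), selection.end(),
                      d.tags.begin(), d.tags.end());
    if (!included) {
      continue;
    }
    if (!matched || d.tags.size() > best) {
      // Specificity: a more specific default replaces a less specific one.
      matched = true;
      best = d.tags.size();
      mask = d.mask;
    } else if (d.tags.size() == best) {
      // Equal specificity: both defaults apply.
      mask |= d.mask;
    }
  }
  // No tag-specific default matched: keep the uptime-level-tags default.
  return matched ? mask : kGlobalDefault;
}
```

With a default of "level only" for `jit` and "no decorators" for `jit+compilation`, a selection of `jit+compilation` gets no decorators, `jit+inlining` gets the level decorator, and an unrelated selection such as `gc` keeps the global default.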
------------- PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2373534931 From mdoerr at openjdk.org Wed Sep 25 09:28:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 25 Sep 2024 09:28:40 GMT Subject: RFR: 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 09:11:11 GMT, Martin Doerr wrote: >> [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add s390 fix. Thanks for the reviews! Pre-submit cross build for s390x is successful. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21168#issuecomment-2373541386 From mdoerr at openjdk.org Wed Sep 25 09:28:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 25 Sep 2024 09:28:40 GMT Subject: Integrated: 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 20:19:12 GMT, Martin Doerr wrote: > [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced `Interpreter::java_lang_math_tanh` which needs to be handled by the interpreter. Unfortunately, `SharedRuntime::dtanh` does not exist, so we need to fallback to the normal interpreter entry (as before JDK-8338694). This pull request has now been integrated. 
Changeset: 1b9898a4 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/1b9898a44fd3f8159a7184053ef50cba55419d6e Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod 8340843: [PPC64/s390x] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Reviewed-by: mbaesken, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/21168 From adinn at openjdk.org Wed Sep 25 09:51:34 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 25 Sep 2024 09:51:34 GMT Subject: RFR: 8340181: Shenandoah: Cleanup ShenandoahRuntime stubs In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 08:52:21 GMT, Aleksey Shipilev wrote: > Noticed this while working on Leyden, which has to enumerate Shenandoah stubs for code archival to work. > > `ShenandoahRuntime::shenandoah_clone_barrier` is excessive name. `ShenandoahRuntime::arraycopy_barrier_oop_entry` and friends is not covered by `JRT_LEAF`. This change hopefully homogenizes the namings for the stubs. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` Thanks for cleaning this up. Looks good. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21152#pullrequestreview-2327757323 From dnsimon at openjdk.org Wed Sep 25 10:32:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 10:32:49 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: Message-ID: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: clarified doc for EnableJVMCI and UseJVMCINativeLibrary ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21120/files - new: https://git.openjdk.org/jdk/pull/21120/files/fcc3ece0..e26e68d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21120&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21120&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21120/head:pull/21120 PR: https://git.openjdk.org/jdk/pull/21120 From dnsimon at openjdk.org Wed Sep 25 10:32:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 25 Sep 2024 10:32:49 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: <5qxqeWQofN__IQVFSrm3Jt6NFgbcxPFmbLTr_ivQmX0=.878e9c33-fe64-49b6-9704-500bc12caa7f@github.com> Message-ID: On Tue, 24 Sep 2024 22:17:46 GMT, Todd V. Jonker wrote: >> No problem - I'll resolve it once your PR is merged. > > Based just off the flag docs, I think it's easy to think that `+EnableJVMCI` is a prerequisite to enabling the other JVMCI flags, along the lines of `+UnlockExperimentalVMOptions`. (That was for sure my newbie reading.) > > Perhaps its docs ([here](https://github.com/openjdk/jdk/blob/0b8c9f6d2397dcb480dc5ae109607d86f2b15619/src/hotspot/share/jvmci/jvmci_globals.hpp#L47-L48)) should be updated to say "Defaults to true if UseJVMCICompiler is true"? > > TBH I can't wrap my head around what `+EnableJVMCI` _means_; these options have a few chains of "this enables that" making it hard to grok the interactions. I've pushed https://github.com/openjdk/jdk/pull/21120/commits/e26e68d9a70ee27b4e71da86ecb42dca11e9a24f to try to clarify this a bit. I know it's a little confusing and apologize. When JVMCI is no longer experimental, this should become much clearer.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21120#discussion_r1774982914 From duke at openjdk.org Wed Sep 25 10:33:48 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 25 Sep 2024 10:33:48 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> Message-ID: <0qXIw6MxohS5BEqM54PZPvjHdWKE9DZfQu3t8GtMgb0=.f3afd2ca-275e-494c-b5e2-ab033e515305@github.com> On Tue, 24 Sep 2024 15:28:25 GMT, Andrew Dinn wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: fix a comment typo >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5439: > >> 5437: case Assembler::T8B: >> 5438: case Assembler::T4H: >> 5439: case Assembler::T8H: > > With the current code we should never see T4H as a value for load_arrangement. Is there a reason why you are folding it into the same case handling block as T8B and T8H rather than into the default block? If not then best to remove that case here and below. The reason was to make the implementation handle all possible values of `load_arrangement` should it change. And it did while I was tuning the performance of the algorithm. The code is valid and I'd argue there's no mistake if we leave this line here. But I'm comfortable with removing it in order to make the current implementation less error-prone. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1774983995 From adinn at openjdk.org Wed Sep 25 10:56:41 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 25 Sep 2024 10:56:41 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: <0qXIw6MxohS5BEqM54PZPvjHdWKE9DZfQu3t8GtMgb0=.f3afd2ca-275e-494c-b5e2-ab033e515305@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> <0qXIw6MxohS5BEqM54PZPvjHdWKE9DZfQu3t8GtMgb0=.f3afd2ca-275e-494c-b5e2-ab033e515305@github.com> Message-ID: On Wed, 25 Sep 2024 10:31:11 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5439: >> >>> 5437: case Assembler::T8B: >>> 5438: case Assembler::T4H: >>> 5439: case Assembler::T8H: >> >> With the current code we should never see T4H as a value for load_arrangement. Is there a reason why you are folding it into the same case handling block as T8B and T8H rather than into the default block? If not then best to remove that case here and below. > > The reason was to make the implementation handle all possible values of `load_arrangement` should it change. And it did while I was tuning the performance of the algorithm. The code is valid and I'd argue there's no mistake if we leave this line here. But I'm comfortable with removing it in order to make the current implementation less error-prone. Please remove it then as it can only serve to confuse maintainers. The code should visibly display consistent assumptions wherever possible. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1775012965 From qamai at openjdk.org Wed Sep 25 11:25:35 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 25 Sep 2024 11:25:35 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 08:36:37 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/oops/compressedOops.inline.hpp line 53: >> >>> 51: // Assume a null base casts to zero. Otherwise we need more complexity that >>> 52: // we can't test, since this is true for all currently supported platforms. >>> 53: assert(0 == reinterpret_cast(nullptr), "null pointer value not zero?"); >> >> Could this be a static_assert? > > +1. Although I would expect any sane compiler to fold it, maybe it is still not optimized with something like `-O0`. Or maybe just move these asserts to `CompressedOops::initialize`, so whatever happens, happens once. This cannot be used in `static_assert` because `reinterpret_cast` is not allowed here. I believe [`reinterpret_cast(nullptr)` will always return 0](https://en.cppreference.com/w/cpp/language/reinterpret_cast). You may need to do it the other way around. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21172#discussion_r1775048234 From rkennke at openjdk.org Wed Sep 25 12:34:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 25 Sep 2024 12:34:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v25] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Enforce lightweight locking on 32-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/2c4a7877..cd69da86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=23-24 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From fbredberg at openjdk.org Wed Sep 25 12:40:43 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 12:40:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> References: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> Message-ID: On Fri, 20 Sep 2024 05:05:22 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2734: > >> 2732: // We need a full fence after clearing owner to avoid stranding. >> 2733: // StoreLoad achieves this. >> 2734: membar(StoreLoad); > > Suggestion: > > fence(); > > similar to S390 I use `membar(StoreLoad)` in all other platforms except in S390, and that is because it has no `membar()` implementation. 
I want to be consistent and choose to go with prior art, which happens to be the `OrderAccess::storeload()` called from `ObjectMonitor::exit()`. In PowerPC both `fence() `and `membar(StoreLoad)` are mapped to a `sync` instruction, so there is no real difference. But for consistency reasons I choose `membar(StoreLoad)`. You know, "when in Rome"... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775150906 From rkennke at openjdk.org Wed Sep 25 12:53:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 25 Sep 2024 12:53:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... 
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Allow LM_MONITOR on 32-bit platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/cd69da86..4904d433 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=24-25 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From coleenp at openjdk.org Wed Sep 25 13:12:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 13:12:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: <8X-gxUxvDE0dkl9VGwDNd3aCa06ABV6Kr7uE1vQLuYE=.8b5de297-045c-4096-96b8-b27c91acc10e@github.com> On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary. I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. 
In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I was looking through and we set the "loaded" state under the Compile_lock (because of dependencies in add_to_hierarchy), we set the "linked", "being_initialized", "fully_initialized" and "initialization_error" under the init_lock object (which I want to change again) with a notify for the latter two. Using a load_acquire to examine the state (and release_store to write) seems like the right thing to do because there isn't just one lock so we should assume reading this state is lock free. It looks like the C2 code optimizes away the clinit_barrier when possible so we can watch for any performance difference but I'd still rather have safety. 
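The publication pattern discussed in this thread (a release store of the "fully initialized" state paired with an acquire load in every reader) can be sketched outside HotSpot with `std::atomic`. This is only an illustration under assumptions: `ToyKlass`, `read_if_initialized` and the simplified `InitState` enum are hypothetical stand-ins, and the actual patch uses HotSpot's own atomic/ordering primitives plus per-platform assembler rather than the C++ standard library.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified stand-in for the class init states named in the thread.
enum InitState : unsigned char {
  allocated,
  loaded,
  linked,
  being_initialized,
  fully_initialized,
  initialization_error
};

// Hypothetical stand-in for InstanceKlass: one piece of "static" data plus the
// state field that acts as the witness for completed initialization.
struct ToyKlass {
  int static_field = 0;
  std::atomic<InitState> init_state{allocated};

  // Writer side: perform the initialization, then publish the state.
  void initialize() {
    static_field = 42;  // plain store done by the "class initializer"
    // Release: the store above cannot be reordered after this publication.
    init_state.store(fully_initialized, std::memory_order_release);
  }

  // Reader side: lock-free poll of the state.
  bool is_initialized() const {
    // Acquire: if we observe fully_initialized, we also observe static_field == 42.
    return init_state.load(std::memory_order_acquire) == fully_initialized;
  }
};

// A reader that only touches the data after the acquire load succeeded.
// Returns -1 when the class is not (yet) initialized.
int read_if_initialized(const ToyKlass& k) {
  return k.is_initialized() ? k.static_field : -1;
}
```

If the reader used a relaxed load instead, a thread on a weakly ordered machine could in principle see `fully_initialized` and still read a stale `static_field`, which is the kind of gap the PR description is concerned with.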
------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2374046227 From duke at openjdk.org Wed Sep 25 13:15:46 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 25 Sep 2024 13:15:46 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> Message-ID: On Tue, 24 Sep 2024 15:15:35 GMT, Andrew Dinn wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: fix a comment typo >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/aarch64/aarch64.ad line 16608: > >> 16606: %} >> 16607: >> 16608: instruct arrays_hashcode(iRegP_R1 ary, iRegI_R2 cnt, iRegI_R0 result, immI basic_type, > > I'm not sure why `arrays_hashcode` uses the plural, ditto for the macroassembler method name and stub name/stub generator method name. Other instructions and stubs use the singular e.g. `instruction array_equal_NNN`, `generate_arraycopy_stubs` etc. It would be better to follow that by systematically renaming all occurrences of `arrays_hashcode` to `array_hashcode`. I believe this is because the Java class that provides the method is called `java.util.Arrays`. Additionally, `MacroAssembler` [declares](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L1439) `arrays_equals` using the plural form as well. Systematically changing these names would impact methods which are not directly related to this PR, namely `arrays_equals`, as well as other architectures besides AArch64. I'd suggest doing it in a separate PR in the future (if at all) to keep the focus of this one on `VectorizedHashCode` on AArch64, as indicated by the title.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1775207723 From liach at openjdk.org Wed Sep 25 13:24:40 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 13:24:40 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v4] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Wed, 25 Sep 2024 08:37:29 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into JDK-8336468-reflection-init-checks > - Whitespace and comments > - Merge branch 'master' into JDK-8336468-reflection-init-checks > - Merge branch 'master' into JDK-8336468-reflection-init-checks > - Remove unnecessary handle-izing > - Fix > - Fix src/hotspot/share/prims/jni.cpp line 450: > 448: reflection_method = Reflection::new_constructor(m, CHECK_NULL); > 449: } else { > 450: assert(!m->is_static_initializer(), "Cannot be static initializer"); This looks like a behavioral change; otherwise reflection and method handle changes look good. Per JNI https://docs.oracle.com/en/java/javase/23/docs/specs/jni/functions.html#toreflectedmethod it seems to just return if the method id is valid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1775222862 From fbredberg at openjdk.org Wed Sep 25 13:54:07 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 13:54:07 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: Message-ID: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. 
[JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x pass on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero don't need any changes as far as I can tell. Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: - Update three, after the review - Merge branch 'master' into 8320318_objectmon_responsible_thread - Update two, after the review - Update one, after the review - Small fixes before the review - Merge branch 'master' into 8320318_objectmon_responsible_thread - Merge branch 'master' into 8320318_objectmon_responsible_thread - Removed _Responsible - Fixed s390 - Fixed legacy locking - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f ------------- Changes: https://git.openjdk.org/jdk/pull/19454/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=03 Stats: 720 lines in 14 files changed: 302 ins; 288 del; 130 mod Patch: https://git.openjdk.org/jdk/pull/19454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19454/head:pull/19454 PR: https://git.openjdk.org/jdk/pull/19454 From rcastanedalo at openjdk.org Wed Sep 25 13:54:59 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 13:54:59 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com> Message-ID: <2adTLZAwTvFTVNGeR5e9Cef5uNqpsz2haeobLIDZiNI=.cb2bbf0d-5c1b-4583-b4bd-898e0c5cdbb7@github.com> On Fri, 13 Sep 2024 06:43:34 GMT, Roberto Castañeda Lozano wrote: >> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the
prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP. > > I see, thanks. In that case, I would suggest removing the explicit `UseCompressedClassPointers` test, since it should be implied by `t->isa_narrowklass()`. `check_init()` within `CompressedKlassPointers::shift()` would already fail for the unexpected case where `t->isa_narrowklass() && !UseCompressedClassPointers`, no? I think it would be good to remove the explicit `UseCompressedClassPointers` test as argued above (i.e. revert this change), unless there is any other reason to keep it that I am missing? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775277784 From mli at openjdk.org Wed Sep 25 13:56:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 13:56:36 GMT Subject: RFR: 8340732: RISC-V: Refactor crc32 scalar version [v2] In-Reply-To: <29Die5DNeMk_F3aZVcXSWnX9hPIr4-vmyYZzqMDnJ6g=.49d75dd2-6454-4732-b1f1-34bcb9cb1782@github.com> References: <_Gk-ry03eJvNJv4bcpQLKKoE36Yh-SH853WYXPtdnOo=.e6f7a9f0-f7f8-41f1-aa99-ff33e16b20c2@github.com> <29Die5DNeMk_F3aZVcXSWnX9hPIr4-vmyYZzqMDnJ6g=.49d75dd2-6454-4732-b1f1-34bcb9cb1782@github.com> Message-ID: On Wed, 25 Sep 2024 08:51:46 GMT, Fei Yang wrote: > Updated change looks good to me. Any test performed? Thanks. Thanks for reviewing! I just ran a test. There is a small difference from the previous version: some perf gain and some perf loss when size <= 256; when the size gets bigger, there is no difference. Basically I think we are good.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21150#issuecomment-2374162186 From mli at openjdk.org Wed Sep 25 14:07:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 14:07:38 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 02:29:11 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> use all arg v regs > > src/hotspot/cpu/riscv/riscv.ad line 10079: > >> 10077: match(CallLeafVector); >> 10078: >> 10079: effect(USE meth, KILL cr); > > I haven't checked the details of `CallLeafVector`. One more question here. Is it safe to assume that `FRM` will be saved and restored before and after the runtime call? Check this: https://bugs.openjdk.org/browse/JDK-8330094 Good question! Let me do some further investigation and get back later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1775300239 From fbredberg at openjdk.org Wed Sep 25 14:18:51 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:18:51 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> References: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> Message-ID: On Fri, 20 Sep 2024 04:55:16 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 535: > >> 533: >> 534: // Set owner to null. >> 535: // Release to satisfy the JMM > > Can I suggest you use this comment form in each of the cpu-specific files. 
At the moment, code that does the same thing is commented slightly differently with regard to the need for a release-store. Thanks. fixed > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 499: > >> 497: #endif >> 498: >> 499: // Intentional fall-through into slow path > > I don't think this comment makes sense / applies any more. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775314987 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775315614 From fbredberg at openjdk.org Wed Sep 25 14:18:53 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:18:53 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: <2b1-1pjo1Xk-0CDMYCHAWjrjdlhD8hA5H8c9Qh-Ireg=.9df3a0f8-c12c-4515-b60f-407924c5a287@github.com> Message-ID: On Fri, 20 Sep 2024 13:17:02 GMT, Fredrik Bredberg wrote: >> src/hotspot/share/runtime/javaThread.hpp line 467: >> >>> 465: intx _held_monitor_count; // used by continuations for fast lock detection >>> 466: intx _jni_monitor_count; >>> 467: ObjectMonitor* _unlocked_inflated_monitor; >> >> At the time we store this the OM is in-use but we have unlocked it and so by the time we go to re-lock it later it may no longer be in-use. What prevents it from being deflated and deallocated? Does it require a safepoint that can't happen on that code path? If so we should add a comment to that effect somewhere. > > I asked the GC guys @fisk and @xmas92 about this some time ago, and they assured me that it was ok. > Unfortunately I forgot about the details, so I re-asked them today and they said: "Since there is no safepoint polling when calling into the VM, so you can be sure that it hasn't been deallocated." I'll add a comment about it in `SharedRuntime::monitor_exit_helper()`. fixed >> test/micro/org/openjdk/bench/vm/lang/LockUnlock.java line 315: >> >>> 313: * inflated.
>>> 314: */ >>> 315: @Threads(3) >> >> Please explain this change and update the comment. > > The reason I increased the number of threads from 2 to 3 was that it enabled me to increase the code coverage, and thereby execute all(?) the corner cases when doing ObjectMonitor locking. I'll update the comment. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775316076 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775319069 From fbredberg at openjdk.org Wed Sep 25 14:18:54 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:18:54 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 12:05:01 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 335: >> >>> 333: // ObjectMonitor::deflate_monitor() will decrement contentions >>> 334: // after it recognizes that the async deflation was cancelled. >>> 335: contention_mark.extend(); >> >> This is a bit scary. Previously the locking_thread would already have 1 stake in the _contentions, and recognizes that after cancelling deflation, that stake is asynchronously released by the deflation thread. This means that in practice, we have 0 stakes left in the contention counter after the CAS that swings the owner to the locking_thread succeeds. Yet the ObjectMonitorContentionMark RAII object passed in from the caller looks like it guarantees there is a stake in the _contentions throughout its scope. By explicitly adding to contentions, the locking_thread reclaimed its stake in the _contentions counter while holding the lock, guaranteeing that deflation is indeed impossible until the end of the scope. >> The new extend() mechanism seems to consider it equivalent to not increment here and also not decrement later. +1 -1 == 0 right? However, that is not equivalent. HotSpot math works in mysterious ways.
The old mechanism guaranteed the linearization point for deflation is blocked until you get out of scope. The new mechanism does not. Instead, it's up to the user to reason about for how long deflation is blocked out. It's blocked out as long as the monitor is held naturally, but if it is released and the scope is still active, there is no stake in the _contentions counter and deflation would succeed if it tries again. Things like the _waiters counter might tell the heuristics of deflation to not try again. An absence of a safepoint poll might also prevent deflation from trying again in a timely fashion. But the point is that the linearization point for deflation is no longer blocked, and the abstraction looks safer than it is. >> >> In practice, I don't know of any bug because of this. Seems like deflation is in other ways blocked out in practice. But I would really prefer if extend() would add to contentions and the destructor always decrements. This way, the contract is stronger and it's easier to convince ourselves that we have not messed up. The scope would on its own prevent deflation, regardless of how it is used, which cannot be guaranteed any longer with the current extend() implementation. >> >> Again, this might be better suited for a follow-up RFE. > > I think extends() should add to contentions in this PR since extends() is part of this PR, and initially I expected it to be sort of a refcount (with a case or two for extending the scope of the refcount). 
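The contract argued for in the quoted review (extend() adds a stake and the destructor always releases one) can be illustrated with a minimal standalone sketch. This is not the HotSpot code: `Monitor`, `ContentionMark`, and `stake_held_in_scope` are hypothetical names, and the deflating thread is simulated by a plain decrement on the same thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for ObjectMonitor: only the contention counter.
struct Monitor {
  std::atomic<int> contentions{0};
};

// Sketch of an RAII mark: the constructor takes one stake, extend() takes
// another, and the destructor always releases exactly one.
class ContentionMark {
 public:
  explicit ContentionMark(Monitor& m) : _m(m) { _m.contentions.fetch_add(1); }
  ~ContentionMark() { _m.contentions.fetch_sub(1); }
  ContentionMark(const ContentionMark&) = delete;
  ContentionMark& operator=(const ContentionMark&) = delete;

  // Re-take the stake that the deflater released after a cancelled deflation.
  void extend() { _m.contentions.fetch_add(1); }

 private:
  Monitor& _m;
};

// Simulates the cancelled-deflation path and returns the counter value
// observed while the mark is still in scope.
int stake_held_in_scope(Monitor& m) {
  ContentionMark mark(m);
  m.contentions.fetch_sub(1);   // simulated deflater drops the locker's original stake
  mark.extend();                // locker reclaims a stake while holding the lock
  return m.contentions.load();  // read before `mark` is destroyed
}
```

With this shape the counter stays positive for the whole scope of the mark, regardless of when the lock itself is released, and it returns to its previous value once the scope ends.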
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775318079 From fbredberg at openjdk.org Wed Sep 25 14:18:55 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:18:55 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> Message-ID: <-7bE75OFetziHXuMhKtmwwILt9j9oHpeTk7tnOZDw6w=.af8da70a-878f-4ab5-ba67-80bfe3252869@github.com> On Thu, 19 Sep 2024 19:43:46 GMT, Coleen Phillimore wrote: >> Maybe `check_already_owned`? FTR I prefer this version than a friend declaration. Using it outside would also imply renaming it as try_lock to be consistent. And having to expose and check for TryLockResult is also uglier in my opinion than checking a boolean. I don't see it as a hack, we are just skipping the already owned case since we know in this case it will fail. I would actually remove that comment altogether. > > I like "check_for_recusion" as a parameter name and also agree that dropping the comment, or rewriting the comment as: > > // If called from SharedRuntime::monitor_exit_helper, we know that this thread doesn't already own the lock > > I agree that I don't want TryLock and its results exposed outside this file. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775313293 From fbredberg at openjdk.org Wed Sep 25 14:18:56 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:18:56 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: References: <8t6rNdDbJerisofk7hjzrB3Qt_KTV9MXxTZg4jpukao=.aca6e161-0401-47c9-85ae-37389c77f1c7@github.com> Message-ID: <7wb3KfRJDwOWg9gpat5D02KC7Z9rOGejuP5b1o5HNIg=.5c4258ba-5f49-4cce-b6e7-52e145860d80@github.com> On Mon, 23 Sep 2024 10:00:16 GMT, Erik Österlund wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 396: >> >>> 394: // to use ObjectMonitor::try_enter() as a public way of doing TryLock(). >>> 395: // Used this way in SharedRuntime::monitor_exit_helper(). >>> 396: if (check_owner) { >> >> Probably preference but an early return here is easier for me to parse. > > I agree. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775312835 From fbredberg at openjdk.org Wed Sep 25 14:18:56 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:18:56 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v3] In-Reply-To: <4AsV5DtoOGyc_sXyrvuiijJeiXFcol90QEvg7wDbGDM=.ceae4774-09ae-4eb8-b703-614385b57c1a@github.com> References: <4AsV5DtoOGyc_sXyrvuiijJeiXFcol90QEvg7wDbGDM=.ceae4774-09ae-4eb8-b703-614385b57c1a@github.com> Message-ID: <-nOWRTq8pCo4q8BuyWoHiiBbi80xfykPhKNNhZ2IZBo=.2e7fdb37-4450-4474-997a-864ad069ea4b@github.com> On Thu, 19 Sep 2024 19:44:26 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update two, after the review > > src/hotspot/share/runtime/objectMonitor.cpp line 582: > >> 580: return TryLockResult::Interference; >> 581: } >> 582: if (TryLockWithContentionMark(current, contention_mark)) { > > I like this rename.
It's consistent with the camel case that we want to change all at once someday. fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775314576 From fbredberg at openjdk.org Wed Sep 25 14:19:02 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:19:02 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: Message-ID: On Tue, 10 Sep 2024 12:54:25 GMT, Erik Österlund wrote: >> Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Update three, after the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Update two, after the review >> - Update one, after the review >> - Small fixes before the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Removed _Responsible >> - Fixed s390 >> - Fixed legacy locking >> - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f > > src/hotspot/share/runtime/objectMonitor.hpp line 226: > >> 224: static ByteSize succ_offset() { return byte_offset_of(ObjectMonitor, _succ); } >> 225: static ByteSize EntryList_offset() { return byte_offset_of(ObjectMonitor, _EntryList); } >> 226: static ByteSize contentions_offset() { return byte_offset_of(ObjectMonitor, _contentions); } > > Looks like a leftover from the previous approach that tried to deal with deflation races in assembly code. It should probably be removed.
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775316439 From rcastanedalo at openjdk.org Wed Sep 25 14:19:54 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 25 Sep 2024 14:19:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/share/opto/memnode.cpp line 2256: > 2254: if (!UseCompactObjectHeaders && alloc != nullptr) { > 2255: return TypeX::make(markWord::prototype().value()); > 2256: } Suggestion: make these four lines conditional on `!UseCompactObjectHeaders`, like so: if (!UseCompactObjectHeaders) { Node* alloc = is_new_object_mark_load(); if (alloc != nullptr) { return TypeX::make(markWord::prototype().value()); } } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1775322670 From fbredberg at openjdk.org Wed Sep 25 14:29:42 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 14:29:42 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <1MC83jRy9o6GrZouJaYjgHyIoyfNvrakHuirZMxIdhk=.769c2ce1-795f-4981-a10b-cee04cad5a0a@github.com> Message-ID: On Fri, 20 Sep 2024 08:25:10 GMT, Martin Doerr wrote: >> @TheRealMDoerr >>> I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. Performance looks also good for the SPEC benchmarks. I don't think they stress Java monitors very strongly. >> >> That really is good news! Thanks for testing! >> >>> I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm. I think this should at least be documented and the description of the JBS issue improved. >> >> Thanks for confirming that my suspicion was right.
As I stated earlier, due to the added StoreLoad barrier a slight decrease in performance is probably to be expected if you just run `LockUnlock.testContendedLock`, but it shouldn't really matter when running real life applications. Anyhow I'll update the description of the JBS issue. > > @fbredber: If you need help to resolve the PPC64 conflicts with https://github.com/openjdk/jdk/commit/7579d3740217e4a819cbf63837ec929f00464585, just let me know. @TheRealMDoerr @offamitkumar I resolved merge conflicts in `src/hotspot/cpu/ppc/macroAssembler_ppc.cpp` and `src/hotspot/cpu/s390/macroAssembler_s390.cpp`. I've smoke tested it with QEMU, but it would be nice if you could check if it's ok as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2374250115 From fyang at openjdk.org Wed Sep 25 14:33:40 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 25 Sep 2024 14:33:40 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v7] In-Reply-To: <9rglid_tIn1JA4zqOFygiz1hWYZGOPa8Ci1AI1qRHDA=.c70dd0ec-46b4-43c5-841e-7d91edf65eb4@github.com> References: <9rglid_tIn1JA4zqOFygiz1hWYZGOPa8Ci1AI1qRHDA=.c70dd0ec-46b4-43c5-841e-7d91edf65eb4@github.com> Message-ID: <5ow09TSwi-PdaV29TrkX4LhaJxyQxBFjLkYLxVAhcsA=.6ac81488-94c9-42e7-a4f3-b59256da30df@github.com> On Sun, 8 Sep 2024 13:24:49 GMT, Arseny Bochkarev wrote: >> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. 
> > Arseny Bochkarev has updated the pull request incrementally with one additional commit since the last revision: > > Multiversion decrypt intrinsic src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2373: > 2371: __ vxor_vv(res, res, working_vregs[i]); > 2372: __ vaesdm_vv(res, vzero); > 2373: } Seems that a lot more `vxor.vv` are emitted here compared with the openssl version [1]. I wonder if this could be further optimized. Or is there anything I missed? Thanks. [1] https://github.com/openssl/openssl/blob/master/crypto/aes/asm/aes-riscv64-zvkned.pl#L279-L295 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1775344680 From shade at openjdk.org Wed Sep 25 15:25:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 15:25:40 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v4] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Wed, 25 Sep 2024 13:22:09 GMT, Chen Liang wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8336468-reflection-init-checks >> - Whitespace and comments >> - Merge branch 'master' into JDK-8336468-reflection-init-checks >> - Merge branch 'master' into JDK-8336468-reflection-init-checks >> - Remove unnecessary handle-izing >> - Fix >> - Fix > > src/hotspot/share/prims/jni.cpp line 450: > >> 448: reflection_method = Reflection::new_constructor(m, CHECK_NULL); >> 449: } else { >> 450: assert(!m->is_static_initializer(), "Cannot be static initializer"); > > This looks like a behavioral change; otherwise reflection and method handle changes look good. 
Per JNI https://docs.oracle.com/en/java/javase/23/docs/specs/jni/functions.html#toreflectedmethod it seems to just return if the method id is valid. You mean the _assert_ is excessive, or something else? I think it is within the spec to return method for `clinit`-s, since clinits are logically not constructors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1775445221 From adinn at openjdk.org Wed Sep 25 15:26:43 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 25 Sep 2024 15:26:43 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> Message-ID: On Wed, 25 Sep 2024 13:13:13 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/aarch64.ad line 16608: >> >>> 16606: %} >>> 16607: >>> 16608: instruct arrays_hashcode(iRegP_R1 ary, iRegI_R2 cnt, iRegI_R0 result, immI basic_type, >> >> I'm not sure why `arrays_hashcode` uses the plural, ditto for the macroassembler method name and stub name/stub generator method name. Other instructions and stubs use the singular e.g. `instruction array_equal_NNN`, `generate_arraycopy_stubs` etc. It would be better to follow that by systematically renaming all occurrences of `arrays_hashcode` to `array_hashcode`. > > I believe this is because the Java class that provides the method is called `Java.util.Arrays`. Additionally, `MacroAssembler` [declares](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L1439) `arrays_equals` using the plural form as well. > > Systematically changing these names would impact methods which are not directly related to this PR, namely `arrays_equals`, as well as other architectures beside AArch64. 
I'd suggest doing it in a separate PR in the future (if at all) to keep the focus of this one on `VectorizedHashCode` on AArch64, as indicated by the title. Ah, yes, that makes sense. Ok, so ignore this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1775447666 From liach at openjdk.org Wed Sep 25 15:35:36 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 15:35:36 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v4] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Wed, 25 Sep 2024 15:22:38 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/prims/jni.cpp line 450: >> >>> 448: reflection_method = Reflection::new_constructor(m, CHECK_NULL); >>> 449: } else { >>> 450: assert(!m->is_static_initializer(), "Cannot be static initializer"); >> >> This looks like a behavioral change; otherwise reflection and method handle changes look good. Per JNI https://docs.oracle.com/en/java/javase/23/docs/specs/jni/functions.html#toreflectedmethod it seems to just return if the method id is valid. > > You mean the _assert_ is excessive, or something else? I think it is within the spec to return method for `clinit`-s, since clinits are logically not constructors. Then this assert seems excessive. If JNI users get the method slot (presumably through some hacks) for a `<clinit>` then JNI has to return the `<clinit>` and this assert doesn't stand.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1775461239 From shade at openjdk.org Wed Sep 25 15:53:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 15:53:19 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v4] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Wed, 25 Sep 2024 15:32:33 GMT, Chen Liang wrote: >> You mean the _assert_ is excessive, or something else? I think it is within the spec to return method for `clinit`-s, since clinits are logically not constructors. > > Then this assert seems excessive. If JNI users get the method slot (presumably through some hacks) for a `<clinit>` then JNI has to return the `<clinit>` and this assert doesn't stand. OK, I replaced the assert with a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20192#discussion_r1775488883 From shade at openjdk.org Wed Sep 25 15:53:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 15:53:19 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v5] In-Reply-To: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> > This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). > > There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM.
Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > > I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. > > @mlchung, you probably want to look at this more closely. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Replace assert with a comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20192/files - new: https://git.openjdk.org/jdk/pull/20192/files/95b1091b..d58d7eff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20192&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20192.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20192/head:pull/20192 PR: https://git.openjdk.org/jdk/pull/20192 From liach at openjdk.org Wed Sep 25 16:08:38 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 16:08:38 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v5] In-Reply-To: <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> Message-ID: On Wed, 25 Sep 2024 15:53:19 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). 
>> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Replace assert with a comment Thanks. FYI @mlchung is away for a few months. Other method handle and reflection checks look good. I am running some tests for this patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20192#issuecomment-2374512016 From duke at openjdk.org Wed Sep 25 16:11:20 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 25 Sep 2024 16:11:20 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v16] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
> > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > >
> --------------------------------------------------------------------------------------------
> Version                                            Baseline             This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score   Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with three additional commits since the last revision: - cleanup: explain how this deals with the correct number of leftover elements modulo vf - cleanup: current implementation should never end up with T4H load arrangement - cleanup: remove obsolete method declarations ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/a28bbcd3..142fa5d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=14-15 Stats: 8 lines in 2 files changed: 2 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From duke at openjdk.org Wed Sep 25 16:11:20 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 25 Sep 2024 16:11:20 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> Message-ID: On Tue, 24 Sep 2024 15:00:58 GMT, Andrew Dinn wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: fix a comment typo >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.hpp line 39: > >> 37: // Helper functions for arrays_hashcode. >> 38: void arrays_hashcode_elload(Register dst, Address src, BasicType eltype); >> 39: int arrays_hashcode_elsize(BasicType eltype); > > The above two methods don't seem to exist any more? Fixed by https://github.com/openjdk/jdk/pull/18487/commits/0a0ab92a1e6f722bc0feaaa6787d15d734fed392. 
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5477: > >> 5475: >> 5476: assert(is_power_of_2(vf), "can't use this value to calculate the jump target PC"); >> 5477: __ andr(rscratch2, cnt, vf - 1); > > It would probably be helpful to include here a repeat of the comment you added to the macroassembler method explaining how this deals with the correct number of leftover elements modulo `vf` Fixed by https://github.com/openjdk/jdk/pull/18487/commits/142fa5d0822910ca979de60aef62ba390365afc3. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1775514892 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1775515932 From duke at openjdk.org Wed Sep 25 16:11:20 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Wed, 25 Sep 2024 16:11:20 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v14] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_FQX9bjvQ0oKFXqCWA0kQmqFh4Ffvfcp_hQVkxjSWTA=.3caf10c7-27b5-4922-9887-effc4c147030@github.com> <0qXIw6MxohS5BEqM54PZPvjHdWKE9DZfQu3t8GtMgb0=.f3afd2ca-275e-494c-b5e2-ab033e515305@github.com> Message-ID: On Wed, 25 Sep 2024 10:54:26 GMT, Andrew Dinn wrote: >> The reason was to make the implementation handle all possible values of `load_arrangement` should it change. And it did while I was tuning the performance of the algorithm. The code is valid and I'd argue there's no mistake if we leave this line here. But I'm comfortable with removing it in order to make the current implementation less error-prone. > > Please remove it then as it can only serve to confuse maintainers. The code should visibly display consistent assumptions wherever possible. Fixed by https://github.com/openjdk/jdk/pull/18487/commits/e2d1f711a093826c618242b291dc90195c027872. 
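The `andr(rscratch2, cnt, vf - 1)` step quoted above depends on `vf` being a power of two, which is what the preceding assert checks: in that case masking with `vf - 1` yields `cnt % vf`, the number of leftover elements. A minimal standalone sketch of that arithmetic follows; it is not the stub generator code, `tail_count` is a hypothetical name, and `is_power_of_2` is redefined locally rather than taken from HotSpot.

```cpp
#include <cassert>
#include <cstdint>

// Local helper for this sketch: true if v has exactly one bit set.
constexpr bool is_power_of_2(uint32_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Number of elements left over after processing cnt elements in groups of vf.
// Only valid when vf is a power of two: then vf - 1 is a mask of the low
// bits, and cnt & (vf - 1) equals cnt % vf.
constexpr uint32_t tail_count(uint32_t cnt, uint32_t vf) {
  return cnt & (vf - 1);
}
```

For example, with 8 elements per iteration an array of 10 elements leaves a tail of 2, and an array of 16 elements leaves none.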
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1775515452 From adinn at openjdk.org Wed Sep 25 16:19:42 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 25 Sep 2024 16:19:42 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v16] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 25 Sep 2024 16:11:20 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >>
>> --------------------------------------------------------------------------------------------
>> Version                                          Baseline             This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ...
> > Mikhail Ablakatov has updated the pull request incrementally with three additional commits since the last revision: > > - cleanup: explain how this deals with the correct number of leftover elements modulo vf > - cleanup: current implementation should never end up with T4H load arrangement > - cleanup: remove obsolete method declarations Thanks for the final cleanups. All looking very nice! ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/18487#pullrequestreview-2328774578 From shade at openjdk.org Wed Sep 25 16:43:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 25 Sep 2024 16:43:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen clear evidence that this is _the_ problem in the field, nor were we able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. Current patch makes a seqcst write, which is stronger than strictly necessary.
I think it is okay to be extra paranoid on rarely-executed class initialization path. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I am running performance tests with it, and expect no difference given JITed code normally knows that classes are initialized at JIT compilation time. The impact on interpreter paths is likely not be visible as well. If you can run your set of benchmarks, please do as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2374598301 From liach at openjdk.org Wed Sep 25 17:03:18 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 17:03:18 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes [v2] In-Reply-To: References: Message-ID: > Please review this change that adds a new dynamic proxies implementation as hidden classes. > > Summary: > 1. Adds new implementation which can be `-Djdk.reflect.useHiddenProxy=true` for early adoption. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. 
ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. > > Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Flip flags, hidden is enabled only by choice - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy # Conflicts: # src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java - Missing changes to commit - Condense legacy and modern impl - Clean up - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy - Cleanup... - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy # Conflicts: # src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java - Fixes - ... 
and 2 more: https://git.openjdk.org/jdk/compare/fb703258...16906bfb ------------- Changes: https://git.openjdk.org/jdk/pull/19356/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19356&range=01 Stats: 82 lines in 6 files changed: 53 ins; 1 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/19356.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19356/head:pull/19356 PR: https://git.openjdk.org/jdk/pull/19356 From liach at openjdk.org Wed Sep 25 17:03:18 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 17:03:18 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes [v2] In-Reply-To: References: Message-ID: On Mon, 27 May 2024 00:04:07 GMT, ExE Boss wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Flip flags, hidden is enabled only by choice >> - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy >> - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy >> >> # Conflicts: >> # src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java >> - Missing changes to commit >> - Condense legacy and modern impl >> - Clean up >> - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy >> - Cleanup... >> - Merge branch 'master' of https://github.com/openjdk/jdk into feature/hidden-proxy >> >> # Conflicts: >> # src/java.base/share/classes/java/lang/reflect/ProxyGenerator.java >> - Fixes >> - ... and 2 more: https://git.openjdk.org/jdk/compare/fb703258...16906bfb > > src/java.base/share/classes/jdk/internal/reflect/ReflectionFactory.java line 624: > >> 622: "true".equals(props.getProperty("jdk.disableSerialConstructorChecks")); >> 623: >> 624: useLegacyProxyImpl &= !useOldSerializableConstructor; > > Suggestion: > > useLegacyProxyImpl |= useOldSerializableConstructor; Fixed. 
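To make the difference between the two forms in the suggestion above concrete, here is a small Java sketch. It is not the `ReflectionFactory` code, and the helper names are hypothetical. It only enumerates what each compound assignment does to a boolean flag, so that the four input combinations can be compared side by side.

```java
class FlagCombine {
    // Form as originally written: the result is true only if legacy was
    // already true and oldCtor is false.
    static boolean andNot(boolean legacy, boolean oldCtor) {
        legacy &= !oldCtor;
        return legacy;
    }

    // Suggested form: the result is true if legacy was already true
    // or oldCtor is true.
    static boolean orWith(boolean legacy, boolean oldCtor) {
        legacy |= oldCtor;
        return legacy;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean legacy : values) {
            for (boolean oldCtor : values) {
                System.out.printf("legacy=%b oldCtor=%b  &= !x: %b  |= x: %b%n",
                        legacy, oldCtor,
                        andNot(legacy, oldCtor), orWith(legacy, oldCtor));
            }
        }
    }
}
```

The two forms agree only when `legacy` is true and `oldCtor` is false, or when both are false. Whenever `oldCtor` is true, `&= !x` yields false and `|= x` yields true.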
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19356#discussion_r1631392934 From liach at openjdk.org Wed Sep 25 17:03:18 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 17:03:18 GMT Subject: RFR: 8242888: Convert dynamic proxy to hidden classes In-Reply-To: References: Message-ID: On Thu, 23 May 2024 03:28:30 GMT, Chen Liang wrote: > Please review this change that adds a new dynamic proxies implementation as hidden classes. > > Summary: > 1. Adds new implementation which can be `-Djdk.reflect.useHiddenProxy=true` for early adoption. > 2. ClassLoader.defineClass0 takes a ClassLoader instance but discards it in native code; I updated native code to reuse that ClassLoader for Proxy support. > 3. ProxyGenerator changes mainly involve using Class data to pass Method list (accessed in a single condy) and removal of obsolete setup code generation. > > Comment: Since #8278, Proxy has been converted to ClassFile API, and infrastructure has changed; now, the migration to hidden classes is much cleaner and has less impact, such as preserving ProtectionDomain and dynamic module without "anchor classes", and avoiding java.lang.invoke package. Will have to rework since this is in conflict with #19410. Converting into draft for now. This patch has been reworked. Now the new implementation is opt-in, which should allow for early adoption to ease the transition. Please review the associated CSR and release note as well. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2133466043 PR Comment: https://git.openjdk.org/jdk/pull/19356#issuecomment-2374651015 From coleenp at openjdk.org Wed Sep 25 17:25:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 17:25:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 16:41:02 GMT, Aleksey Shipilev wrote: > If you can run your set of benchmarks, please do as well. Ok. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2374715067 From liach at openjdk.org Wed Sep 25 18:11:36 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 25 Sep 2024 18:11:36 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v5] In-Reply-To: <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> Message-ID: On Wed, 25 Sep 2024 15:53:19 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. 
>> >> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Replace assert with a comment Marked as reviewed by liach (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20192#pullrequestreview-2329141741 From coleenp at openjdk.org Wed Sep 25 18:16:45 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 18:16:45 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Wed, 25 Sep 2024 13:54:07 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. 
>> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Update three, after the review > - Merge branch 'master' into 8320318_objectmon_responsible_thread > - Update two, after the review > - Update one, after the review > - Small fixes before the review > - Merge branch 'master' into 8320318_objectmon_responsible_thread > - Merge branch 'master' into 8320318_objectmon_responsible_thread > - Removed _Responsible > - Fixed s390 > - Fixed legacy locking > - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f This is really good. I have an issue with a new comment which might conflict with another reviewer. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 474: > 472: > 473: // Set owner to null. > 474: // Release to satisfy the JMM This comment doesn't make sense to me. The JMM is the Java Memory Model, and this just releases the lock. The Java memory model implies memory ordering and optimizations. The next comment about fence is more meaningful. Did someone want this comment? src/hotspot/share/runtime/objectMonitor.inline.hpp line 231: > 229: // lifetime" of the contention mark. > 230: assert(!_extended, "extending twice is probably a bad design"); > 231: _monitor->add_to_contentions(1); This looks really good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2329140950 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775738476 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775741125 From fbredberg at openjdk.org Wed Sep 25 19:39:45 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 25 Sep 2024 19:39:45 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Wed, 25 Sep 2024 18:08:46 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Update three, after the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Update two, after the review >> - Update one, after the review >> - Small fixes before the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Removed _Responsible >> - Fixed s390 >> - Fixed legacy locking >> - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 474: > >> 472: >> 473: // Set owner to null. >> 474: // Release to satisfy the JMM > > This comment doesn't make sense to me. The JMM is the Java Memory Model, and this just releases the lock. The Java memory model implies memory ordering and optimizations. The next comment about fence is more meaningful. Did someone want this comment? Someone did want this comment in each of the cpu-specific files. I do like to use the same comment in all equal places, so I did what someone wanted. But having said that, CPUs are different. In some you need an explicit release-instruction (like a `membar` of some sort), and in others you don't. 
I do agree that in a cpu-specific file where it's not needed with an explicit release-instruction, this comment makes no (or less sense). What to do? > src/hotspot/share/runtime/objectMonitor.inline.hpp line 231: > >> 229: // lifetime" of the contention mark. >> 230: assert(!_extended, "extending twice is probably a bad design"); >> 231: _monitor->add_to_contentions(1); > > This looks really good. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775898830 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775899089 From coleenp at openjdk.org Wed Sep 25 19:45:44 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 19:45:44 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Wed, 25 Sep 2024 19:36:38 GMT, Fredrik Bredberg wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 474: >> >>> 472: >>> 473: // Set owner to null. >>> 474: // Release to satisfy the JMM >> >> This comment doesn't make sense to me. The JMM is the Java Memory Model, and this just releases the lock. The Java memory model implies memory ordering and optimizations. The next comment about fence is more meaningful. Did someone want this comment? > > Someone did want this comment in each of the cpu-specific files. I do like to use the same comment in all equal places, so I did what someone wanted. But having said that, CPUs are different. In some you need an explicit release-instruction (like a `membar` of some sort), and in others you don't. I do agree that in a cpu-specific file where it's not needed with an explicit release-instruction, this comment makes no (or less sense). What to do? Whoever asked for it, let's see if this is what they wanted. I thought the fence() comment below it is what was requested. 
I agree the comment should be repeated on all platforms though for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775905867 From coleenp at openjdk.org Wed Sep 25 19:53:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 25 Sep 2024 19:53:40 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Wed, 25 Sep 2024 19:43:09 GMT, Coleen Phillimore wrote: >> Someone did want this comment in each of the cpu-specific files. I do like to use the same comment in all equal places, so I did what someone wanted. But having said that, CPUs are different. In some you need an explicit release-instruction (like a `membar` of some sort), and in others you don't. I do agree that in a cpu-specific file where it's not needed with an explicit release-instruction, this comment makes no (or less sense). What to do? > > Whoever asked for it, let's see if this is what they wanted. I thought the fence() comment below it is what was requested. I agree the comment should be repeated on all platforms though for consistency. I see where it came from in the aarch64 code, and that code does a stlr() to satisfy the JMM. It's fine. Leave it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1775913934 From mdoerr at openjdk.org Wed Sep 25 20:02:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 25 Sep 2024 20:02:44 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <1MC83jRy9o6GrZouJaYjgHyIoyfNvrakHuirZMxIdhk=.769c2ce1-795f-4981-a10b-cee04cad5a0a@github.com> Message-ID: On Fri, 20 Sep 2024 08:25:10 GMT, Martin Doerr wrote: >> @TheRealMDoerr >>> I've run it through our nightly testing (x86_64, aarch64, PPC64 with several OSes) and the good news is that I haven't seen any functional problems. 
Performance also looks good for the SPEC benchmarks. I don't think they stress Java monitors very strongly. >> >> That really is good news! Thanks for testing! >> >>> I've rerun the `LockUnlock` micro benchmark with this patch applied, but `LockUnlock.java` reverted to the original version. This makes `LockUnlock.testContendedLock` faster, but not as fast as without this patch (on the 96 Thread Xeon linux server, similar on Power10). Would be great if anybody could confirm. I think this should at least be documented and the description of the JBS issue improved. >> >> Thanks for confirming that my suspicion was right. As I stated earlier, due to the added StoreLoad barrier a slight decrease in performance is probably to be expected if you just run `LockUnlock.testContendedLock`, but it shouldn't really matter when running real-life applications. Anyhow I'll update the description of the JBS issue. > > @fbredber: If you need help to resolve the PPC64 conflicts with https://github.com/openjdk/jdk/commit/7579d3740217e4a819cbf63837ec929f00464585, just let me know. > @TheRealMDoerr @offamitkumar I resolved merge conflicts in `src/hotspot/cpu/ppc/macroAssembler_ppc.cpp` and `src/hotspot/cpu/s390/macroAssembler_s390.cpp`. I've smoke tested it with QEMU, but it would be nice if you could check if it's ok as well. Thanks for rebasing! The PPC64 implementation still looks good and some quick tests have passed on real hardware. I'll run more tests.
------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2375140146 From mli at openjdk.org Wed Sep 25 20:13:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 25 Sep 2024 20:13:36 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 14:04:35 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv.ad line 10079: >> >>> 10077: match(CallLeafVector); >>> 10078: >>> 10079: effect(USE meth, KILL cr); >> >> I haven't checked the details of `CallLeafVector` yet. One more question here. Is it safe to assume that `FRM` will be saved and restored before and after the runtime call? Check this: https://bugs.openjdk.org/browse/JDK-8330094 > > Good question! > Let me do some further investigation and get back later. As far as I can see, the sleef code only sets frm to RNE, but I'm not quite sure. Even if we can make sure that the current sleef only sets frm to RNE, it seems to me we cannot depend on the current implementation, since it could change. The good news, though, is that we don't update sleef regularly. Maybe we should take an action similar to what is done for call-to-java and return-from-jni? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1775935783 From dlong at openjdk.org Wed Sep 25 20:32:40 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 25 Sep 2024 20:32:40 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: <0Dr860QgmZaGHq1QGgz5bqKLpiwVSZL-lDOV1JNjkdk=.1c09c464-e9cd-4f66-88c1-2b97e3a9f7ce@github.com> Message-ID: On Tue, 24 Sep 2024 09:27:13 GMT, Oli Gillespie wrote: > > If JVM_StartThread is only called by Thread.start0, then how about putting the new lock in Java instead? > > What benefit do you see of that? One downside is that the lock will be coarser than necessary. I'd rather keep the lock as tightly scoped as possible. I just thought it would be simpler, but I see your point.
A coarser lock will serialize more of the native path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2375195149 From kbarrett at openjdk.org Wed Sep 25 20:58:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 25 Sep 2024 20:58:37 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: <1vJoJIF6kfUqad9sLmJBU4KD6Dl79XMmB72jKIshpCQ=.a56e2832-8e65-4790-9723-2e9e1b19e15e@github.com> On Wed, 25 Sep 2024 11:23:05 GMT, Quan Anh Mai wrote: >> +1. Although I would expect any sane compiler to fold it, maybe it is still not optimized with something like `-O0`. Or maybe just move these asserts to `CompressedOops::initialize`, so whatever happens, happens once. > > This cannot be used in `static_assert` because `reinterpret_cast` is not allowed here. > > I believe [`reinterpret_cast(nullptr)` will always return 0](https://en.cppreference.com/w/cpp/language/reinterpret_cast). You may need to do it the other way around. Correct that static_assert can't be used, because of the reinterpret_cast. I considered putting the assert in CompressedOops::set_base, with comments to connect encode/decode with that assertion, but prefer putting the assert near the code that is actually relying on the property. Even with -O0 gcc seems to constant fold the expression and (pretty much?) compile away the assertion. A null pointer value is not guaranteed to have a zero representation. The conversion of a literal 0 to a null pointer value is syntactic sugar. The bit pattern of the result might be something else. There's some good discussion of this here: https://stackoverflow.com/questions/2761360/could-i-ever-want-to-access-the-address-zero ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21172#discussion_r1775982734 From duke at openjdk.org Wed Sep 25 21:41:05 2024 From: duke at openjdk.org (Todd V. 
Jonker) Date: Wed, 25 Sep 2024 21:41:05 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled Message-ID: This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. ------------- Commit messages: - Rename jtreg property vm.libgraal.enabled to vm.libgraal.jit. Changes: https://git.openjdk.org/jdk/pull/21190/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21190&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340974 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21190/head:pull/21190 PR: https://git.openjdk.org/jdk/pull/21190 From duke at openjdk.org Wed Sep 25 21:41:37 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Wed, 25 Sep 2024 21:41:37 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: <3cVQSC_0_gThdjkUu-UsN5TC1BUtq5ehsKu0BXBlE8U=.1e51b5e5-1493-4f6f-a827-c573fd5916df@github.com> On Tue, 24 Sep 2024 22:09:42 GMT, Doug Simon wrote: >> Studying these recent changes led me back to #14851 which added jtreg properties: >> >> - `jdk.hasLibgraal`: the libgraal shared library file is present >> - `vm.libgraal.enabled`: libgraal is used as JIT compiler >> >> The latter now feels misleading, since libgraal can be "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. (I'm here b/c we're assembling a distro doing exactly that.) >> >> Would it make sense to rename the latter, to reduce ambiguity in the tests? > >> Would it make sense to rename the latter, to reduce ambiguity in the tests? > > Sounds reasonable to me.
Maybe `vm.libgraal.jit`? The good news is that there are no current tests using this predicate as far as I can see. > > Want to take the lead on this? @dougxc I cut an issue https://bugs.openjdk.org/projects/JDK/issues/JDK-8340974 and posted a PR https://github.com/openjdk/jdk/pull/21190 This is my first JDK issue and fix; apologies if I'm getting the process wrong. I wasn't sure if I should tag you on either (or how). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2375311109 From ccheung at openjdk.org Thu Sep 26 00:49:20 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 26 Sep 2024 00:49:20 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v5] In-Reply-To: References: Message-ID: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
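The lenient matching rule described above (order is unimportant, duplicates are ignored) can be sketched in a few lines of Java. This is only an illustration of the stated rule under the assumption that both sides are given as comma-separated names, as in the `m1,m2` example. The real check lives in the HotSpot C++ code and works on the entries of `--module-path`; the class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class ModulePathMatch {
    // Collapse duplicates and discard ordering by collecting into a sorted set.
    static Set<String> normalize(String commaSeparated) {
        return new TreeSet<>(Arrays.asList(commaSeparated.split(",")));
    }

    // Hypothetical stand-in for the dump time vs. run time comparison.
    static boolean matches(String dumpTime, String runTime) {
        return normalize(dumpTime).equals(normalize(runTime));
    }

    public static void main(String[] args) {
        System.out.println(matches("m1,m2", "m2,m1"));    // order differs
        System.out.println(matches("m1,m2", "m1,m2,m2")); // duplicate at run time
        System.out.println(matches("m1,m2", "m1,m3"));    // different module
    }
}
```

Under this sketch the first two calls report a match and the third does not, which mirrors the two matching examples given in the description.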
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: @iklam comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21048/files - new: https://git.openjdk.org/jdk/pull/21048/files/661615cb..d96d78f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=03-04 Stats: 25 lines in 5 files changed: 10 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From ccheung at openjdk.org Thu Sep 26 00:49:21 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Thu, 26 Sep 2024 00:49:21 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v4] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 00:11:32 GMT, Ioi Lam wrote: >> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: >> >> fix indentation > > src/hotspot/share/cds/filemap.cpp line 956: > >> 954: } >> 955: // module paths are stored in sorted order in the CDS archive. >> 956: module_paths->sort(ClassLoaderExt::compare_module_path_by_name); > > I think it's better to put this call inside `ClassLoaderExt::extract_jar_files_from_path` My thinking was since the function may return false at line 953, the entries in the `module_paths` doesn't need to be sorted until before calling `check_paths()`. Anyway, I've made the change you suggested. > src/hotspot/share/cds/heapShared.cpp line 879: > >> 877: >> 878: ResourceMark rm(THREAD); >> 879: if ((strcmp(k->name()->as_C_string(), "jdk/internal/module/ArchivedModuleGraph") == 0) && > > You can avoid the ResourceMark by > > > if (k->name()->equals("jdk/internal/module/ArchivedModuleGraph") Done. 
> src/hotspot/share/cds/heapShared.cpp line 885: > >> 883: log_info(cds, heap)("Skip initializing ArchivedModuleGraph subgraph: is_using_optimized_module_handling=%s num_module_paths=%d", >> 884: BOOL_TO_STR(CDSConfig::is_using_optimized_module_handling()), ClassLoaderExt::num_module_paths()); >> 885: return; > > I think we can add a comment like: > > > ArchivedModuleGraph was created with a --module-path that's different than the runtime --module-path. > Thus, it might contain references to modules that do not exist in runtime. We cannot use it. Added the comment. > src/hotspot/share/classfile/classLoader.cpp line 582: > >> 580: false /*is_boot_append */, false /* from_class_path_attr */); >> 581: if (new_entry != nullptr) { >> 582: assert(new_entry->is_jar_file(), "module path entry %s is not a jar file", new_entry->name()); > > How do we guarantee that new_entry is always a JAR file? Do we never come here if --module-path points to an exploded directory? A comment would be helpful. I've added the following comment: `// ClassLoaderExt::process_module_table() filters out non-jar entries before calling this function.` > src/hotspot/share/classfile/classLoaderExt.cpp line 152: > >> 150: DIR* dirp = os::opendir(path); >> 151: if (dirp == nullptr && errno == ENOTDIR && has_jar_suffix(path)) { >> 152: module_paths->append(path); > > Does this handle the case where `path` doesn't exist? If the `path` doesn't exist, `dirp` will be nullptr and it will go to the else case. I think `os::readdir` on `nullptr` should return a `nullptr`. To make the code clearer, I've added a `nullptr` check on `dirp` in the else case.
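The `opendir`/`ENOTDIR` handling discussed in the last comment can be sketched with plain POSIX calls. This is a simplified stand-in and not the code from the PR: `collect_jars` is a hypothetical name, `has_jar_suffix` is re-implemented here for illustration, and the sketch uses `std::vector` instead of HotSpot's `GrowableArray`. How the real code treats a regular file without a `.jar` suffix is not shown in the thread; rejecting it is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <dirent.h>
#include <string>
#include <vector>

// Illustrative re-implementation: true if name ends with ".jar".
static bool has_jar_suffix(const std::string& name) {
  const std::string suffix = ".jar";
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical sketch: path may name a single jar file or a directory of jars.
// Returns false if path cannot be used (for example, it does not exist).
static bool collect_jars(const std::string& path, std::vector<std::string>* out) {
  DIR* dirp = opendir(path.c_str());
  if (dirp == nullptr) {
    // ENOTDIR: path exists but is not a directory; accept it only if it is a jar.
    if (errno == ENOTDIR && has_jar_suffix(path)) {
      out->push_back(path);
      return true;
    }
    return false;  // nonexistent path (ENOENT) or some other error
  }
  // dirp is known to be non-null here, so readdir never sees a null DIR*.
  while (struct dirent* entry = readdir(dirp)) {
    if (has_jar_suffix(entry->d_name)) {
      out->push_back(path + "/" + entry->d_name);  // .jmod and other files are skipped
    }
  }
  closedir(dirp);
  return true;
}
```

Checking `dirp` explicitly before the directory walk, as in the reply above, avoids relying on what a `readdir` wrapper does when handed a null pointer.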
> src/hotspot/share/classfile/classLoaderExt.cpp line 162: > >> 160: int n = os::snprintf(full_name, full_name_len, "%s%s%s", path, os::file_separator(), file_name); >> 161: assert((size_t)n == full_name_len - 1, "Unexpected number of characters in string"); >> 162: module_paths->append(full_name); > > Can this case be handled: --module-path=dir > > - Dump time : dir contains only mod1.jar > - Run time : dir contains only mod1.jar and mod2.jmod It should work because the jmod file won't be added to the `module_paths`. > src/hotspot/share/runtime/arguments.cpp line 347: > >> 345: } >> 346: } >> 347: return false; > > Can this be simplified to `return (strcmp(key, MODULE_PROPERTY_PREFIX PATH) == 0)`? I'm not sure. Is your suggestion equivalent to: `return (strcmp(key, "jdk.module.path"));`? > src/java.base/share/classes/jdk/internal/loader/BuiltinClassLoader.java line 1092: > >> 1090: void resetArchivedStatesForAppClassLoader() { >> 1091: setClassPath(null); >> 1092: if (!moduleToReader.isEmpty()) moduleToReader.clear(); > > Suggestion: > > if (!moduleToReader.isEmpty()) { > moduleToReader.clear(); > } > > > Also, do we need to do the same thing for the platform loader as well? Added braces. The `setClassPath(null)` used to be in `ClassLoaders.AppClassLoader`. Based on investigations so far, the clearing of the `moduleToReader` map is required only for `AppClassLoader`.
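Returning to the `strcmp` question in the `arguments.cpp` comment above: adjacent string literals are concatenated by the compiler, so two macros that each expand to a literal form one string. The sketch below assumes `MODULE_PROPERTY_PREFIX` expands to `"jdk.module."` and `PATH` to `"path"`; that matches the `"jdk.module.path"` spelling in the reply but is not confirmed in this thread, and `is_module_path_key` is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>

// Assumed definitions, for illustration only.
#define MODULE_PROPERTY_PREFIX "jdk.module."
#define PATH "path"

static bool is_module_path_key(const char* key) {
  // Expands to strcmp(key, "jdk.module." "path"), i.e. a comparison against
  // the single literal "jdk.module.path".
  return strcmp(key, MODULE_PROPERTY_PREFIX PATH) == 0;
}
```

Note that `strcmp` returns 0 when the strings are equal, so the `== 0` is what makes the function return true on a match; returning the raw `strcmp` result would invert the meaning.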
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776147825 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776147896 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776147991 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776148021 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776148168 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776148232 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776148308 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1776148461 From sspitsyn at openjdk.org Thu Sep 26 01:19:44 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 26 Sep 2024 01:19:44 GMT Subject: RFR: 8340826: Should not send unload notification for scratch classes In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 16:29:36 GMT, Leonid Mesnik wrote: > The jvmti class redefinition creates temporary scratch classes for its own purposes. These classes are added to corresponding classloaders and might be unloaded. > In this case the jvmti/jfr and log events are generated twice: for the original class and for its scratch class.
>
> The bug can be reproduced with the jfr test
> jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java
> with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp'.
>
> The test log (slightly modified) shows
>
> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated
> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized
> [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
> [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
> [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms
> Unloaded count: 2
>
> instead of the expected
>
> [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized
> [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
> [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
> [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms
>
> The test hangs because it gets 2 events while waiting for one.
> The "allocated" version is the scratch class generated by the JVMTI JFR agent that redefines classes.
>
> The fix is to not send notifications for scratch classes. Scratch classes shouldn't have dependencies, so an assertion was added. Also, we don't expect any other not-loaded classes during unloading.
>
> Thanks to Coleen for the details about scratch classes.
>
> Tested with tier1-5 and with :jdk_jfr with Xcomp and c1.

Good catch. The fix looks good. ------------- Marked as reviewed by sspitsyn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21166#pullrequestreview-2329807042 From fyang at openjdk.org Thu Sep 26 02:18:44 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 26 Sep 2024 02:18:44 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 20:10:41 GMT, Hamlin Li wrote: >> Good question! >> Let me do some further investigation and get back later. > > I see that the SLEEF code only sets frm to RNE, but I'm not quite sure. > Even if we can make sure the current SLEEF only sets frm to RNE, it seems to me we cannot depend on the current implementation; it could change. The good news, though, is that we don't update SLEEF regularly. > Maybe we should take similar action as for call-to-java and return-from-jni? I'm just a bit worried that manipulating the CSR could be very costly on RISC-V. Another choice would be adding an assertion that the FP rounding mode is RNE when returning from the SLEEF routine. I also checked the floating-point intrinsics with an `_rm` suffix in the function name in the SLEEF source code and only saw uses of `__RISCV_FRM_RNE`. I didn't see uses of the other rounding modes specified by the rvv-intrinsic-spec [1].
[1] https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776213231 From yzhu at openjdk.org Thu Sep 26 05:41:37 2024 From: yzhu at openjdk.org (Yanhong Zhu) Date: Thu, 26 Sep 2024 05:41:37 GMT Subject: RFR: 8334999: RISC-V: implement AES single block encryption/decryption intrinsics [v7] In-Reply-To: <9rglid_tIn1JA4zqOFygiz1hWYZGOPa8Ci1AI1qRHDA=.c70dd0ec-46b4-43c5-841e-7d91edf65eb4@github.com> References: <9rglid_tIn1JA4zqOFygiz1hWYZGOPa8Ci1AI1qRHDA=.c70dd0ec-46b4-43c5-841e-7d91edf65eb4@github.com> Message-ID: <9S-t7RTEcPhwD0WZLi_OYy9PK8UrKro19v2IbyHpKaI=.5f9663c8-5f33-429c-8077-47bae27608ed@github.com> On Sun, 8 Sep 2024 13:24:49 GMT, Arseny Bochkarev wrote: >> Hello everyone! Please review this port of vector AES single block encryption/decryption intrinsics. On my QEMU with `Zvkned` extension enabled the `test/hotspot/jtreg/compiler/codegen/aes/TestAESMain.java` test is OK. I know that currently hardware implementing this extension is not available on the market but I suppose this PR can be a good starting point on supporting AES intrinsics for RISC-V in OpenJDK. > > Arseny Bochkarev has updated the pull request incrementally with one additional commit since the last revision: > > Multiversion decrypt intrinsic src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 2370: > 2368: assert(reg_number <= 14, "reg_number should be less than or equal to working_vregs size"); > 2369: > 2370: for (int i = 0; i < reg_number; i++) { Hello, I have a question about the order of register handling in loops. Why is it in ascending order instead of descending? Here's an example: https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvkned.s. And I look forward to your reply. Thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19960#discussion_r1776411935 From amitkumar at openjdk.org Thu Sep 26 07:22:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 07:22:41 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread In-Reply-To: References: <1MC83jRy9o6GrZouJaYjgHyIoyfNvrakHuirZMxIdhk=.769c2ce1-795f-4981-a10b-cee04cad5a0a@github.com> Message-ID: On Wed, 25 Sep 2024 20:00:01 GMT, Martin Doerr wrote: >@TheRealMDoerr @offamitkumar I resolved merge conflicts in src/hotspot/cpu/ppc/macroAssembler_ppc.cpp and src/hotspot/cpu/s390/macroAssembler_s390.cpp. I've smoke tested it with QEMU, but it would be nice if you could check if it's ok as well. s390x Changes looks good and I ran test and didn't see any regression. ------------- PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2376127325 From dnsimon at openjdk.org Thu Sep 26 07:26:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 07:26:35 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled In-Reply-To: References: Message-ID: <7jMxI7fnbc9aen7sk5qrH4MHt7Pf7eu0lSQaDeav8To=.465b1523-eb84-4e31-9346-38c3165583e8@github.com> On Wed, 25 Sep 2024 19:49:28 GMT, Todd V. Jonker wrote: > This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. > > Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 > > Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. test/jtreg-ext/requires/VMProps.java line 562: > 560: * @return true if libgraal is used as JIT compiler. > 561: */ > 562: protected String isLibgraalJit() { I slightly prefer `isLibgraalJIT` as this acronym is (most) capitalized in the code base. 
You should also rename `isLibgraalEnabled` to `isLibgraalJIT` in `test/lib/jdk/test/whitebox/code/Compiler.java` for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21190#discussion_r1776521228 From mli at openjdk.org Thu Sep 26 07:57:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Sep 2024 07:57:37 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 02:15:34 GMT, Fei Yang wrote: >> I see that the SLEEF code only sets frm to RNE, but I'm not quite sure. >> Even if we can make sure the current SLEEF only sets frm to RNE, it seems to me we cannot depend on the current implementation; it could change. The good news, though, is that we don't update SLEEF regularly. >> Maybe we should take similar action as for call-to-java and return-from-jni? > > I'm just a bit worried that manipulating the CSR could be very costly on RISC-V. Another choice would be adding an assertion that the FP rounding mode is RNE when returning from the SLEEF routine. I also checked the floating-point intrinsics with an `_rm` suffix in the function name in the SLEEF source code and only saw uses of `__RISCV_FRM_RNE`. I didn't see uses of the other rounding modes specified by the rvv-intrinsic-spec [1]. > > [1] https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc Sounds like a reasonable solution! If anyone has other thoughts, please let me know.
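The assertion idea (check that the rounding mode is still round-to-nearest-even after the library call returns) can be illustrated in portable C++ with `<cfenv>`. This is a user-level analogy only: the HotSpot change would inspect the RISC-V `frm` CSR from generated code, and `call_expecting_rne` is a hypothetical wrapper that does not appear in the PR.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical wrapper: call a math routine, then check that it left the
// floating-point rounding mode at round-to-nearest (FE_TONEAREST).
template <typename F>
double call_expecting_rne(F routine, double x) {
  double result = routine(x);
  assert(std::fegetround() == FE_TONEAREST && "routine changed the rounding mode");
  return result;
}
```

The check costs one read of the rounding mode per call in debug builds, whereas saving and restoring the mode around every call would add writes in product builds as well, which is the cost concern raised above.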
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776563833 From shade at openjdk.org Thu Sep 26 08:25:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 08:25:34 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: <1vJoJIF6kfUqad9sLmJBU4KD6Dl79XMmB72jKIshpCQ=.a56e2832-8e65-4790-9723-2e9e1b19e15e@github.com> References: <1vJoJIF6kfUqad9sLmJBU4KD6Dl79XMmB72jKIshpCQ=.a56e2832-8e65-4790-9723-2e9e1b19e15e@github.com> Message-ID: On Wed, 25 Sep 2024 20:55:45 GMT, Kim Barrett wrote: >> This cannot be used in `static_assert` because `reinterpret_cast` is not allowed here. >> >> I believe [`reinterpret_cast(nullptr)` will always return 0](https://en.cppreference.com/w/cpp/language/reinterpret_cast). You may need to do it the other way around. > > Correct that static_assert can't be used, because of the reinterpret_cast. > > I considered putting the assert in CompressedOops::set_base, with comments to > connect encode/decode with that assertion, but prefer putting the assert near > the code that is actually relying on the property. Even with -O0 gcc seems to > constant fold the expression and (pretty much?) compile away the assertion. > > A null pointer value is not guaranteed to have a zero representation. The > conversion of a literal 0 to a null pointer value is syntactic sugar. The > bit pattern of the result might be something else. There's some good > discussion of this here: > https://stackoverflow.com/questions/2761360/could-i-ever-want-to-access-the-address-zero All right, if this assert folds, this all is just nitpicking then. Ship it. 
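The property under discussion is that a null pointer value is stored as all-zero bits, which the C++ standard does not guarantee. A minimal standalone sketch of a run-time check for it follows; it is not the assertion from the PR (whose exact form is not shown in this thread), and `null_pointer_is_all_zero_bits` is a hypothetical name. It inspects the object representation with `memcpy`, so it needs no `reinterpret_cast`, but it is still a run-time check rather than a `static_assert`.

```cpp
#include <cassert>
#include <cstring>

// Run-time check: does a null pointer have an all-zero object representation?
inline bool null_pointer_is_all_zero_bits() {
  void* p = nullptr;
  unsigned char bytes[sizeof(p)];
  std::memcpy(bytes, &p, sizeof(p));  // copy out the raw bytes of the pointer
  for (unsigned char b : bytes) {
    if (b != 0) return false;
  }
  return true;
}
```

On the platforms mentioned in this thread the check is expected to hold, and an optimizing compiler can typically fold it to a constant.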
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21172#discussion_r1776607757 From aph at openjdk.org Thu Sep 26 08:30:41 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Sep 2024 08:30:41 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v16] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Wed, 25 Sep 2024 16:11:20 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>> --------------------------------------------------------------------------------------------
>> Version                                           Baseline            This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                 (size)  Mode  Cnt      Score     Error      Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes           1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes          10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes         100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes       10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars           1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
>> ArraysHashCode.chars          10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
>> ArraysHashCode.chars         100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
>> ArraysHashCode.chars       10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints            1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
>> ArraysHashCode.ints           10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
>> ArraysHashCode.ints          100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
>> ArraysHashCode.ints        10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes      1  avgt   15      1.037 ±   0.001      1.037 ±   0.001 ...
>
> Mikhail Ablakatov has updated the pull request incrementally with three additional commits since the last revision: > > - cleanup: explain how this deals with the correct number of leftover elements modulo vf > - cleanup: current implementation should never end up with T4H load arrangement > - cleanup: remove obsolete method declarations src/hotspot/share/utilities/intpow.hpp line 31: > 29: #include > 30: #include > 31: Suggestion: // Raise v to the power p mod N, where N is the width of the type T. test/hotspot/gtest/aarch64/aarch64-asmtest.py line 1325: > 1323: def aname(self): > 1324: return self._name > 1325: This is nice, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1776615702 PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1776618650 From amitkumar at openjdk.org Thu Sep 26 08:44:02 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 08:44:02 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure Message-ID: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> This change enables conditional moves for long operands on s390x, which fixes the test case. Ran tier1 and saw no regressions.
------------- Commit messages: - enable conditional moves Changes: https://git.openjdk.org/jdk/pull/21198/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21198&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339220 Stats: 21 lines in 3 files changed: 13 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21198/head:pull/21198 PR: https://git.openjdk.org/jdk/pull/21198 From aph at openjdk.org Thu Sep 26 09:01:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Sep 2024 09:01:36 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure In-Reply-To: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: On Thu, 26 Sep 2024 06:08:40 GMT, Amit Kumar wrote: > This change enables conditional moves for long operands on s390x, which fixes the test case. > > Ran tier1 and saw no regressions. Looks good. I wonder why it wasn't enabled already. src/hotspot/cpu/s390/matcher_s390.hpp line 71: > 69: return 0; > 70: } else { > 71: return ConditionalMoveLimit; Suggestion:

// Use conditional move (CMOVL)
static int long_cmove_cost() {
  // z196/z11 and later hardware supports conditional moves
  return VM_Version::get_model_index() >= 5 ? 0 : ConditionalMoveLimit;
}

------------- Marked as reviewed by aph (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21198#pullrequestreview-2330581370 PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1776665999 From rcastanedalo at openjdk.org Thu Sep 26 09:07:56 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 26 Sep 2024 09:07:56 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5692: > 5690: > 5691: void MacroAssembler::load_klass(Register dst, Register src, Register tmp) { > 5692: BLOCK_COMMENT("load_klass"); I am not sure that the complexity of `MacroAssembler::load_klass` and the two `MacroAssembler::cmp_klass` functions warrant adding block comments, but if you prefer to leave them in, could you use opening and closing comments, as in the other functions in this file (e.g. `MacroAssembler::_verify_oop`)? In that case, please update the comment in the two `MacroAssembler::cmp_klass` functions with a more descriptive name than `cmp_klass 1` and `cmp_klass 2`. 
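The "40 bits, up to 8TB" figure in the quoted PR description is consistent with forwarding targets being encoded as offsets counted in 8-byte heap words. That granularity is an assumption made for this sketch and is not stated in the thread; the names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Largest heap (in bytes) addressable by an offset of `bits` bits that counts
// units of `unit_bytes` bytes. Illustrative helper, not HotSpot code.
constexpr uint64_t max_encodable_heap_bytes(unsigned bits, uint64_t unit_bytes) {
  return (uint64_t{1} << bits) * unit_bytes;
}

constexpr uint64_t TiB = uint64_t{1} << 40;

// 2^40 word offsets * 8 bytes per word = 2^43 bytes = 8 TiB.
static_assert(max_encodable_heap_bytes(40, 8) == 8 * TiB,
              "40 bits of word offsets cover 8 TiB");
```

Under the same assumption, a 32-bit field would cover only 32 GiB, which gives a sense of why the description singles out the 40-bit width.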
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5726: > 5724: #ifdef _LP64 > 5725: if (UseCompactObjectHeaders) { > 5726: load_nklass_compact(tmp, obj); Suggestion: assert here that `tmp != noreg`, just like in `MacroAssembler::cmp_klass(Register src, Register dst, Register tmp1, Register tmp2)` below. Perhaps also assert that the input registers are different. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 379: > 377: // Uses tmp1 and tmp2 as temporary registers. > 378: void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2); > 379: The naming of these two functions could be made clearer and more consistent with their documentation. Please consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar. The notion of "source" and "destination" in the parameter names is unclear, I suggest to just call them `obj`, `obj1`, `obj2` etc. Please also make sure that the parameter names are consistent in the declaration and definition (e.g. `dst` vs `obj`). src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > 4006: #ifdef COMPILER2 > 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { > 4008: generate_string_indexof(StubRoutines::_string_indexof_array); This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task? src/hotspot/share/opto/memnode.cpp line 1976: > 1974: // The field is Klass::_prototype_header. Return its (constant) value. > 1975: assert(this->Opcode() == Op_LoadX, "must load a proper type from _prototype_header"); > 1976: return TypeX::make(klass->prototype_header()); This code is dead, because by the time we call `load_array_final_field` from `LoadNode::Value` (its only caller) we know that if `UseCompactObjectHeaders`, then `tkls->offset() != in_bytes(Klass::prototype_header_offset()` (or else we would have returned from line 2161). 
Please remove it, or replace it with an assertion if you prefer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776676785 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776628929 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776644021 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776663594 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776621766 From duke at openjdk.org Thu Sep 26 09:10:55 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 26 Sep 2024 09:10:55 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v16] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 26 Sep 2024 08:28:17 GMT, Andrew Haley wrote: >> Mikhail Ablakatov has updated the pull request incrementally with three additional commits since the last revision: >> >> - cleanup: explain how this deals with the correct number of leftover elements modulo vf >> - cleanup: current implementation should never end up with T4H load arrangement >> - cleanup: remove obsolete method declarations > > test/hotspot/gtest/aarch64/aarch64-asmtest.py line 1325: > >> 1323: def aname(self): >> 1324: return self._name >> 1325: > > This is nice, thanks. Glad you find it correct, thanks for checking. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1776680128 From duke at openjdk.org Thu Sep 26 09:10:55 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 26 Sep 2024 09:10:55 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v17] In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
> --------------------------------------------------------------------------------------------
> Version                                           Baseline            This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: add a description for intpow() Co-authored-by: Andrew Haley ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/142fa5d0..6f2bec34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=15-16 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From stefank at openjdk.org Thu Sep 26 09:24:35 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 26 Sep 2024 09:24:35 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 23:26:08 GMT, Kim Barrett wrote: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero.
That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. > > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. FWIW, I think these asserts add extra noise to these functions, and I don't think we will be happy about having to read them over and over again when we read these functions or step through them while debugging. I would have preferred this to be one of those things that we require from our platforms, with a check placed in globalDefinitions or some other prominent place that verifies HotSpot's assumptions about the compilers / platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21172#issuecomment-2376415052 From amitkumar at openjdk.org Thu Sep 26 09:33:36 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 09:33:36 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure In-Reply-To: References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: <2FojpGR4LWdUoeFRdRhGzj4YfuirgTJzliYB_eio8Zc=.bd65651f-9437-490a-99b2-e09948318c03@github.com> On Thu, 26 Sep 2024 08:58:54 GMT, Andrew Haley wrote: >I wonder why it wasn't enabled already. I tried to find out but, as always, I ended up at "The s390x Port" commit.
That commit introduced this code into OpenJDK. As there is no history before it, I couldn't find out the reason. But I assume @RealLucy can tell us something useful :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21198#issuecomment-2376435288 From shade at openjdk.org Thu Sep 26 09:44:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 09:44:37 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v5] In-Reply-To: References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> Message-ID: On Wed, 25 Sep 2024 16:06:22 GMT, Chen Liang wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace assert with a comment > > Thanks. FYI @mlchung is away for a few months. Other method handle and reflection checks look good. I am running some tests for this patch. All right. I think this all means this PR may go in, right? @liach -- you have no new failures in tests? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20192#issuecomment-2376458294 From rcastanedalo at openjdk.org Thu Sep 26 09:54:57 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 09:54:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: <4sBfv1qLQjGZnrCuHBPuWp1PNkIDFLBjxMo3z_RR0Mo=.38e699ce-30bc-42fe-86b6-988df6700c82@github.com> On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/x86_64.ad line 4388: > 4386: effect(KILL cr); > 4387: ins_cost(125); // XXX > 4388: format %{ "movl $dst, $mem\t# compressed klass ptr" %} For consistency with the aarch64 back-end: Suggestion: format %{ "load_nklass_compact $dst, $mem\t# compressed klass ptr" %} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776747538 From amitkumar at openjdk.org Thu Sep 26 10:06:08 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 10:06:08 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure [v2] In-Reply-To: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: > This test enables Conditional moves for long operands for s390x. Which fixes the test-case. > > Ran tier1 and not saw any regression. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestions from Andrew ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21198/files - new: https://git.openjdk.org/jdk/pull/21198/files/162932fa..d2467e73 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21198&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21198&range=00-01 Stats: 11 lines in 2 files changed: 0 ins; 6 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21198/head:pull/21198 PR: https://git.openjdk.org/jdk/pull/21198 From lucy at openjdk.org Thu Sep 26 10:41:38 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 26 Sep 2024 10:41:38 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure [v2] In-Reply-To: References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: On Thu, 26 Sep 2024 10:06:08 GMT, Amit Kumar wrote: >> This test enables Conditional moves for long operands for s390x. Which fixes the test-case. >> >> Ran tier1 and not saw any regression. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestions from Andrew Changes requested by lucy (Reviewer). src/hotspot/cpu/s390/matcher_s390.hpp line 69: > 67: static int long_cmove_cost() { > 68: // z196/z11 or later hardware support conditional moves > 69: return VM_Version::get_model_index() >= 5 ? 0 : ConditionalMoveLimit; Why didn't you use `has_LoadStoreConditional()`? This method uses the facility indication flag and does not rely on some artificial architecture generation detection. At least in theory, it could happen that the facility you are testing for was not installed or disabled on the machine you are running on. 
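The difference between the two checks discussed in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: it does not use HotSpot's `VM_Version`, and the `CpuInfo` struct, its fields, and both free functions are hypothetical stand-ins. Only the idea (query the facility bit instead of inferring it from a model index) comes from the review.

```cpp
#include <cassert>

// Hypothetical stand-in for the CPU information a VM keeps about the host.
struct CpuInfo {
  int  model_index;             // artificial "architecture generation" number
  bool load_store_conditional;  // facility bit as reported by the machine
};

// Variant under review: infer conditional-move support from the model index.
inline int cmove_cost_by_model(const CpuInfo& cpu, int conditional_move_limit) {
  return cpu.model_index >= 5 ? 0 : conditional_move_limit;
}

// Suggested variant: ask for the facility itself.
inline int cmove_cost_by_facility(const CpuInfo& cpu, int conditional_move_limit) {
  return cpu.load_store_conditional ? 0 : conditional_move_limit;
}
```

In this model, a machine that is new enough by model index but has the facility disabled gets cost 0 from the first function, while the second correctly returns the limit.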
src/hotspot/cpu/s390/matcher_s390.hpp line 74: > 72: static int float_cmove_cost() { > 73: // z196/z11 or later hardware support conditional moves > 74: return VM_Version::get_model_index() >= 5 ? 0 : ConditionalMoveLimit; Same as above. src/hotspot/cpu/s390/vm_version_s390.hpp line 174: > 172: public: > 173: > 174: static int get_model_index(); With the above, this method can become private again. src/hotspot/cpu/x86/vm_version_x86.hpp line 656: > 654: // 4 - 486 > 655: // 5 - Pentium > 656: // 6 - PentiumPro, Pentium II, Celer;on, Xeon, Pentium III, Athlon, I haven't heard of that processor family. Nor does intel talk about it: https://www.intel.com/content/www/us/en/products/details/processors/celeron.html ------------- PR Review: https://git.openjdk.org/jdk/pull/21198#pullrequestreview-2330810035 PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1776807146 PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1776807467 PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1776808287 PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1776810653 From amitkumar at openjdk.org Thu Sep 26 11:10:34 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 11:10:34 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure [v2] In-Reply-To: References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: On Thu, 26 Sep 2024 10:06:08 GMT, Amit Kumar wrote: >> This test enables Conditional moves for long operands for s390x. Which fixes the test-case. >> >> Ran tier1 and not saw any regression. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestions from Andrew I did two benchmark runs, and I see some performance improvement in `IfMinMax.testSingleLong` and `IfMinMax.testVectorLong`:

Without Patch:

Benchmark                    Mode  Cnt      Score      Error  Units
IfMinMax.testReductionInt    avgt   15   2073.648 ±    4.173  ns/op
IfMinMax.testReductionLong   avgt   15   2028.487 ±    1.246  ns/op
IfMinMax.testSingleInt       avgt   15      9.752 ±    0.172  ns/op
IfMinMax.testSingleLong      avgt   15     16.168 ±    0.248  ns/op
IfMinMax.testVectorInt       avgt   15   4713.057 ±   14.566  ns/op
IfMinMax.testVectorLong      avgt   15  27669.122 ± 4096.469  ns/op
Finished running test 'micro:vm.compiler.IfMinMax'

Benchmark                    Mode  Cnt      Score      Error  Units
IfMinMax.testReductionInt    avgt   15   2073.340 ±    4.624  ns/op
IfMinMax.testReductionLong   avgt   15   2028.775 ±    1.874  ns/op
IfMinMax.testSingleInt       avgt   15      9.742 ±    0.172  ns/op
IfMinMax.testSingleLong      avgt   15     16.286 ±    0.177  ns/op
IfMinMax.testVectorInt       avgt   15   4720.292 ±   30.984  ns/op
IfMinMax.testVectorLong      avgt   15  25043.432 ± 1627.543  ns/op
Finished running test 'micro:vm.compiler.IfMinMax'

===================================================================

With Patch:

Benchmark                    Mode  Cnt      Score      Error  Units
IfMinMax.testReductionInt    avgt   15   2082.858 ±   27.064  ns/op
IfMinMax.testReductionLong   avgt   15   2029.843 ±    4.514  ns/op
IfMinMax.testSingleInt       avgt   15      9.743 ±    0.170  ns/op
IfMinMax.testSingleLong      avgt   15     10.072 ±    0.123  ns/op
IfMinMax.testVectorInt       avgt   15   4775.680 ±    9.953  ns/op
IfMinMax.testVectorLong      avgt   15   4736.881 ±   20.507  ns/op
Finished running test 'micro:vm.compiler.IfMinMax'

Benchmark                    Mode  Cnt      Score      Error  Units
IfMinMax.testReductionInt    avgt   15   2071.544 ±    2.536  ns/op
IfMinMax.testReductionLong   avgt   15   2030.751 ±    3.733  ns/op
IfMinMax.testSingleInt       avgt   15      9.630 ±    0.010  ns/op
IfMinMax.testSingleLong      avgt   15      9.921 ±    0.001  ns/op
IfMinMax.testVectorInt       avgt   15   4774.041 ±    2.562  ns/op
IfMinMax.testVectorLong      avgt   15   4753.247 ±   22.158  ns/op
Finished running test 'micro:vm.compiler.IfMinMax'

------------- PR Comment: https://git.openjdk.org/jdk/pull/21198#issuecomment-2376640053 From ogillespie at openjdk.org Thu Sep 26 11:37:53 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 26 Sep 2024 11:37:53 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency Message-ID: As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. Benchmark results on my two hosts:

Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units
x86 Before: MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s
x86 After: MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%)
aarch64 Before: MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s
aarch64 After: MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ±
0.491 ops/s (+5.6%) ------------- Commit messages: - Optimize md5 intrinsic Changes: https://git.openjdk.org/jdk/pull/21203/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21203&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341013 Stats: 6 lines in 2 files changed: 3 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21203/head:pull/21203 PR: https://git.openjdk.org/jdk/pull/21203 From rkennke at openjdk.org Thu Sep 26 11:41:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 11:41:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 08:55:44 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow LM_MONITOR on 32-bit platforms > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > >> 4006: #ifdef COMPILER2 >> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { >> 4008: generate_string_indexof(StubRoutines::_string_indexof_array); > > This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task? This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there.
I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776888460 From lucy at openjdk.org Thu Sep 26 11:44:37 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 26 Sep 2024 11:44:37 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure [v2] In-Reply-To: References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: On Thu, 26 Sep 2024 10:38:58 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestions from Andrew > > Changes requested by lucy (Reviewer). > But I assume @RealLucy can tell us something useful :-) Yes, the s390x port was alive before it was donated to OpenJDK. :-) If there is general interest, I can dig in the internal history in search for more details. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21198#issuecomment-2376704035 From kbarrett at openjdk.org Thu Sep 26 11:47:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 26 Sep 2024 11:47:39 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 09:21:39 GMT, Stefan Karlsson wrote: > FWIW, I think these asserts adds extra noise to these functions and I don't think we will be much more happy about having to read them over and over again when we read this functions / debug code through these functions. I would have preferred if this was one of those things that we require from our platforms and place a check in globalDefinitions, or some other prominent place that checks HotSpot's assumptions of the compilers / platforms. 
Implementing option 2 (making base() an integral offset) would remove that assumption here, and allow removal of the assertions currently proposed here. And I generally prefer placing asserts with the expecting code. OTOH, I think it's nearly certain there are other places where we make the same assumption. (And I'd forgotten we have some assumption checks in globalDefinitions.cpp.) I don't have a strong opinion in this area. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21172#issuecomment-2376711459 From mli at openjdk.org Thu Sep 26 11:53:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Sep 2024 11:53:14 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v7] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: check frm after sleef call ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/7719b5cf..50b6d529 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=05-06 Stats: 23 lines in 1 file changed: 21 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From mli at openjdk.org Thu Sep 26 11:53:14 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Sep 2024 11:53:14 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 07:54:32 GMT, Hamlin Li wrote: >> Just a bit worried about the fact that manipunating CSR could be very costly on RISC-V. Another choice would be adding an assertion about FP rounding mode expecting RNE when returning back from the SLEEF routine. I also checked floating-point intrinsics with `_rm` suffix in the function name in SLEEF src code and only witnessed use of `__RISCV_FRM_RNE`. I didn't see uses of other rounding modes as specified by the rvv-intrinsic-spec [1]. >> >> [1] https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc > > Sounds like a reasonable solution! > Anyone has other thoughts please kindly let me know. added some code to check `frm` after sleef calls. 
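The idea of asserting the rounding mode after a library call, rather than saving and restoring it around every call, can be sketched with the standard C++ `<cfenv>` interface. This is a portable analogue only, not the code added in the PR (which checks the RISC-V `frm` CSR); the helper names `rounding_mode_is_nearest` and `call_and_check_rounding` are made up for this illustration.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// True if the current floating-point rounding mode is round-to-nearest,
// the portable counterpart of RISC-V's RNE.
inline bool rounding_mode_is_nearest() {
  return std::fegetround() == FE_TONEAREST;
}

// Hypothetical wrapper: call a math routine, then verify (in debug builds)
// that the routine left the rounding mode untouched. Nothing is saved or
// restored, so the common path pays only for one read of the mode.
template <typename Fn>
double call_and_check_rounding(Fn fn, double x) {
  double result = fn(x);
  assert(rounding_mode_is_nearest() && "math routine changed the rounding mode");
  return result;
}
```

A caller would wrap each library entry point, for example `call_and_check_rounding([](double v) { return std::sqrt(v); }, 4.0)`. The check documents the expectation and fails loudly in debug builds if a routine ever violates it.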
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776903349 From rcastanedalo at openjdk.org Thu Sep 26 12:16:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 12:16:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2570: > 2568: // we get the heapBase in obj, and the narrowOop+klass_offset_in_bytes/sizeof(narrowOop) in index. > 2569: // When that happens, we need to lea the address into a single register, and subtract the > 2570: // klass_offset_in_bytes, to get the address of the mark-word. Parts of this comment are obsolete after commit 2c4a7877, please update the comment. 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 882: > 880: void store_klass(Register dst, Register src); > 881: void cmp_klass(Register oop, Register trial_klass, Register tmp); > 882: void cmp_klass(Register src, Register dst, Register tmp1, Register tmp2); Same suggestion as for the analogous x86 functions: consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar, and the `src` and `dst` parameters to `oop1` and `oop2` or similar if there is no notion of "source" and "destination". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776927247 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776942226 From coleenp at openjdk.org Thu Sep 26 12:25:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Sep 2024 12:25:36 GMT Subject: RFR: 8340826: Should not send unload notification for scratch classes In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 16:29:36 GMT, Leonid Mesnik wrote: > The jvmti class redefinition creates temporary scratch classes for it's own purposes. These classes are added to corresponding classloaders and might be unloaded. > In this case the jvmti/jfr and log events are generated twice: for original class and for it's scratch. 
> > The bug could be reproduced by jfr test > jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java > with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp' > > The test log (modified slightly) shown > > > [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated > [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized > [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded > [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B > [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms > Unloaded count: 2 > > > instead of expected > > > > [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized > [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded > [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B > [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms > > > > The test hang because got 2 events while waiting for one. > The "allocated" version is the scratch class generated by JVMTI JFR agent that redefine classes. > > The fix is to don't send notification for scratch classes. The scratch classes shouldn't have dependency so added assertion. Also, we don't expect any other not loaded classes during unloaded. > > Thanks Coleen for details about scratch classed. > > Tested with tier1-5 and with :jdk_jfr with Xcomp and c1. This looks good. Thank you for diagnosing and fixing this problem. Marked as reviewed by coleenp (Reviewer). 
src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp line 487: > 485: assert(klass != nullptr, "invariant"); > 486: assert(_subsystem_callback != nullptr, "invariant"); > 487: if(klass->is_instance_klass() && InstanceKlass::cast(klass)->is_scratch_class()) { Nit: add a space between if and (. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21166#pullrequestreview-2331071532 PR Review: https://git.openjdk.org/jdk/pull/21166#pullrequestreview-2331075368 PR Review Comment: https://git.openjdk.org/jdk/pull/21166#discussion_r1776964706 From coleenp at openjdk.org Thu Sep 26 12:25:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Sep 2024 12:25:36 GMT Subject: RFR: 8340826: Should not send unload notification for scratch classes In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 12:21:18 GMT, Coleen Phillimore wrote: >> The jvmti class redefinition creates temporary scratch classes for it's own purposes. These classes are added to corresponding classloaders and might be unloaded. >> In this case the jvmti/jfr and log events are generated twice: for original class and for it's scratch. 
>> >> The bug could be reproduced by jfr test >> jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java >> with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp' >> >> The test log (modified slightly) shown >> >> >> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated >> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized >> [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded >> [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B >> [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms >> Unloaded count: 2 >> >> >> instead of expected >> >> >> >> [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized >> [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded >> [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B >> [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms >> >> >> >> The test hang because got 2 events while waiting for one. >> The "allocated" version is the scratch class generated by JVMTI JFR agent that redefine classes. >> >> The fix is to don't send notification for scratch classes. The scratch classes shouldn't have dependency so added assertion. Also, we don't expect any other not loaded classes during unloaded. >> >> Thanks Coleen for details about scratch classed. >> >> Tested with tier1-5 and with :jdk_jfr with Xcomp and c1. 
> > src/hotspot/share/jfr/recorder/checkpoint/types/jfrTypeSet.cpp line 487: > >> 485: assert(klass != nullptr, "invariant"); >> 486: assert(_subsystem_callback != nullptr, "invariant"); >> 487: if(klass->is_instance_klass() && InstanceKlass::cast(klass)->is_scratch_class()) { > > Nit: add a space between if and (. I'll approve it again after you add this blank. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21166#discussion_r1776967255 From liach at openjdk.org Thu Sep 26 12:26:38 2024 From: liach at openjdk.org (Chen Liang) Date: Thu, 26 Sep 2024 12:26:38 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v5] In-Reply-To: <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> Message-ID: On Wed, 25 Sep 2024 15:53:19 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Replace assert with a comment Yep, should've clarified the approval came after test success. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20192#issuecomment-2376806714 From duke at openjdk.org Thu Sep 26 12:34:44 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Thu, 26 Sep 2024 12:34:44 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v17] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 26 Sep 2024 09:10:55 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
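Editorial sketch, not part of the original message: the two-part structure described above (a vectorized main loop plus a scalar tail) can be illustrated in plain Java. Each "lane" below is an ordinary `int` accumulator standing in for one Neon lane. The class name, `LANES`, and `intPow` are hypothetical names chosen for this illustration, and the tail handling is simplified relative to the unrolled tail the PR describes. This is not the AArch64 stub itself, only the arithmetic it is assumed to rely on.

```java
import java.util.Arrays;

// Hypothetical illustration of a lane-split polynomial hash (assumed names throughout).
final class LaneSplitHashSketch {
    static final int LANES = 4; // assumption: 4 int lanes per iteration

    // Wrapping integer power, computed the slow and obvious way.
    static int intPow(int base, int exp) {
        int r = 1;
        for (int k = 0; k < exp; k++) {
            r *= base;
        }
        return r;
    }

    static int laneSplitHash(int[] a) {
        int n = a.length;
        int vecLen = n - n % LANES;      // elements covered by the "vector" loop
        int pow4 = intPow(31, LANES);    // every lane advances by 31^4 per iteration
        int[] acc = new int[LANES];
        for (int i = 0; i < vecLen; i += LANES) {
            for (int j = 0; j < LANES; j++) {
                acc[j] = acc[j] * pow4 + a[i + j];
            }
        }
        // Combine: the seed 1 is scaled by 31^vecLen, lane j by 31^(LANES-1-j).
        int h = intPow(31, vecLen);
        int weight = 1;
        for (int j = LANES - 1; j >= 0; j--) {
            h += acc[j] * weight;
            weight *= 31;
        }
        // Scalar tail for the remaining n % LANES elements.
        for (int i = vecLen; i < n; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        System.out.println(laneSplitHash(data) == Arrays.hashCode(data));
    }
}
```

The lanes are independent of each other inside the main loop, which is what makes the loop vectorizable. All arithmetic wraps modulo 2^32, so the result matches the scalar `31 * h + v` recurrence used by `Arrays.hashCode(int[])`.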
>> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: add a description for intpow() > > Co-authored-by: Andrew Haley src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5445: > 5443: __ uxtl(vhalf0, Assembler::T4S, vdata0, Assembler::T4H); > 5444: } > 5445: __ addv(vmul0, Assembler::T4S, vmul0, vhalf0); I was advised to use a single `SADDW`/`UADDW` instruction instead of the current pair of `SXTL`/`UXTL` followed by `ADD`. It seems this was likely overlooked because the `Assembler` class is missing the corresponding instructions. 
I am adding these instructions and updating the implementation accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1776980537 From amitkumar at openjdk.org Thu Sep 26 13:03:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 13:03:13 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure [v2] In-Reply-To: References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: On Thu, 26 Sep 2024 10:35:42 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestions from Andrew > > src/hotspot/cpu/s390/matcher_s390.hpp line 69: > >> 67: static int long_cmove_cost() { >> 68: // z196/z11 or later hardware support conditional moves >> 69: return VM_Version::get_model_index() >= 5 ? 0 : ConditionalMoveLimit; > > Why didn't you use `has_LoadStoreConditional()`? This method uses the facility indication flag and does not rely on some artificial architecture generation detection. At least in theory, it could happen that the facility you are testing for was not installed or disabled on the machine you are running on. Fixed :) > src/hotspot/cpu/x86/vm_version_x86.hpp line 656: > >> 654: // 4 - 486 >> 655: // 5 - Pentium >> 656: // 6 - PentiumPro, Pentium II, Celer;on, Xeon, Pentium III, Athlon, > > I haven't heard of that processor family. Nor does intel talk about it: https://www.intel.com/content/www/us/en/products/details/processors/celeron.html Oh, maybe because it's not released yet. 
sorry, that was unintentional-typo; Fixed :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1777020322 PR Review Comment: https://git.openjdk.org/jdk/pull/21198#discussion_r1777023337 From amitkumar at openjdk.org Thu Sep 26 13:03:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 26 Sep 2024 13:03:12 GMT Subject: RFR: 8339220: [s390x] TestIfMinMax.java failure [v3] In-Reply-To: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> References: <5DtK26dlgE6Aacoldf6_VEDVfSzlym7EC6FW1O-iwiE=.218dba46-f28d-4765-9dd3-0d9043838b61@github.com> Message-ID: > This test enables Conditional moves for long operands for s390x. Which fixes the test-case. > > Ran tier1 and not saw any regression. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from Lutz ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21198/files - new: https://git.openjdk.org/jdk/pull/21198/files/d2467e73..281a66bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21198&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21198&range=01-02 Stats: 6 lines in 3 files changed: 1 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21198.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21198/head:pull/21198 PR: https://git.openjdk.org/jdk/pull/21198 From rcastanedalo at openjdk.org Thu Sep 26 13:07:55 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 26 Sep 2024 13:07:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 11:39:02 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: >> >>> 4006: #ifdef COMPILER2 >>> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { >>> 4008: 
generate_string_indexof(StubRoutines::_string_indexof_array); >> >> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task? > > This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 > > If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777033220 From coleenp at openjdk.org Thu Sep 26 13:10:40 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Sep 2024 13:10:40 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. 
When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release My benchmarking showed only the normal jitter and no regressions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2376912839 From mli at openjdk.org Thu Sep 26 13:14:04 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Sep 2024 13:14:04 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. 
> > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix test macro ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/50b6d529..0bd263d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From ogillespie at openjdk.org Thu Sep 26 13:47:37 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 26 Sep 2024 13:47:37 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 11:33:00 GMT, Oli Gillespie wrote: > As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. > > Benchmark results on my two hosts: > > > Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units > > x86 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s > > x86 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%) > > > aarch64 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s > > aarch64 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ± 0.491 ops/s (+5.6%) Looks like there's a bug, at least on macos aarch64. I will investigate.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2377021394 From rkennke at openjdk.org Thu Sep 26 14:00:58 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 14:00:58 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: Message-ID: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> On Thu, 26 Sep 2024 13:04:57 GMT, Roberto Castañeda Lozano wrote: >> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 >> >> If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. > > I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 Does this look correct to you? Or better to do it as a follow-up? (It passes a couple of indexOf tests, will run tier1-4 on it).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777134871 From rkennke at openjdk.org Thu Sep 26 14:04:43 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 14:04:43 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v27] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - @robcasloz review comments - Improve CollectedHeap::is_oop() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/4904d433..d48f55d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25-26 Stats: 86 lines in 10 files changed: 20 ins; 21 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From mli at openjdk.org Thu Sep 26 14:14:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Sep 2024 14:14:34 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: References: Message-ID: <2TbDKPFxiPplfnOOerVPV9DkkKMXT2YaZutG36xApXQ=.c38c7650-188a-48f9-8c7b-79fb99e6bc7d@github.com> On Thu, 26 Sep 2024 11:33:00 GMT, Oli Gillespie 
wrote: > As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. > > Benchmark results on my two hosts: > > > Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units > > x86 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s > > x86 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%) > > > aarch64 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s > > aarch64 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ± 0.491 ops/s (+5.6%) Hi, interesting optimization. Have some questions below. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3428: > 3426: __ andw(rscratch2, r2, r4); > 3427: __ addw(rscratch2, rscratch2, rscratch3); > 3428: __ rorw(rscratch2, rscratch3, 32 - s); Does this mean that `rscratch2` at line 3426-3427 is discarded at line 3428?
------------- PR Review: https://git.openjdk.org/jdk/pull/21203#pullrequestreview-2331406395 PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1777160103 From ogillespie at openjdk.org Thu Sep 26 14:30:35 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 26 Sep 2024 14:30:35 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: <2TbDKPFxiPplfnOOerVPV9DkkKMXT2YaZutG36xApXQ=.c38c7650-188a-48f9-8c7b-79fb99e6bc7d@github.com> References: <2TbDKPFxiPplfnOOerVPV9DkkKMXT2YaZutG36xApXQ=.c38c7650-188a-48f9-8c7b-79fb99e6bc7d@github.com> Message-ID: On Thu, 26 Sep 2024 14:11:42 GMT, Hamlin Li wrote: >> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. >> >> Benchmark results on my two hosts: >> >> >> Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units >> >> x86 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s >> >> x86 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%) >> >> >> aarch64 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s >> >> aarch64 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ± 0.491 ops/s (+5.6%) > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3428: > >> 3426: __ andw(rscratch2, r2, r4); >> 3427: __ addw(rscratch2, rscratch2, rscratch3); >> 3428: __ rorw(rscratch2, rscratch3, 32 - s); > > Does this mean that `rscratch2` at line 3426-3427 is discarded at line 3428? Yes! Well spotted, my complete mistake. I'm kinda shocked it passed so many tests. I will fix it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1777187116 From mli at openjdk.org Thu Sep 26 14:34:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 26 Sep 2024 14:34:36 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 23:26:08 GMT, Kim Barrett wrote: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero. That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. > > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. Marked as reviewed by mli (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21172#pullrequestreview-2331472421 From shade at openjdk.org Thu Sep 26 14:39:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 14:39:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Our performance tests show no effect as well. So I guess we are fine. 
I would like platform maintainers to look at relevant parts: @RealFYang, @TheRealMDoerr, @RealLucy, @offamitkumar? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2377161569 From ogillespie at openjdk.org Thu Sep 26 14:58:49 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 26 Sep 2024 14:58:49 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: References: Message-ID: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> > As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. > > Benchmark results on my two hosts: > > > Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units > > x86 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s > > x86 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%) > > > aarch64 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s > > aarch64 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ±
0.491 ops/s (+5.6%) Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: Fix aarch64 bug ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21203/files - new: https://git.openjdk.org/jdk/pull/21203/files/d7641133..e6d95c2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21203&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21203&range=00-01 Stats: 5 lines in 1 file changed: 1 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21203.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21203/head:pull/21203 PR: https://git.openjdk.org/jdk/pull/21203 From aph-open at littlepinkcloud.com Thu Sep 26 15:14:59 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 26 Sep 2024 16:14:59 +0100 Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: References: Message-ID: On 9/26/24 12:37, Oli Gillespie wrote: > aarch64 Before: > MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s > > aarch64 After: > MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ± 0.491 ops/s (+5.6%) Hmm. I'm not sure it's really worth it, but which AArch64 is this?
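Editorial sketch, not part of the original thread: the identity behind the optimization discussed above can be checked in a few lines of Java. The terms `(d & b)` and `(~d & c)` can never have the same bit set, because one is masked by `d` and the other by `~d`, so adding them produces no carries and gives the same result as OR-ing them. The class and method names below are hypothetical, and the step function follows the usual textbook MD5 round form rather than the HotSpot stub code.

```java
import java.util.Random;

// Hypothetical illustration of the G-function dependency shortcut (assumed names).
final class Md5GSketch {
    // Textbook form of the MD5 round-2 "G" function.
    static int gOr(int b, int c, int d) {
        return (d & b) | (~d & c);
    }

    // Same value: the two masked terms share no set bits, so '+' never carries.
    static int gAdd(int b, int c, int d) {
        return (d & b) + (~d & c);
    }

    // Reference step: everything is summed after G(b, c, d) is complete.
    static int stepReference(int a, int b, int c, int d, int x, int k, int s) {
        return b + Integer.rotateLeft(a + gOr(b, c, d) + x + k, s);
    }

    // Reordered step: the part that does not need b is summed first,
    // and (d & b) joins only in the final addition before the rotate.
    static int stepReordered(int a, int b, int c, int d, int x, int k, int s) {
        int early = a + x + k + (~d & c);
        return b + Integer.rotateLeft(early + (d & b), s);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt(), b = rnd.nextInt(), c = rnd.nextInt(), d = rnd.nextInt();
            int x = rnd.nextInt(), k = rnd.nextInt(), s = 1 + rnd.nextInt(31);
            if (stepReference(a, b, c, d, x, k, s) != stepReordered(a, b, c, d, x, k, s)) {
                throw new AssertionError("mismatch at iteration " + i);
            }
        }
        System.out.println("all steps agree");
    }
}
```

Because 32-bit addition is associative modulo 2^32, the reordering does not change the result; it only lets the additions that do not involve `b` start before `b` from the previous step is available. A rotate left by `s` equals a rotate right by `32 - s`, which is consistent with the `rorw(..., 32 - s)` form seen in the quoted AArch64 stub.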
From shade at openjdk.org Thu Sep 26 15:16:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 15:16:41 GMT Subject: RFR: 8336468: Reflection and MethodHandles should use more precise initializer checks [v5] In-Reply-To: <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> <1ka2z7mmLWUvYJRKiYghlw3p41sr16iHEpOy8k8Pifo=.6aae963b-e10f-49b1-ac48-34e54055fdbf@github.com> Message-ID: <3tbC8UlkgMWNeMDw5H_IecOjrWFkqOW4BstP8jhsLQU=.b921003e-a9af-4179-8927-0405e214ddff@github.com> On Wed, 25 Sep 2024 15:53:19 GMT, Aleksey Shipilev wrote: >> This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). >> >> There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. >> >> I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. >> >> @mlchung, you probably want to look at this more closely. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `tier1` >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Replace assert with a comment Thanks, OK then, I'll integrate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20192#issuecomment-2377255574 From shade at openjdk.org Thu Sep 26 15:16:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 26 Sep 2024 15:16:41 GMT Subject: Integrated: 8336468: Reflection and MethodHandles should use more precise initializer checks In-Reply-To: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> References: <-nwfoQ-7Vg5U97i9sgPAcmj8oE2Nvk0SZoLB5CxzbTk=.a4d6f576-cb95-4106-8f3b-cd216b16eb85@github.com> Message-ID: On Tue, 16 Jul 2024 11:28:22 GMT, Aleksey Shipilev wrote: > This PR should cover the Reflection/MethodHandles part of [JDK-8336103](https://bugs.openjdk.org/browse/JDK-8336103). > > There are places where we change the behavior: `clinit` would now be recorded as "method", instead of "constructor". Tracing back the uses of `get_flags`: it is used for initializing `java.lang.ClassFrameInfo.flags`. There seem to be no readers for this field in VM. Java side for `j.l.CFI` does not seem to check any method/constructor flags. So I would say this change in behavior is not really visible, and there is no need to try and keep the old (odd) behavior. > > I also inlined the `select_method` definition, which allows for a bit more straight-forward local code, and obviates the need for wrapping things with `methodHandle`. > > @mlchung, you probably want to look at this more closely. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 376056ca Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/376056ca48fb5dbe3d57cea01a9fbf2ea4c35616 Stats: 40 lines in 5 files changed: 15 ins; 10 del; 15 mod 8336468: Reflection and MethodHandles should use more precise initializer checks Reviewed-by: liach, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20192 From rcastanedalo at openjdk.org Thu Sep 26 16:02:55 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 26 Sep 2024 16:02:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 13:58:02 GMT, Roman Kennke wrote: > Does this look correct to you? Or better to do it as a follow-up? I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777370316 From ogillespie at openjdk.org Thu Sep 26 16:16:40 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 26 Sep 2024 16:16:40 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 15:16:29 GMT, Andrew Haley wrote: > Hmm. I'm not sure it's really worth it, but which AArch64 is this? That result is from a Neoverse-N1 stepping r3p1 (AWS m6g.xlarge). I can test on mac m1 if interested.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2377398734 From rkennke at openjdk.org Thu Sep 26 16:18:58 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 26 Sep 2024 16:18:58 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 15:59:50 GMT, Roberto Castañeda Lozano wrote: >> Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 >> >> Does this look correct to you? Or better to do it as a follow-up? >> (It passes a couple of indexOf tests, will run tier1-4 on it). > >> Does this look correct to you? Or better to do it as a follow-up? > > I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777396409 From aph at openjdk.org Thu Sep 26 16:21:37 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 26 Sep 2024 16:21:37 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: References: Message-ID: <0tTwfxNQJz8-XYxBL1zujuv7Cbbe8N1hVqsqddmYB1o=.367aa163-cae4-4d31-a84c-ee7e11c49776@github.com> On Thu, 26 Sep 2024 16:14:00 GMT, Oli Gillespie wrote: > > Hmm. I'm not sure it's really worth it, but which AArch64 is this? > > That result is from a Neoverse-N1 stepping r3p1 (AWS m6g.xlarge). I can test on mac m1 if interested. Given that there is so little advantage, almost down in the noise, you should do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2377408005 From ogillespie at openjdk.org Thu Sep 26 16:25:36 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Thu, 26 Sep 2024 16:25:36 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency In-Reply-To: <0tTwfxNQJz8-XYxBL1zujuv7Cbbe8N1hVqsqddmYB1o=.367aa163-cae4-4d31-a84c-ee7e11c49776@github.com> References: <0tTwfxNQJz8-XYxBL1zujuv7Cbbe8N1hVqsqddmYB1o=.367aa163-cae4-4d31-a84c-ee7e11c49776@github.com> Message-ID: <79fI8ByboUSgIF7r_ka5gQ3QpHy5QacucjQ9Cy429ZQ=.0a785c68-38f2-4734-abf9-5922a69312b1@github.com> On Thu, 26 Sep 2024 16:18:34 GMT, Andrew Haley wrote: > Given that there is so little advantage, almost down in the noise, you should do that. Just to check we're talking about the same results - the improvement shown in my aarch64 run is the same as (actually a little more than) the x86 run; around 5.6%, and very high confidence (±0.1%).
------------- PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2377416892 From phh at openjdk.org Thu Sep 26 17:10:12 2024 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 26 Sep 2024 17:10:12 GMT Subject: RFR: 8340181: Shenandoah: Cleanup ShenandoahRuntime stubs In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 08:52:21 GMT, Aleksey Shipilev wrote: > Noticed this while working on Leyden, which has to enumerate Shenandoah stubs for code archival to work. > > `ShenandoahRuntime::shenandoah_clone_barrier` is excessive name. `ShenandoahRuntime::arraycopy_barrier_oop_entry` and friends is not covered by `JRT_LEAF`. This change hopefully homogenizes the namings for the stubs. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` In case you need a second review. ------------- Marked as reviewed by phh (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21152#pullrequestreview-2331900600 From never at openjdk.org Thu Sep 26 17:24:35 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 26 Sep 2024 17:24:35 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 10:32:49 GMT, Doug Simon wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > clarified doc for EnableJVMCI and UseJVMCINativeLibrary This is a nice cleanup. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21120#pullrequestreview-2331928013 From sgibbons at openjdk.org Thu Sep 26 17:27:50 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 26 Sep 2024 17:27:50 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 16:15:39 GMT, Roman Kennke wrote: >>> Does this look correct to you? Or better to do it as a follow-up? >> >> I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. > > @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777485078 From duke at openjdk.org Thu Sep 26 17:49:50 2024 From: duke at openjdk.org (Todd V. 
Jonker) Date: Thu, 26 Sep 2024 17:49:50 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: > This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. > > Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 > > Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: Rename Compiler.isLibgraalEnabled to isLibgraalJIT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21190/files - new: https://git.openjdk.org/jdk/pull/21190/files/386150d4..59a8ed3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21190&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21190&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21190/head:pull/21190 PR: https://git.openjdk.org/jdk/pull/21190 From duke at openjdk.org Thu Sep 26 17:49:50 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Thu, 26 Sep 2024 17:49:50 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: <7jMxI7fnbc9aen7sk5qrH4MHt7Pf7eu0lSQaDeav8To=.465b1523-eb84-4e31-9346-38c3165583e8@github.com> References: <7jMxI7fnbc9aen7sk5qrH4MHt7Pf7eu0lSQaDeav8To=.465b1523-eb84-4e31-9346-38c3165583e8@github.com> Message-ID: On Thu, 26 Sep 2024 07:23:35 GMT, Doug Simon wrote: >> Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename Compiler.isLibgraalEnabled to isLibgraalJIT > > test/jtreg-ext/requires/VMProps.java line 562: > >> 560: * @return true if libgraal is used as JIT compiler. 
>> 561: */ >> 562: protected String isLibgraalJit() { > > I slightly prefer `isLibgraalJIT` as this acronym is (most) capitalized in the code base. > > You should also rename `isLibgraalEnabled` to `isLibgraalJIT` in `test/lib/jdk/test/whitebox/code/Compiler.java` for consistency. Renamed both. I was being conservative by limiting scope of change to the relevant interface, and felt that the existing name was reasonable in the `Compiler` scope. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21190#discussion_r1777513248 From lmesnik at openjdk.org Thu Sep 26 18:23:48 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 26 Sep 2024 18:23:48 GMT Subject: RFR: 8340826: Should not send unload notification for scratch classes [v2] In-Reply-To: References: Message-ID: <_pMZy7UcNUu1V6S2zW3mfpGFUiwlz7OBkF-rM0TV6XI=.d231d35b-1357-4dbd-9041-5ee0bda3d3dd@github.com> > The jvmti class redefinition creates temporary scratch classes for it's own purposes. These classes are added to corresponding classloaders and might be unloaded. > In this case the jvmti/jfr and log events are generated twice: for original class and for it's scratch. 
> > The bug can be reproduced with the JFR test > jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java > with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp' > > The test log (modified slightly) shows > > > [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated > [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized > [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded > [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B > [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms > Unloaded count: 2 > > > instead of the expected > > > > [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized > [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded > [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B > [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms > > > > The test hangs because it gets 2 events while waiting for one. > The "allocated" version is the scratch class generated by the JVMTI JFR agent that redefines classes. > > The fix is to not send notifications for scratch classes. Scratch classes shouldn't have dependencies, so an assertion was added. Also, we don't expect any other not-loaded classes during unloading. > > Thanks to Coleen for details about scratch classes. > > Tested with tier1-5 and with :jdk_jfr with Xcomp and c1.
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: space added ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21166/files - new: https://git.openjdk.org/jdk/pull/21166/files/e3a0b81f..86207f2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21166&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21166&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21166.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21166/head:pull/21166 PR: https://git.openjdk.org/jdk/pull/21166 From coleenp at openjdk.org Thu Sep 26 18:31:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 26 Sep 2024 18:31:36 GMT Subject: RFR: 8340826: Should not send unload notification for scratch classes [v2] In-Reply-To: <_pMZy7UcNUu1V6S2zW3mfpGFUiwlz7OBkF-rM0TV6XI=.d231d35b-1357-4dbd-9041-5ee0bda3d3dd@github.com> References: <_pMZy7UcNUu1V6S2zW3mfpGFUiwlz7OBkF-rM0TV6XI=.d231d35b-1357-4dbd-9041-5ee0bda3d3dd@github.com> Message-ID: On Thu, 26 Sep 2024 18:23:48 GMT, Leonid Mesnik wrote: >> The jvmti class redefinition creates temporary scratch classes for it's own purposes. These classes are added to corresponding classloaders and might be unloaded. >> In this case the jvmti/jfr and log events are generated twice: for original class and for it's scratch. 
>> >> The bug could be reproduced by jfr test >> jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java >> with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp' >> >> The test log (modified slightly) shown >> >> >> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated >> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized >> [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded >> [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B >> [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms >> Unloaded count: 2 >> >> >> instead of expected >> >> >> >> [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized >> [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded >> [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B >> [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms >> >> >> >> The test hang because got 2 events while waiting for one. >> The "allocated" version is the scratch class generated by JVMTI JFR agent that redefine classes. >> >> The fix is to don't send notification for scratch classes. The scratch classes shouldn't have dependency so added assertion. Also, we don't expect any other not loaded classes during unloaded. >> >> Thanks Coleen for details about scratch classed. >> >> Tested with tier1-5 and with :jdk_jfr with Xcomp and c1. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > space added Thanks! ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21166#pullrequestreview-2332068915 From dnsimon at openjdk.org Thu Sep 26 19:29:35 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 19:29:35 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 17:49:50 GMT, Todd V. Jonker wrote: >> This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. >> >> Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 >> >> Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. > > Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: > > Rename Compiler.isLibgraalEnabled to isLibgraalJIT LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21190#pullrequestreview-2332175624 From dnsimon at openjdk.org Thu Sep 26 19:39:39 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 19:39:39 GMT Subject: RFR: 8340576: Some JVMCI flags are inconsistent [v2] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 10:32:49 GMT, Doug Simon wrote: >> This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > clarified doc for EnableJVMCI and UseJVMCINativeLibrary Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21120#issuecomment-2377780416 From dnsimon at openjdk.org Thu Sep 26 19:39:40 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 26 Sep 2024 19:39:40 GMT Subject: Integrated: 8340576: Some JVMCI flags are inconsistent In-Reply-To: References: Message-ID: On Sun, 22 Sep 2024 11:51:20 GMT, Doug Simon wrote: > This PR replaces some uses of `UseJVMCICompiler` with `EnableJVMCI` so that JVMCI code paths are taken when JVMCI is only used for non-CompilerBroker compilations. This pull request has now been integrated. Changeset: 5d062e24 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/5d062e248ec4be7b35f85c341e76aa6d8d6d8b2b Stats: 18 lines in 9 files changed: 2 ins; 4 del; 12 mod 8340576: Some JVMCI flags are inconsistent Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21120 From duke at openjdk.org Thu Sep 26 20:48:36 2024 From: duke at openjdk.org (duke) Date: Thu, 26 Sep 2024 20:48:36 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 17:49:50 GMT, Todd V. Jonker wrote: >> This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. >> >> Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 >> >> Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. > > Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: > > Rename Compiler.isLibgraalEnabled to isLibgraalJIT @openjdk[bot] Your change (at version 59a8ed3c876e6084f95cd15dffa1403b263b8048) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21190#issuecomment-2377905075 From asmehra at openjdk.org Thu Sep 26 20:52:37 2024 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 26 Sep 2024 20:52:37 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: <1VQUnjdiscLRkDSW_pKI9D3HRHuRVsvuDGscxfXjCgs=.bd09973b-bba9-4ca1-9aa8-c015f5e4c9cf@github.com> References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> <1VQUnjdiscLRkDSW_pKI9D3HRHuRVsvuDGscxfXjCgs=.bd09973b-bba9-4ca1-9aa8-c015f5e4c9cf@github.com> Message-ID: On Thu, 19 Sep 2024 16:26:07 GMT, Ioi Lam wrote: >> Expanding on the above example, let's say A is aot-initialized, but B and C are not. >> So this function should initialize only X not Y, is that correct? If so, then how does it prevent initialization of Y? It iterates through all the subgraph_object_klasses which includes both X and Y. > > The `subgraph_object_klasses` are built during assembly phase when each mirror is added to the cache. Note that we don't add the "real" mirror of the classes, but we add the scratch mirror: > > > void HeapShared::archive_java_mirrors() { > ... > oop m = scratch_java_mirror(orig_k); > if (m != nullptr) { > Klass* buffered_k = ArchiveBuilder::get_buffered_klass(orig_k); > bool success = archive_reachable_objects_from(1, _default_subgraph_info, m); > > > So the scratch mirrors of `B` and `C` will be empty when they are being archived. > > Because `A` is marked as aot-initialized, we copy the fields of the "real" mirror of `A` into its scratch mirror. See `HeapShared::copy_aot_initialized_mirror()`. That's why we are able to add `X` into the subgraph_object_klasses (when `A`'s scratch mirror is scanned inside `archive_reachable_objects_from()`) @iklam are you planning to address this? Otherwise the patch looks good to me.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1777718590 From iklam at openjdk.org Thu Sep 26 21:04:37 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 26 Sep 2024 21:04:37 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v3] In-Reply-To: References: <0QRMVGKYDVfu4Ie1N6RKC5B1QPxs7sQUvdnyZxayX9o=.e4fe8dc7-4cc8-44ba-99e7-dc5cacd85147@github.com> <1VQUnjdiscLRkDSW_pKI9D3HRHuRVsvuDGscxfXjCgs=.bd09973b-bba9-4ca1-9aa8-c015f5e4c9cf@github.com> Message-ID: On Thu, 26 Sep 2024 20:50:17 GMT, Ashutosh Mehra wrote: >> The `subgraph_object_klasses` are built during assembly phase when each mirror is added to the cache. Note that we don't add the "real" mirror of the classes, but we add the scratch mirror: >> >> >> void HeapShared::archive_java_mirrors() { >> ... >> oop m = scratch_java_mirror(orig_k); >> if (m != nullptr) { >> Klass* buffered_k = ArchiveBuilder::get_buffered_klass(orig_k); >> bool success = archive_reachable_objects_from(1, _default_subgraph_info, m); >> >> >> So the scratch mirrors of `B` and `C` will be empty when they are being archived. >> >> Because `A` is marked as aot-initialized, we copy the fields of the "real" mirror of `A` into its scratch mirror. See `HeapShared::copy_aot_initialized_mirror()`. That's why we are able to add `X` into the subgraph_object_klasses (when `A`'s scratch mirror is scanned inside `archive_reachable_objects_from()`) > > @iklam are you planning to address this? Otherwise the patch looks good to me. Yes, I plan to address this. I think the name `init_classes_reachable_from_archived_mirrors` is confusing, as it also initializes classes that may not be reachable from archived mirrors. I will try to rename this function and related data structures to make the meaning more obvious. I'd like to keep the logic the same, as I still need to do the initialization in 4 steps (boot1/boot2/platform/app) for classes listed under `_runtime_default_subgraph_info`.
If I split up `_runtime_default_subgraph_info` into two separate parts the code will just become more verbose. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1777729938 From phh at openjdk.org Thu Sep 26 21:40:39 2024 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 26 Sep 2024 21:40:39 GMT Subject: RFR: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled [v2] In-Reply-To: References: Message-ID: <6ZSR79931Yt_5OPH5-_HS4g4RQzOqU1cBebC3kVvKe0=.ee58ee43-a69c-431e-99f2-a200de999364@github.com> On Thu, 26 Sep 2024 17:49:50 GMT, Todd V. Jonker wrote: >> This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. >> >> Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 >> >> Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. > > Todd V. Jonker has updated the pull request incrementally with one additional commit since the last revision: > > Rename Compiler.isLibgraalEnabled to isLibgraalJIT Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21190#pullrequestreview-2332385804 From duke at openjdk.org Thu Sep 26 21:40:40 2024 From: duke at openjdk.org (Todd V. Jonker) Date: Thu, 26 Sep 2024 21:40:40 GMT Subject: Integrated: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 19:49:28 GMT, Todd V. Jonker wrote: > This disambiguates the situation where libgraal is "enabled" for use by non-CompilerBroker compilations, without being used as the JIT compiler. > > Per discussion at https://github.com/openjdk/jdk/pull/21120#issuecomment-2372462365 > > Grep shows that the property and related method are not used in this codebase. Tier1 tests on linux-x86_64-server-release pass cleanly. 
This pull request has now been integrated. Changeset: 2349bb7a Author: Todd V. Jonker Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/2349bb7ace0c40c0f19dee81b4a86bed0e855043 Stats: 7 lines in 3 files changed: 0 ins; 1 del; 6 mod 8340974: Ambiguous name of jtreg property vm.libgraal.enabled Reviewed-by: dnsimon, phh ------------- PR: https://git.openjdk.org/jdk/pull/21190 From duke at openjdk.org Thu Sep 26 23:12:46 2024 From: duke at openjdk.org (duke) Date: Thu, 26 Sep 2024 23:12:46 GMT Subject: Withdrawn: 8313396: Portable implementation of FORBID_C_FUNCTION and ALLOW_C_FUNCTION In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 06:16:25 GMT, Julian Waters wrote: > Please review a portable implementation of FORBID_C_FUNCTION and ALLOW_C_FUNCTION: > > Currently, FORBID_C_FUNCTION only works for gcc like compilers, and ALLOW_C_FUNCTION acts to disable CRT warnings on Windows, where FORBID_C_FUNCTION does not work. It would be beneficial to provide a universal portable definition for both, to allow the macros to work on all platforms HotSpot can be compiled for. > > The implementation is portable and _should_ work on all HotSpot supported platforms (I don't have an AIX device!). > > Regrettably, I did end up having to change the signature of ALLOW_C_FUNCTION to work with this new implementation, as well as the way it is used. On one hand, it is more compact than before, but on the other the established syntax is likely more familiar by this point. I do hope this is not a showstopper, but understand if it is This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/17387 From fyang at openjdk.org Fri Sep 27 02:28:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Sep 2024 02:28:37 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 13:14:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. >> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 
184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test macro src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 51: > 49: // the dynamic rounding mode is always RNE. > 50: > 51: #ifdef DEBUG Question: Should we check for `NDEBUG` macro here instead? I see checks for this macro in the original SLEEF code. #ifndef NDEBUG #define CHECK_FRM __asm__ __volatile__ ( \ " frrm t0 \n\t" \ " beqz t0, 2f \n\t" \ " csrrw x0, cycle, x0 \n\t" \ "2: \n\t" \ : : : "memory" ); #else #define CHECK_FRM #endif ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1777931566 From sspitsyn at openjdk.org Fri Sep 27 03:03:36 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 27 Sep 2024 03:03:36 GMT Subject: RFR: 8340826: Should not send unload notification for scratch classes [v2] In-Reply-To: <_pMZy7UcNUu1V6S2zW3mfpGFUiwlz7OBkF-rM0TV6XI=.d231d35b-1357-4dbd-9041-5ee0bda3d3dd@github.com> References: <_pMZy7UcNUu1V6S2zW3mfpGFUiwlz7OBkF-rM0TV6XI=.d231d35b-1357-4dbd-9041-5ee0bda3d3dd@github.com> Message-ID: <_vovdhbxro8VsOql1YRBFkKRNY27XVZpZw0dwuuXhGY=.efb7437a-467b-4c87-b3c6-1c8c427ea616@github.com> On Thu, 26 Sep 2024 18:23:48 GMT, Leonid Mesnik wrote: >> The jvmti class redefinition creates temporary scratch classes for it's own purposes. These classes are added to corresponding classloaders and might be unloaded. 
>> In this case the jvmti/jfr and log events are generated twice: for original class and for it's scratch.
>>
>> The bug could be reproduced by jfr test
>> jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java
>> with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp'
>>
>> The test log (modified slightly) shown
>>
>>
>> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated
>> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized
>> [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
>> [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
>> [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms
>> Unloaded count: 2
>>
>>
>> instead of expected
>>
>>
>>
>> [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized
>> [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
>> [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
>> [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms
>>
>>
>>
>> The test hang because got 2 events while waiting for one.
>> The "allocated" version is the scratch class generated by JVMTI JFR agent that redefine classes.
>>
>> The fix is to don't send notification for scratch classes. The scratch classes shouldn't have dependency so added assertion. Also, we don't expect any other not loaded classes during unloaded.
>>
>> Thanks Coleen for details about scratch classed.
>>
>> Tested with tier1-5 and with :jdk_jfr with Xcomp and c1.
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > space added Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21166#pullrequestreview-2332663970 From fyang at openjdk.org Fri Sep 27 05:37:35 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Sep 2024 05:37:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Hi, Thanks for the ping. RISC-V part of the change looks fine. Not obvious change witnessed on specjbb numbers. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2332782685 From dholmes at openjdk.org Fri Sep 27 06:23:43 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Sep 2024 06:23:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Wed, 25 Sep 2024 13:54:07 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. 
>> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Update three, after the review > - Merge branch 'master' into 8320318_objectmon_responsible_thread > - Update two, after the review > - Update one, after the review > - Small fixes before the review > - Merge branch 'master' into 8320318_objectmon_responsible_thread > - Merge branch 'master' into 8320318_objectmon_responsible_thread > - Removed _Responsible > - Fixed s390 > - Fixed legacy locking > - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f src/hotspot/share/runtime/objectMonitor.cpp line 312: > 310: // The monitor is private to or already owned by locking_thread which must be suspended. > 311: // So this code may only contend with deflation. > 312: assert(locking_thread == Thread::current() || locking_thread->is_obj_deopt_suspend(), "must be"); These comments and asserts seem to belong to `enter_for_with_contention_mark`. ?? src/hotspot/share/runtime/objectMonitor.cpp line 336: > 334: // decrement occurs when the contention_mark goes out of > 335: // scope. ObjectMonitor::deflate_monitor() which is called by > 336: // the deflater thread who will decrement contentions after it Suggestion: // scope. 
ObjectMonitor::deflate_monitor() (which is called by // the deflater thread) will decrement contentions after it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778045723 PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778038127 From dholmes at openjdk.org Fri Sep 27 06:23:43 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 27 Sep 2024 06:23:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Fri, 27 Sep 2024 05:55:35 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Update three, after the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Update two, after the review >> - Update one, after the review >> - Small fixes before the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Removed _Responsible >> - Fixed s390 >> - Fixed legacy locking >> - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f > > src/hotspot/share/runtime/objectMonitor.cpp line 312: > >> 310: // The monitor is private to or already owned by locking_thread which must be suspended. >> 311: // So this code may only contend with deflation. >> 312: assert(locking_thread == Thread::current() || locking_thread->is_obj_deopt_suspend(), "must be"); > > These comments and asserts seem to belong to `enter_for_with_contention_mark`. ?? 
And for `enter_for` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778066968 From aboldtch at openjdk.org Fri Sep 27 06:32:40 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 27 Sep 2024 06:32:40 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: <2mGCingmy8Y1LpOQRebFtK9aTsUqfqIP9LeGa4TD3RY=.ddf18b27-9a69-485a-8c28-77869216fac5@github.com> On Fri, 27 Sep 2024 06:20:29 GMT, David Holmes wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 312: >> >>> 310: // The monitor is private to or already owned by locking_thread which must be suspended. >>> 311: // So this code may only contend with deflation. >>> 312: assert(locking_thread == Thread::current() || locking_thread->is_obj_deopt_suspend(), "must be"); >> >> These comments and asserts seem to belong to `enter_for_with_contention_mark`. ?? > > And for `enter_for` I had the same comment earlier https://github.com/openjdk/jdk/pull/19454#discussion_r1753348561 guess it got lost in the renaming of the function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778075638 From fbredberg at openjdk.org Fri Sep 27 06:45:41 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 27 Sep 2024 06:45:41 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: <2mGCingmy8Y1LpOQRebFtK9aTsUqfqIP9LeGa4TD3RY=.ddf18b27-9a69-485a-8c28-77869216fac5@github.com> References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> <2mGCingmy8Y1LpOQRebFtK9aTsUqfqIP9LeGa4TD3RY=.ddf18b27-9a69-485a-8c28-77869216fac5@github.com> Message-ID: <7OQeCCmp26s_Hnpc9kGUQguRy7_QfJc2_fVjJykHQm8=.6b950afc-e643-4533-8eb8-ed811ea4aca2@github.com> On Fri, 27 Sep 2024 06:30:25 GMT, Axel Boldt-Christmas wrote: >> And for `enter_for` > > I had the same comment earlier https://github.com/openjdk/jdk/pull/19454#discussion_r1753348561 guess it got lost in the renaming of the function. Quite correct, it got lost in the renaming process. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778087675 From mli at openjdk.org Fri Sep 27 07:09:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 27 Sep 2024 07:09:42 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 02:22:58 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix test macro > > src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 51: > >> 49: // the dynamic rounding mode is always RNE. >> 50: >> 51: #ifdef DEBUG > > Question: Should we check for `NDEBUG` macro (A macro specified by C/C++ standard) here instead? I see checks for this macro in the original SLEEF code. 
> > > #ifndef NDEBUG > #define CHECK_FRM __asm__ __volatile__ ( \ > " frrm t0 \n\t" \ > " beqz t0, 2f \n\t" \ > " csrrw x0, cycle, x0 \n\t" \ > "2: \n\t" \ > : : : "memory" ); > #else > #define CHECK_FRM > #endif `NDEBUG` is only used in sleefdp.c which is the original sleef code, and we don't use that file in jdk directly, in java.base module of jdk it uses `DEBUG` but not use `NDEBUG`. Based on above information, I think `DEBUG` is better. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1778114122 From fbredberg at openjdk.org Fri Sep 27 07:42:15 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 27 Sep 2024 07:42:15 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v5] In-Reply-To: References: Message-ID: > Removed the concept of an ObjectMonitor Responsible thread. > > The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. > > The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). > > After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. > > Passes tier1-tier7 on supported platforms. > x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. > Arm32 and Zero doesn't need any changes as far as I can tell. 
Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Update four, after the review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19454/files - new: https://git.openjdk.org/jdk/pull/19454/files/8140570f..1dcdc176 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19454&range=03-04 Stats: 10 lines in 1 file changed: 6 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/19454.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19454/head:pull/19454 PR: https://git.openjdk.org/jdk/pull/19454 From fbredberg at openjdk.org Fri Sep 27 07:42:16 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 27 Sep 2024 07:42:16 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: <7OQeCCmp26s_Hnpc9kGUQguRy7_QfJc2_fVjJykHQm8=.6b950afc-e643-4533-8eb8-ed811ea4aca2@github.com> References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> <2mGCingmy8Y1LpOQRebFtK9aTsUqfqIP9LeGa4TD3RY=.ddf18b27-9a69-485a-8c28-77869216fac5@github.com> <7OQeCCmp26s_Hnpc9kGUQguRy7_QfJc2_fVjJykHQm8=.6b950afc-e643-4533-8eb8-ed811ea4aca2@github.com> Message-ID: On Fri, 27 Sep 2024 06:43:22 GMT, Fredrik Bredberg wrote: >> I had the same comment earlier https://github.com/openjdk/jdk/pull/19454#discussion_r1753348561 guess it got lost in the renaming of the function. > > Quite correct, it got lost in the renaming process. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778170129 From fbredberg at openjdk.org Fri Sep 27 07:42:18 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 27 Sep 2024 07:42:18 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Fri, 27 Sep 2024 05:44:40 GMT, David Holmes wrote: >> Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Update three, after the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Update two, after the review >> - Update one, after the review >> - Small fixes before the review >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Merge branch 'master' into 8320318_objectmon_responsible_thread >> - Removed _Responsible >> - Fixed s390 >> - Fixed legacy locking >> - ... and 4 more: https://git.openjdk.org/jdk/compare/0f253d11...8140570f > > src/hotspot/share/runtime/objectMonitor.cpp line 336: > >> 334: // decrement occurs when the contention_mark goes out of >> 335: // scope. ObjectMonitor::deflate_monitor() which is called by >> 336: // the deflater thread who will decrement contentions after it > > Suggestion: > > // scope. ObjectMonitor::deflate_monitor() (which is called by > // the deflater thread) will decrement contentions after it fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1778171745 From aboldtch at openjdk.org Fri Sep 27 08:10:44 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 27 Sep 2024 08:10:44 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v5] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 07:42:15 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. 
>> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update four, after the review Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/19454#pullrequestreview-2333065222 From fyang at openjdk.org Fri Sep 27 08:14:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 27 Sep 2024 08:14:37 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 07:06:44 GMT, Hamlin Li wrote: >> src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 51: >> >>> 49: // the dynamic rounding mode is always RNE. >>> 50: >>> 51: #ifdef DEBUG >> >> Question: Should we check for `NDEBUG` macro (A macro specified by C/C++ standard) here instead? I see checks for this macro in the original SLEEF code. 
>> >> >> #ifndef NDEBUG >> #define CHECK_FRM __asm__ __volatile__ ( \ >> " frrm t0 \n\t" \ >> " beqz t0, 2f \n\t" \ >> " csrrw x0, cycle, x0 \n\t" \ >> "2: \n\t" \ >> : : : "memory" ); >> #else >> #define CHECK_FRM >> #endif > > `NDEBUG` is only used in sleefdp.c which is the original sleef code, and we don't use that file in jdk directly, in java.base module of jdk it uses `DEBUG` but not use `NDEBUG`. > Based on above information, I think `DEBUG` is better. Sounds good. I suppose this `DEBUG` macro will be defined for debug builds, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1778212230 From rkennke at openjdk.org Fri Sep 27 08:27:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 08:27:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Thu, 26 Sep 2024 17:25:06 GMT, Scott Gibbons wrote: >> @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. > > @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. > > Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). 
I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778230714 From amitkumar at openjdk.org Fri Sep 27 08:58:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 27 Sep 2024 08:58:40 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 23:26:08 GMT, Kim Barrett wrote: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero. That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. 
This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. > > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. Marked as reviewed by amitkumar (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21172#pullrequestreview-2333168597 From kbarrett at openjdk.org Fri Sep 27 09:03:15 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 27 Sep 2024 09:03:15 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops [v2] In-Reply-To: References: Message-ID: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero. That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. > > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. 
Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: remove nullptr representation asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21172/files - new: https://git.openjdk.org/jdk/pull/21172/files/ef0dc56e..a99a6cd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21172&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21172&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21172.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21172/head:pull/21172 PR: https://git.openjdk.org/jdk/pull/21172 From kbarrett at openjdk.org Fri Sep 27 09:06:36 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 27 Sep 2024 09:06:36 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 09:21:39 GMT, Stefan Karlsson wrote: >> Please review this change that fixes -Wzero-as-null-pointer-constant warnings >> in CompressedOops code. These all relate to CompressedOops::base(). >> >> I also added a couple of asserts to verify our assumptions about null pointer >> constants being representationally zero. That isn't a Standard-conforming >> assumption, but holds for all platforms we currently support. I considered, >> and even explored, a couple of different options. >> >> (1) Continue to have CompressedOops::base() be a pointer, but avoid that >> assumption, being more careful about how zero-valued pointers are treated. But >> that adds significant complexity that we can't test, since we don't support >> any platforms needing that extra work. >> >> (2) Change CompressedOops::base() to an integral adjustment. This is probably >> the correct approach, but is much more intrusive and wide ranging in the >> changes required. Maybe something for the future. 
>> >> Testing: mach5 tier1-5 >> GHA testing, verifying builds on some platforms not supported by Oracle. >> >> There are some simple changes to s390 and ppc code that I haven't tested, >> beyond verifying compilation. > > FWIW, I think these asserts adds extra noise to these functions and I don't think we will be much more happy about having to read them over and over again when we read this functions / debug code through these functions. I would have preferred if this was one of those things that we require from our platforms and place a check in globalDefinitions, or some other prominent place that checks HotSpot's assumptions of the compilers / platforms. @stefank wrote: > > FWIW, I think these asserts adds extra noise to these functions [...] > @kimbarrett replied > Implementing option 2 (making base() an integral offset) would remove that assumption here, and allow removal of the assertions currently proposed here. And I generally prefer placing asserts with the expecting code. OTOH, I think it's nearly certain there are other places where we make the same assumption. (And I'd forgotten we have some assumption checks in globalDefinitions.cpp.) I don't have a strong opinion in this area. @stefank and I discussed this offline. I've removed the asserts, as they are just kind of executable comments that aren't going to help catch any bugs. As a separate task we're going to consider adding a file that checks various assumptions like this, as a way of documenting assumptions we make. 
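
For illustration only (this is not code from the PR): a minimal sketch of what one entry in such an assumptions file might look like. The helper names `null_pointer_is_all_zero_bits` and `verify_platform_assumptions` are hypothetical and do not exist in HotSpot; the sketch only shows the "null pointer is representationally zero" assumption discussed above as an executable check.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of HotSpot: reports whether a null pointer
// is stored as all-zero bits on the current platform. The C++ standard does
// not guarantee this, but it holds on mainstream platforms.
inline bool null_pointer_is_all_zero_bits() {
  void* p = nullptr;
  unsigned char zeros[sizeof(p)] = {};
  return std::memcmp(&p, zeros, sizeof(p)) == 0;
}

// Hypothetical single place that documents and checks platform assumptions,
// instead of repeating asserts in every function that relies on them.
inline void verify_platform_assumptions() {
  assert(null_pointer_is_all_zero_bits() && "null pointer must be all-zero bits");
}
```

Calling `verify_platform_assumptions()` once at startup would keep the check out of hot or frequently read code paths, which is the concern raised about the per-function asserts.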
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21172#issuecomment-2378804197

From anton.seoane.ampudia at oracle.com Fri Sep 27 09:07:06 2024
From: anton.seoane.ampudia at oracle.com (Anton Seoane Ampudia)
Date: Fri, 27 Sep 2024 09:07:06 +0000
Subject: 8340363: Tag-specific default decorators for UnifiedLogging
Message-ID:

Hi all,

Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. This is sometimes inconvenient when specific users with some predefined needs do not want those tags. For example, C2 developers would rather not see those defaults in cases such as jit+inlining, but also do not want to specify so every time they run -Xlog.

One solution for this is found in this PR: https://github.com/openjdk/jdk/pull/20988. It can be considered as a "flavoured" version of the existing default decorators and in no way it will override anything user-specified. Also, decorators will still be consistent throughout an output device (i.e., no different decorators "mixed in").

However, upon recent talks with different teams this approach may be too flexible/powerful. The ability of specifying LogSelection-bound default decorators may result in a situation where defaults for A+B and C+D have been specified, and a user selects -Xlog:A+B,C+D. In that case, the union of the prespecified defaults is taken, which may not be what the end user wants (and might result in too many decorators). Actually, the main use case for this that I know as of now is C2 developers and the wish to not see decorators for some defined log selections.

With this in mind, I have reduced the original idea to a feature where only the default decorators are not shown if we get a positive match with a prespecified list throughout the entire user log selection list (i.e.:

  * If there is a default for A+B and the user specifies -Xlog:A+B,C+D, he will still get the default decorators
  * If there is a default for A+B and the user specifies -Xlog:A+B, no default decorators will be supplied).

Before scrapping the original idea and moving on with this one (which will not change anything as it is right now, except for the really specific uses like C2 jit+inlining that may be decided), I wanted to get a broader idea of people's opinions on this, as well as other use cases for this behaviour.

Many thanks,
Antón

From mli at openjdk.org Fri Sep 27 09:28:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 27 Sep 2024 09:28:36 GMT
Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 08:10:35 GMT, Fei Yang wrote:

>> `NDEBUG` is only used in sleefdp.c which is the original sleef code, and we don't use that file in jdk directly, in java.base module of jdk it uses `DEBUG` but not use `NDEBUG`.
>> Based on above information, I think `DEBUG` is better.
>
> Sounds good. I suppose this `DEBUG` macro will be defined for debug builds, right?

Yes, I tested on both release and fastdebug version.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1778316701

From rkennke at openjdk.org Fri Sep 27 09:41:17 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 27 Sep 2024 09:41:17 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v28]
In-Reply-To:
References:
Message-ID:

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Disable TestSplitPacks::test4a, failing on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/d48f55d6..059b1573 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Fri Sep 27 09:46:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 09:46:38 GMT Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 09:03:15 GMT, Kim Barrett wrote: >> Please review this change that fixes -Wzero-as-null-pointer-constant warnings >> in CompressedOops code. These all relate to CompressedOops::base(). >> >> I also added a couple of asserts to verify our assumptions about null pointer >> constants being representationally zero. That isn't a Standard-conforming >> assumption, but holds for all platforms we currently support. I considered, >> and even explored, a couple of different options. 
>> >> (1) Continue to have CompressedOops::base() be a pointer, but avoid that >> assumption, being more careful about how zero-valued pointers are treated. But >> that adds significant complexity that we can't test, since we don't support >> any platforms needing that extra work. >> >> (2) Change CompressedOops::base() to an integral adjustment. This is probably >> the correct approach, but is much more intrusive and wide ranging in the >> changes required. Maybe something for the future. >> >> Testing: mach5 tier1-5 >> GHA testing, verifying builds on some platforms not supported by Oracle. >> >> There are some simple changes to s390 and ppc code that I haven't tested, >> beyond verifying compilation. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove nullptr representation asserts Still good, ship it. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21172#pullrequestreview-2333271086 From shade at openjdk.org Fri Sep 27 09:57:06 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 09:57:06 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4] In-Reply-To: References: Message-ID: > All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. > > The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains one commit: Fix ------------- Changes: https://git.openjdk.org/jdk/pull/20120/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20120&range=03 Stats: 24 lines in 7 files changed: 4 ins; 13 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/20120.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20120/head:pull/20120 PR: https://git.openjdk.org/jdk/pull/20120 From mdoerr at openjdk.org Fri Sep 27 10:01:43 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 10:01:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v5] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 07:42:15 GMT, Fredrik Bredberg wrote: >> Removed the concept of an ObjectMonitor Responsible thread. >> >> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago. >> >> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318). >> >> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this. >> >> Passes tier1-tier7 on supported platforms. >> x64, AArch64, Riscv64, ppc64le and s390x passes ok on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test. >> Arm32 and Zero doesn't need any changes as far as I can tell. > > Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: > > Update four, after the review Test results on PPC64 look good. 
Is it already documented somewhere that `LockUnlock.testContendedLock` is getting slower with this change? I couldn't see it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2378906937

From duke at openjdk.org Fri Sep 27 10:04:18 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Fri, 27 Sep 2024 10:04:18 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v18]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                         Baseline               This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:

  Add and use Add Wide insts instead of pairs of Extend/Shift + Add

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18487/files
  - new: https://git.openjdk.org/jdk/pull/18487/files/6f2bec34..03849d62

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=17
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=16-17

  Stats: 153 lines in 4 files changed: 99 ins; 11 del; 43 mod
  Patch: https://git.openjdk.org/jdk/pull/18487.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487

PR: https://git.openjdk.org/jdk/pull/18487

From anton.seoane.ampudia at oracle.com Fri Sep 27 10:18:35 2024
From: anton.seoane.ampudia at oracle.com (Anton Seoane Ampudia)
Date: Fri, 27 Sep 2024 10:18:35 +0000
Subject: 8340363: Tag-specific default decorators for UnifiedLogging
In-Reply-To:
References:
Message-ID:

For completeness, I would like to add that this issue arises from the fact that compiler logging, currently done via custom flags and printed out to tty, is planned to be migrated to Unified Logging. Many use cases for this "undecorated output"
for soon-to-be new compiler tags have been identified.

From: hotspot-dev on behalf of Anton Seoane Ampudia
Date: Friday, 27 September 2024 at 11:08
To: hotspot-dev at openjdk.org
Subject: 8340363: Tag-specific default decorators for UnifiedLogging

Hi all,

Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. This is sometimes inconvenient when specific users with some predefined needs do not want those default decorators. For example, C2 developers would rather not see those defaults in cases such as jit+inlining, but also do not want to specify so every time they run -Xlog.

One solution for this is found in this PR: https://github.com/openjdk/jdk/pull/20988. It can be considered a "flavoured" version of the existing default decorators, and in no way will it override anything user-specified. Also, decorators will still be consistent throughout an output device (i.e., no different decorators "mixed in").

However, upon recent talks with different teams, this approach may be too flexible/powerful. The ability to specify LogSelection-bound default decorators may result in a situation where defaults for A+B and C+D have been specified, and a user selects -Xlog:A+B,C+D. In that case, the union of the prespecified defaults is taken, which may not be what the end user wants (and might result in too many decorators). Actually, the main use case for this that I know of as of now is C2 developers and the wish to not see decorators for some defined log selections.
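To make the concern about the union concrete, here is a minimal C++ sketch of how per-selection defaults could combine. It is not the code from the PR: the bitmask encoding, the `kPid`/`kTid` entries, the `union_of_defaults` helper and the fallback to the global defaults when nothing matches are all assumptions made for illustration. Only the three global defaults (uptime, level, tags) come from the message above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Decorator bitmask. The encoding is hypothetical; only the three global
// defaults (uptime, level, tags) are taken from the message above.
enum Decorator : std::uint32_t {
  kUptime = 1u << 0,
  kLevel  = 1u << 1,
  kTags   = 1u << 2,
  kPid    = 1u << 3,  // illustrative extra decorator
  kTid    = 1u << 4,  // illustrative extra decorator
};

constexpr std::uint32_t kGlobalDefaults = kUptime | kLevel | kTags;

// Maps a log selection such as "A+B" to its prespecified default decorators.
using SelectionDefaults = std::map<std::string, std::uint32_t>;

// Union approach: OR together the defaults of every selection that has one.
// Falling back to the global defaults when nothing matches is an assumption.
std::uint32_t union_of_defaults(const SelectionDefaults& defaults,
                                const std::vector<std::string>& selections) {
  std::uint32_t mask = 0;
  bool any_match = false;
  for (const std::string& sel : selections) {
    auto it = defaults.find(sel);
    if (it != defaults.end()) {
      mask |= it->second;
      any_match = true;
    }
  }
  return any_match ? mask : kGlobalDefaults;
}
```

With hypothetical defaults of `kPid` for A+B and `kTid | kLevel` for C+D, selecting both yields all three decorators on every line, including lines from A+B that asked only for `kPid`. That is the "too many decorators" outcome described above.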
With this in mind, I have reduced the original idea to a feature where the default decorators are omitted only if we get a positive match with a prespecified list throughout the entire user log selection list, i.e.:

* If there is a default for A+B and the user specifies -Xlog:A+B,C+D, they will still get the default decorators
* If there is a default for A+B and the user specifies -Xlog:A+B, no default decorators will be supplied

Before scrapping the original idea and moving on with this one (which will not change anything as it is right now, except for the really specific uses like C2 jit+inlining that may be decided), I wanted to get a broader idea of people's opinions on this, as well as other use cases for this behaviour.

Many thanks,
Antón

From duke at openjdk.org Fri Sep 27 10:30:20 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Fri, 27 Sep 2024 10:30:20 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v19]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                         Baseline               This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: cleanup: formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/03849d62..b55d4baa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=17-18 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From amitkumar at openjdk.org Fri Sep 27 10:32:35 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 27 Sep 2024 10:32:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. 
It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I can't see any regression on s390x as well. @RealLucy maybe a quick look ? ------------- Marked as reviewed by amitkumar (Committer). PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2333378085 From duke at openjdk.org Fri Sep 27 10:40:41 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 27 Sep 2024 10:40:41 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v17] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Thu, 26 Sep 2024 12:32:08 GMT, Mikhail Ablakatov wrote: >> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup: add a description for intpow() >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5445: > >> 5443: __ uxtl(vhalf0, Assembler::T4S, vdata0, Assembler::T4H); >> 5444: } >> 5445: __ addv(vmul0, Assembler::T4S, vmul0, vhalf0); > > I was advised to use a single `SADDW`/`UADDW` instruction instead of the current pair of `SXTL`/`UXTL` followed by `ADD`. It seems this was likely overlooked because the `Assembler` class is missing the corresponding instructions. I am adding these instructions and updating the implementation accordingly. Done by https://github.com/openjdk/jdk/pull/18487/commits/03849d62254d3bbf23b01659d8fd4a27fa1c019e . 
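As a rough illustration of why the single widening add can stand in for the extend-then-add pair, the following C++ sketch models both forms on four lanes. It is a scalar model only, not HotSpot assembler code: the `Lanes4S`/`Lanes4H` types and the `sxtl`, `add` and `saddw` helpers are hypothetical names chosen to echo the instruction mnemonics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Lanes4S = std::array<std::uint32_t, 4>;  // four 32-bit lanes (T4S)
using Lanes4H = std::array<std::int16_t, 4>;   // four 16-bit lanes (T4H)

// Models SXTL: sign-extend each 16-bit lane to 32 bits.
Lanes4S sxtl(const Lanes4H& v) {
  Lanes4S out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = static_cast<std::uint32_t>(static_cast<std::int32_t>(v[i]));
  }
  return out;
}

// Models ADD on 32-bit lanes; unsigned arithmetic wraps modulo 2^32.
Lanes4S add(const Lanes4S& a, const Lanes4S& b) {
  Lanes4S out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}

// Models SADDW: widen the narrow lanes and accumulate in a single step.
Lanes4S saddw(const Lanes4S& acc, const Lanes4H& v) {
  Lanes4S out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = acc[i] + static_cast<std::uint32_t>(static_cast<std::int32_t>(v[i]));
  }
  return out;
}
```

In this model `saddw(acc, v)` and `add(acc, sxtl(v))` produce identical lanes for any input, so the fused form saves one instruction and the temporary register holding the extended half. The unsigned variant (UXTL/UADDW) would differ only in zero-extending the narrow lanes.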
This improves the performance for `T_BOOLEAN`/`T_BYTE` and `T_CHAR`/`T_SHORT` arrays compared to the previous implementation (https://github.com/openjdk/jdk/pull/18487/commits/bfa93695b7813678936d4d25ed02866353eaae81).

![03849d6-v2-bytes](https://github.com/user-attachments/assets/f86eddaf-d53d-47c6-9f0f-05056e232c8c)
![03849d6-v2-shorts](https://github.com/user-attachments/assets/f4461e94-fd71-4502-8d24-5905572ffff5)

[ArraysHashCode-v2-03849d6.txt](https://github.com/user-attachments/files/17162865/ArraysHashCode-v2-03849d6.txt)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778415094

From mli at openjdk.org Fri Sep 27 10:46:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Fri, 27 Sep 2024 10:46:36 GMT
Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2]
In-Reply-To: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com>
References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com>
Message-ID:

On Thu, 26 Sep 2024 14:58:49 GMT, Oli Gillespie wrote:

>> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%.
>>
>> Benchmark results on my two hosts:
>>
>>
>> Benchmark                  (algorithm)  (dataSize)  (provider)   Mode  Cnt    Score   Error  Units
>>
>> x86 Before:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  636.389 ± 0.240  ops/s
>>
>> x86 After:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  671.611 ± 0.226  ops/s (+5.5%)
>>
>>
>> aarch64 Before:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  498.613 ± 0.359  ops/s
>>
>> aarch64 After:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  526.008 ± 0.491  ops/s (+5.6%)
>
> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix aarch64 bug

Overall it's a nice optimization!
One minor comment about the aarch64 one.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3422:

> 3420:       reg_cache.extract_u32(rscratch1, k);
> 3421:       __ movw(rscratch2, t);
> 3422:       __ addw(rscratch4, r1, rscratch2);

Can you try to replace these 2 lines (3421-3422) with the following?

    __ movw(rscratch4, t);
    __ addw(rscratch4, r1, rscratch4);

I expect it could bring more performance gain, but I am not sure.

-------------

PR Review: https://git.openjdk.org/jdk/pull/21203#pullrequestreview-2333399088
PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1778419407

From kbarrett at openjdk.org Fri Sep 27 11:00:44 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 27 Sep 2024 11:00:44 GMT
Subject: RFR: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 09:03:15 GMT, Kim Barrett wrote:

>> Please review this change that fixes -Wzero-as-null-pointer-constant warnings
>> in CompressedOops code. These all relate to CompressedOops::base().
>>
>> I also added a couple of asserts to verify our assumptions about null pointer
>> constants being representationally zero. That isn't a Standard-conforming
>> assumption, but holds for all platforms we currently support. I considered,
>> and even explored, a couple of different options.
>>
>> (1) Continue to have CompressedOops::base() be a pointer, but avoid that
>> assumption, being more careful about how zero-valued pointers are treated. But
>> that adds significant complexity that we can't test, since we don't support
>> any platforms needing that extra work.
>>
>> (2) Change CompressedOops::base() to an integral adjustment. This is probably
>> the correct approach, but is much more intrusive and wide ranging in the
>> changes required.
Maybe something for the future. >> >> Testing: mach5 tier1-5 >> GHA testing, verifying builds on some platforms not supported by Oracle. >> >> There are some simple changes to s390 and ppc code that I haven't tested, >> beyond verifying compilation. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > remove nullptr representation asserts Thanks for all the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21172#issuecomment-2379007519 From kbarrett at openjdk.org Fri Sep 27 11:00:44 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 27 Sep 2024 11:00:44 GMT Subject: Integrated: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops In-Reply-To: References: Message-ID: <6NS0CUpTwWZyDm46KXpEOq68wI4FGWgPtB3J46LSmk4=.4fb0ef41-748a-4201-b262-b89855fa54ea@github.com> On Tue, 24 Sep 2024 23:26:08 GMT, Kim Barrett wrote: > Please review this change that fixes -Wzero-as-null-pointer-constant warnings > in CompressedOops code. These all relate to CompressedOops::base(). > > I also added a couple of asserts to verify our assumptions about null pointer > constants being representationally zero. That isn't a Standard-conforming > assumption, but holds for all platforms we currently support. I considered, > and even explored, a couple of different options. > > (1) Continue to have CompressedOops::base() be a pointer, but avoid that > assumption, being more careful about how zero-valued pointers are treated. But > that adds significant complexity that we can't test, since we don't support > any platforms needing that extra work. > > (2) Change CompressedOops::base() to an integral adjustment. This is probably > the correct approach, but is much more intrusive and wide ranging in the > changes required. Maybe something for the future. > > Testing: mach5 tier1-5 > GHA testing, verifying builds on some platforms not supported by Oracle. 
> > There are some simple changes to s390 and ppc code that I haven't tested, > beyond verifying compilation. This pull request has now been integrated. Changeset: 25e89291 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/25e892911dabe32cc0d13b0d4322c5d89585b8f1 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops Reviewed-by: shade, stefank, mli, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/21172 From alanb at openjdk.org Fri Sep 27 11:15:35 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 27 Sep 2024 11:15:35 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v5] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 00:49:20 GMT, Calvin Cheung wrote: >> Prior to this patch, if `--module-path` is specified in the command line: >> during CDS dump time, full module graph will not be included in the CDS archive; >> during run time, full module graph will not be used. >> >> With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. >> >> The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. >> E.g. the following is considered a match: >> dump time runtime >> m1,m2 m2,m1 >> m1,m2 m1,m2,m2 >> >> I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
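The lenient module-path comparison described above (order ignored, duplicates ignored) amounts to comparing the two lists as sets. A minimal C++ sketch follows. It is not the check from the PR: the helper names and the comma-separated input, which mirrors the `m1,m2` table rather than a real module path, are assumptions made for illustration.

```cpp
#include <set>
#include <sstream>
#include <string>

// Splits a comma-separated list of module names into a set, which drops
// duplicates and forgets ordering. The input format is hypothetical.
std::set<std::string> module_set(const std::string& list) {
  std::set<std::string> result;
  std::istringstream in(list);
  std::string name;
  while (std::getline(in, name, ',')) {
    if (!name.empty()) {
      result.insert(name);
    }
  }
  return result;
}

// Two lists "match" when they name the same set of modules.
bool module_paths_match(const std::string& dump_time,
                        const std::string& run_time) {
  return module_set(dump_time) == module_set(run_time);
}
```

Under this model both rows of the table match: `m1,m2` against `m2,m1`, and `m1,m2` against `m1,m2,m2`. A run-time list that drops or adds a distinct module would not match, although whether the real check treats that case the same way is not stated in the message.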
>
> Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision:
>
>   @iklam comments

src/java.base/share/classes/jdk/internal/loader/BuiltinClassLoader.java line 1095:

> 1093:             moduleToReader.clear();
> 1094:         }
> 1095:     }

Do you remember why resetArchivedStates resets the resource cache? I would have expected it to be cleared for all class loaders.

Rather than putting something specific to the app class loader here then maybe it should be renamed and have resetArchivedStates call it, e.g.

    void resetArchivedStates(boolean all) {
        ucp = null;
        resourceCache = null;
        if (all) {
            moduleToReader.clear();
        }
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1778453669

From alanb at openjdk.org Fri Sep 27 11:15:37 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Fri, 27 Sep 2024 11:15:37 GMT
Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 21:17:35 GMT, Calvin Cheung wrote:

>> src/java.base/share/classes/jdk/internal/module/ModuleBootstrap.java line 481:
>>
>>> 479:                                     cf,
>>> 480:                                     clf,
>>> 481:                                     mainModule);
>>
>> This was correctly aligned before, now it isn't.
>
> Fixed.

That may have been my fault, I gave Calvin changes to check the configuration and the IDE re-formatted it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1778455739

From alanb at openjdk.org Fri Sep 27 11:15:37 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Fri, 27 Sep 2024 11:15:37 GMT
Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v3]
In-Reply-To:
References: <7x-dr_M70dbSsP6Jr-QIY1g40vSdKMnXmkwfuUElzDg=.ca786a00-1613-4db3-a53b-0ce01942e5bd@github.com>
Message-ID:

On Mon, 23 Sep 2024 05:57:09 GMT, Calvin Cheung wrote:

>> src/java.base/share/classes/jdk/internal/module/ModuleReferences.java line 105:
>>
>>> 103:         public byte[] generate(String algorithm) {
>>> 104:             return ModuleHashes.computeHash(supplier, algorithm);
>>> 105:         }
>>
>> Why is JarModuleReader changed to use a file string, is this because of an environment dependency when using a Path?
>
> It is to avoid the following warnings during dump time:
>
> [1.607s][warning][cds,heap ] Archive heap points to a static field that may be reinitialized at runtime:
> [1.607s][warning][cds,heap ] Field: java/util/zip/ZipFile$Source::builtInFS
> [1.607s][warning][cds,heap ] Value: sun.nio.fs.LinuxFileSystem
> ...
> [1.607s][warning][cds,heap ] Archive heap points to a static field that may be reinitialized at runtime:
> [1.607s][warning][cds,heap ] Field: sun/nio/fs/DefaultFileSystemProvider::INSTANCE
> [1.607s][warning][cds,heap ] Value: sun.nio.fs.LinuxFileSystemProvider

Thanks. At some point we will likely have to re-visit this; we have prototype changes that re-implement ZipFile and java.io to use the newer APIs.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1778454364 From ogillespie at openjdk.org Fri Sep 27 11:45:35 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 27 Sep 2024 11:45:35 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> Message-ID: On Fri, 27 Sep 2024 10:42:04 GMT, Hamlin Li wrote: >> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix aarch64 bug > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3422: > >> 3420: reg_cache.extract_u32(rscratch1, k); >> 3421: __ movw(rscratch2, t); >> 3422: __ addw(rscratch4, r1, rscratch2); > > Can you try to replace these 2 lines (3421-3422) with following? > > __ movw(rscratch4, t); > __ addw(rscratch4, r1, rscratch4); > > > I expect it could bring more performance gain, but not sure. Thanks! I can't measure any difference at all with that change, seems to perform identically. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1778488350 From aph at openjdk.org Fri Sep 27 12:40:44 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Sep 2024 12:40:44 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v19] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 27 Sep 2024 10:30:20 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. 
>>
>> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>>
>> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>> Version                                         Baseline               This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score     Error      Score     Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±   0.060      1.247 ±   0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±   0.028      4.387 ±   0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±   0.051     26.655 ±   0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±   1.352   2649.962 ± 216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±   0.062      1.246 ±   0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±   0.002      5.344 ±   0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±   0.048     23.023 ±   0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±   3.374   2410.504 ±   8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±   0.005      1.187 ±   0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±   0.002      5.676 ±   0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±   0.016     24.378 ±   0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±   1.336   2419.015 ±   0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±   0.001      1.037 ±   0.001 ...
>
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>
>   cleanup: formatting

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 3098:

> 3096:     _xaddwv(/* is_unsigned */ false, Vd, Vn, Ta, Vm, Tb);
> 3097:   }
> 3098:

Suggestion:

    #define INSN(NAME, assertion, is_unsigned)                                              \
      void NAME(FloatRegister Vd, FloatRegister Vn, SIMD_Arrangement Ta, FloatRegister Vm, \
                SIMD_Arrangement Tb) {                                                      \
        assert((assertion), "invalid arrangement");                                         \
        _xaddwv(is_unsigned, Vd, Vn, Ta, Vm, Tb);                                           \
      }

    public:

      INSN(uaddwv,  Tb == T8B  || Tb == T4H || Tb == T2S, /*is_unsigned*/true)
      INSN(uaddwv2, Tb == T16B || Tb == T8H || Tb == T4S, /*is_unsigned*/true)
      INSN(saddwv,  Tb == T8B  || Tb == T4H || Tb == T2S, /*is_unsigned*/false)
      INSN(saddwv2, Tb == T16B || Tb == T8H || Tb == T4S, /*is_unsigned*/false)

    #undef INSN

(untested)

Doing this squeezes out redundancy, leaving just the significant differences, helping the reader. A general principle used in assembler_aarch64.hpp is to prefer data to repetitive code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778551147

From duke at openjdk.org Fri Sep 27 12:45:58 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Fri, 27 Sep 2024 12:45:58 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v20]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
> > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. > > # Performance > > ## Neoverse N1 > > > -------------------------------------------------------------------------------------------- > Version Baseline This patch > -------------------------------------------------------------------------------------------- > Benchmark (size) Mode Cnt Score Error Score Error Units > -------------------------------------------------------------------------------------------- > ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op > ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op > ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op > ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op > ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op > ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op > ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op > ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op > ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op > ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op > ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op > ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op > ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ns/op > ArraysHashCode.multibytes 10 avgt 15 5.4... 
Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: fixup: use xaddwv2 where required in unit tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/18487/files - new: https://git.openjdk.org/jdk/pull/18487/files/b55d4baa..7d1c6b77 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=18-19 Stats: 12 lines in 2 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/18487.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487 PR: https://git.openjdk.org/jdk/pull/18487 From aph at openjdk.org Fri Sep 27 12:51:40 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Sep 2024 12:51:40 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v20] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <_JsfxYz0sFoNJovghDK3c1Na9OCnzqWiImy1jrFo61E=.6dd3e3e4-ab98-4449-bcba-05e45810ac67@github.com> On Fri, 27 Sep 2024 12:45:58 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instruction that process 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. 
>> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ? 0.060 1.247 ? 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ? 0.028 4.387 ? 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ? 0.051 26.655 ? 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ? 1.352 2649.962 ? 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ? 0.062 1.246 ? 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ? 0.002 5.344 ? 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ? 0.048 23.023 ? 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ? 3.374 2410.504 ? 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ? 0.005 1.187 ? 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ? 0.002 5.676 ? 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ? 0.016 24.378 ? 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ? 1.336 2419.015 ? 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ? 0.001 1.037 ? 0.001 ... > > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > fixup: use xaddwv2 where required in unit tests src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5456: > 5454: } else { > 5455: __ uaddwv2(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T8H); > 5456: } Maybe define `addwv2` and `addwv` in MacroAssembler. 
Suggestion: __ addwv2(is_signed_subword_type(eltype), vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T8H); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778569792 From ogillespie at openjdk.org Fri Sep 27 12:57:38 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 27 Sep 2024 12:57:38 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> Message-ID: On Thu, 26 Sep 2024 14:58:49 GMT, Oli Gillespie wrote: >> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. >> >> Benchmark results on my two hosts: >> >> >> Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units >> >> x86 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ? 0.240 ops/s >> >> x86 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ? 0.226 ops/s (+5.5%) >> >> >> aarch64 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ? 0.359 ops/s >> >> aarch64 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ? 0.491 ops/s (+5.6%) > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 bug I also see around 4% improvement on Mac M1. 
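
For readers following the thread, here is a minimal sketch of why the OR in the MD5 G function can be replaced by an addition, as the quoted description states. It is not the intrinsic code from the PR; the class and method names are made up for illustration. The two terms `(d & b)` and `(~d & c)` are masked by complementary bit patterns, so they never have a set bit in common and the addition cannot carry.

```java
import java.util.Random;

class Md5GIdentity {
    // G function in the form quoted in the PR description.
    static int gOr(int b, int c, int d) {
        return (d & b) | (~d & c);
    }

    // Same value computed with '+': the operands are masked by d and ~d,
    // so no bit position is set in both and no carry can occur.
    static int gAdd(int b, int c, int d) {
        return (d & b) + (~d & c);
    }

    // Returns true if both forms agree on `trials` pseudo-random inputs.
    static boolean agreeOnRandomInputs(long seed, int trials) {
        Random rnd = new Random(seed);
        for (int i = 0; i < trials; i++) {
            int b = rnd.nextInt();
            int c = rnd.nextInt();
            int d = rnd.nextInt();
            if (gOr(b, c, d) != gAdd(b, c, d)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnRandomInputs(42L, 1_000_000));
    }
}
```

Because addition is associative, the `(~d & c)` term can be folded into the running sum before `b` is available, which is where the shorter dependency chain comes from.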
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2379218476

From coleenp at openjdk.org Fri Sep 27 12:58:36 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 27 Sep 2024 12:58:36 GMT
Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 09:57:06 GMT, Aleksey Shipilev wrote:

>> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`.
>>
>> The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior.
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:
>
>   Fix

This looks good.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20120#pullrequestreview-2333676170

From roberto.castaneda.lozano at oracle.com Fri Sep 27 13:18:08 2024
From: roberto.castaneda.lozano at oracle.com (Roberto Castaneda Lozano)
Date: Fri, 27 Sep 2024 13:18:08 +0000
Subject: 8340363: Tag-specific default decorators for UnifiedLogging
In-Reply-To:
References:
Message-ID:

Hi Antón, thanks for starting this discussion.

Let me summarize your latest proposal to see if I get it right (please correct me otherwise):

- 1) If the user selects a certain set of decorators through the -Xlog option, that set is used to decorate all UL output lines.
- 2) Otherwise (no user-selected set of decorators): if the user selects at least one tag that is *not* configured as no-decorators-by-default, the current default decorators (uptime, level, tag set) are applied to decorate *all* UL output lines.
- 3) Otherwise (no user-selected set of decorators, and all selected tags are configured as no-decorators-by-default): no UL output line is decorated (equivalent to -Xlog:(...)::none).

As a C2 developer, I think this behavior strikes a good balance between good compiler tracing/debugging ergonomics (where we usually have no need for decorators) and preserving the current behavior for all other use cases. It would support migrating flags such as TraceLoopOpts, which usually have no need for decorators, to the UL framework and getting the exact same output as today with a simple line like -Xlog:jit+loopopts or similar (the user can always select decorators actively if needed). At the same time, it would preserve the UL behavior for all users of the existing GC and runtime tags which are *not* marked as no-decorators-by-default, even if these tags are selected in combination with future compiler tags marked as no-decorators-by-default.

Cheers,

Roberto

________________________________________
From: hotspot-dev on behalf of Anton Seoane Ampudia
Sent: Friday, September 27, 2024 11:07 AM
To: hotspot-dev at openjdk.org
Subject: 8340363: Tag-specific default decorators for UnifiedLogging

Hi all,

Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through -Xlog. This is sometimes inconvenient when users with specific, predefined needs do not want those decorators. For example, C2 developers would rather not see those defaults in cases such as jit+inlining, but also do not want to say so every time they run -Xlog.

One solution for this is found in this PR: https://github.com/openjdk/jdk/pull/20988. It can be considered as a "flavoured"
version of the existing default decorators, and it will in no way override anything user-specified. Also, decorators will still be consistent throughout an output device (i.e., no different decorators "mixed in").

However, after recent talks with different teams, this approach may be too flexible/powerful. The ability to specify LogSelection-bound default decorators may result in a situation where defaults for A+B and C+D have been specified, and a user selects -Xlog:A+B,C+D. In that case, the union of the prespecified defaults is taken, which may not be what the end user wants (and might result in too many decorators). Actually, the main use case for this that I know of as of now is C2 developers and the wish to not see decorators for some defined log selections.

With this in mind, I have reduced the original idea to a feature where the default decorators are omitted only if every entry in the user's log selection list positively matches a prespecified list, i.e.:

* If there is a default for A+B and the user specifies -Xlog:A+B,C+D, they will still get the default decorators.
* If there is a default for A+B and the user specifies -Xlog:A+B, no default decorators will be supplied.

Before scrapping the original idea and moving on with this one (which will not change anything as it is right now, except for the really specific uses like C2 jit+inlining that may be decided), I wanted to get a broader idea of people's opinions on this, as well as other use cases for this behaviour.
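
The reduced rule above, together with the three cases in Roberto's summary, can be sketched as a small decision function. This is only an illustration and not code from the PR or from the Unified Logging implementation: the class, enum, and parameter names are hypothetical, and log selections are compared as plain strings, which is a simplifying assumption about how a "positive match" would work.

```java
import java.util.List;
import java.util.Set;

class DefaultDecoratorRule {
    // Which decorators end up on the output (illustrative names).
    enum Decorators { USER_SELECTED, DEFAULT, NONE }

    // userSpecifiedDecorators: true if -Xlog carried an explicit decorator list.
    // selections: the log selections given by the user, e.g. "jit+inlining".
    // undecoratedByDefault: hypothetical prespecified list of selections that
    //                       should get no decorators by default.
    static Decorators decide(boolean userSpecifiedDecorators,
                             List<String> selections,
                             Set<String> undecoratedByDefault) {
        if (userSpecifiedDecorators) {
            return Decorators.USER_SELECTED;  // user input is never overridden
        }
        boolean allMatch = !selections.isEmpty()
                && undecoratedByDefault.containsAll(selections);
        // Decorators are dropped only if every selection is on the list;
        // a single non-matching selection restores uptime, level and tags.
        return allMatch ? Decorators.NONE : Decorators.DEFAULT;
    }

    public static void main(String[] args) {
        Set<String> quiet = Set.of("A+B");
        System.out.println(decide(false, List.of("A+B"), quiet));
        System.out.println(decide(false, List.of("A+B", "C+D"), quiet));
    }
}
```

With `A+B` on the list, the two calls in `main` correspond to the two bullet points above.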

Many thanks,
Antón

From duke at openjdk.org Fri Sep 27 13:19:54 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Fri, 27 Sep 2024 13:19:54 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v21]
In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com>
Message-ID:

> Hello,
>
> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively.
>
> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements.
>
> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
> --------------------------------------------------------------------------------------------
> Version                                            Baseline             This patch
> --------------------------------------------------------------------------------------------
> Benchmark                 (size)  Mode  Cnt      Score   Error     Score     Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes           1  avgt   15      1.249 ± 0.060     1.247 ±   0.062  ns/op
> ArraysHashCode.bytes          10  avgt   15      8.754 ± 0.028     4.387 ±   0.015  ns/op
> ArraysHashCode.bytes         100  avgt   15     98.596 ± 0.051    26.655 ±   0.097  ns/op
> ArraysHashCode.bytes       10000  avgt   15  10150.578 ± 1.352  2649.962 ± 216.744  ns/op
> ArraysHashCode.chars           1  avgt   15      1.286 ± 0.062     1.246 ±   0.054  ns/op
> ArraysHashCode.chars          10  avgt   15      8.731 ± 0.002     5.344 ±   0.003  ns/op
> ArraysHashCode.chars         100  avgt   15     98.632 ± 0.048    23.023 ±   0.142  ns/op
> ArraysHashCode.chars       10000  avgt   15  10150.658 ± 3.374  2410.504 ±   8.872  ns/op
> ArraysHashCode.ints            1  avgt   15      1.189 ± 0.005     1.187 ±   0.001  ns/op
> ArraysHashCode.ints           10  avgt   15      8.730 ± 0.002     5.676 ±   0.001  ns/op
> ArraysHashCode.ints          100  avgt   15     98.559 ± 0.016    24.378 ±   0.006  ns/op
> ArraysHashCode.ints        10000  avgt   15  10148.752 ± 1.336  2419.015 ±   0.492  ns/op
> ArraysHashCode.multibytes      1  avgt   15      1.037 ± 0.001     1.037 ±   0.001  ns/op
> ArraysHashCode.multibytes     10  avgt   15      5.4...

Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:

  cleanup: collapse duplicated code into a macro

  Co-authored-by: Andrew Haley

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18487/files
  - new: https://git.openjdk.org/jdk/pull/18487/files/7d1c6b77..1dbb1ddf

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=20
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18487&range=19-20

  Stats: 21 lines in 1 file changed: 2 ins; 9 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/18487.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18487/head:pull/18487

PR: https://git.openjdk.org/jdk/pull/18487

From duke at openjdk.org Fri Sep 27 13:29:37 2024
From: duke at openjdk.org (Antón Seoane)
Date: Fri, 27 Sep 2024 13:29:37 GMT
Subject: RFR: 8340363: Tag-specific default decorators for UnifiedLogging [v4]
In-Reply-To:
References: <4VEAQafvKlq5O7kpfHcfI9RBfV83zcIwTFe1RyUiKMs=.2af26609-4594-4f98-841c-ca7a56d23271@github.com>
Message-ID:

On Tue, 24 Sep 2024 16:43:57 GMT, Antón Seoane wrote:

>> Hi all,
>>
>> Currently, the Unified Logging framework defaults to three decorators (uptime, level, tags) whenever the user does not specify otherwise through `-Xlog`. This can result in cumbersome input whenever a specific user that relies on a particular tag(s) has some predefined needs.
For example, C2 developers rarely need decorations, and having to manually specify this every time is inconvenient.
>>
>> To address this, this PR enables the possibility of adding tag-specific default decorators to UL. These defaults are in no way overriding user input -- they will only act whenever `-Xlog` has no decorators supplied and there is a positive match with the pre-specified defaults. Such a match is based on the following:
>>
>> - Inclusion: if `-Xlog:jit+compilation` is provided, a default for `jit` may be applied.
>> - Specificity: if, for the above line, there is a more specific default for `jit+compilation`, the latter shall be applied. In cases of equal specificity, both defaults will be applied.
>> - Additionally, defaults may target a specific log level.
>>
>> Decorators are also associated with an output file, so an output device may only have one set of decorators. For this reason, if different `LogSelection`s trigger defaults, none is to be applied.
>>
>> In summary, these defaults may be seen as a "tailored" or "flavoured" version of the existing "uptime-level-tags" current defaults.
>>
>> Please consider this PR, and thanks!
>
> Antón Seoane has updated the pull request incrementally with one additional commit since the last revision:
>
>   Removed whitespace

Upon further comments and consideration, I am keeping this PR on hold and opening up to further discussion via the hotspot-dev mailing list: https://mail.openjdk.org/pipermail/hotspot-dev/2024-September/094810.html

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20988#issuecomment-2379285507

From duke at openjdk.org Fri Sep 27 13:38:42 2024
From: duke at openjdk.org (Mikhail Ablakatov)
Date: Fri, 27 Sep 2024 13:38:42 GMT
Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v20]
In-Reply-To: <_JsfxYz0sFoNJovghDK3c1Na9OCnzqWiImy1jrFo61E=.6dd3e3e4-ab98-4449-bcba-05e45810ac67@github.com>
References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_JsfxYz0sFoNJovghDK3c1Na9OCnzqWiImy1jrFo61E=.6dd3e3e4-ab98-4449-bcba-05e45810ac67@github.com>
Message-ID:

On Fri, 27 Sep 2024 12:48:37 GMT, Andrew Haley wrote:

>> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fixup: use xaddwv2 where required in unit tests
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5456:
>
>> 5454:       } else {
>> 5455:         __ uaddwv2(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T8H);
>> 5456:       }
>
> Maybe define `addwv2` and `addwv` in MacroAssembler.
> Suggestion:
>
>     __ addwv2(is_signed_subword_type(eltype), vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T8H);

I believe that an interface should be explicit and map 1:1 to real instructions when it comes to assembly, wherever possible. Anyway, regardless of my preferences, as far as I can see, `Assembler` currently provides all other signed/unsigned versions of arithmetic instructions separately. Adding a single method like this would make the whole API inconsistent. Therefore, I suggest leaving it as is.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778644417 From mli at openjdk.org Fri Sep 27 13:48:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 27 Sep 2024 13:48:36 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> Message-ID: On Fri, 27 Sep 2024 11:43:06 GMT, Oli Gillespie wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3422: >> >>> 3420: reg_cache.extract_u32(rscratch1, k); >>> 3421: __ movw(rscratch2, t); >>> 3422: __ addw(rscratch4, r1, rscratch2); >> >> Can you try to replace these 2 lines (3421-3422) with following? >> >> __ movw(rscratch4, t); >> __ addw(rscratch4, r1, rscratch4); >> >> >> I expect it could bring more performance gain, but not sure. > > Thanks! I can't measure any difference at all with that change, seems to perform identically. I see, Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1778658786 From mli at openjdk.org Fri Sep 27 13:48:34 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 27 Sep 2024 13:48:34 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> Message-ID: On Thu, 26 Sep 2024 14:58:49 GMT, Oli Gillespie wrote: >> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. 
>> >> Benchmark results on my two hosts: >> >> >> Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units >> >> x86 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ? 0.240 ops/s >> >> x86 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ? 0.226 ops/s (+5.5%) >> >> >> aarch64 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ? 0.359 ops/s >> >> aarch64 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ? 0.491 ops/s (+5.6%) > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 bug Looks good to me, Thanks! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21203#pullrequestreview-2333800761 From mdoerr at openjdk.org Fri Sep 27 13:52:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 13:52:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). 
>> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Which benchmarks did you use? Is there any micro benchmark for class initialization? Is this one interesting? https://github.com/clojure/test.benchmark ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379333144 From shade at openjdk.org Fri Sep 27 13:58:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 13:58:40 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes We keep seeing `Reference.clear` native call on hot paths in services in JDK 17+. I would like to get this PR moving again. Please take a look :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379346593 From shade at openjdk.org Fri Sep 27 14:01:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 14:01:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 13:49:41 GMT, Martin Doerr wrote: > Which benchmarks did you use? Is there any micro benchmark for class initialization? Is this one interesting? https://github.com/clojure/test.benchmark Our usual corpus of industry-standard benchmarks, like Dacapo, SPECjbb, etc. I don't think we have a microbenchmark that targets class loading specifically. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379351833 From galder at openjdk.org Fri Sep 27 14:18:41 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:18:41 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v2] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Wed, 17 Jul 2024 22:48:04 GMT, Jasmine Karthikeyan wrote: >>> The C2 changes look nice! I just added one comment here about style. It would also be good to add some IR tests checking that the intrinsic is creating `MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare results. >> >> Thanks for the review. 
+1 to the IR tests, I'll work on those. >> >> Re: microbenchmark - what do you have exactly in mind? For vectorization performance there is `ReductionPerf` though it's not a microbenchmark per se. Do you want a microbenchmark for the performance of vectorized max/min long? For non-vectorization performance there is `MathBench`. >> >> I would not expect performance differences in `MathBench` because the backend is still the same and this change really benefits vectorization. I've run the min/max long tests on darwin/aarch64 and linux/x64 and indeed I see no difference: >> >> linux/x64 >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.maxLong 0 thrpt 8 1464197.164 ? 27044.205 ops/ms # base >> MathBench.minLong 0 thrpt 8 1469917.328 ? 25397.401 ops/ms # base >> MathBench.maxLong 0 thrpt 8 1469615.250 ? 17950.429 ops/ms # patched >> MathBench.minLong 0 thrpt 8 1456290.514 ? 44455.727 ops/ms # patched >> >> >> darwin/aarch64 >> >> Benchmark (seed) Mode Cnt Score Error Units >> MathBench.maxLong 0 thrpt 8 1739341.447 ? 210983.444 ops/ms # base >> MathBench.minLong 0 thrpt 8 1659547.649 ? 260554.159 ops/ms # base >> MathBench.maxLong 0 thrpt 8 1660449.074 ? 254534.725 ops/ms # patched >> MathBench.minLong 0 thrpt 8 1729728.021 ? 16327.575 ops/ms # patched > >> Do you want a microbenchmark for the performance of vectorized max/min long? > > Yeah, I think a simple benchmark that tests for long min/max vectorization and reduction would be good. I worry that checking performance manually like in `ReductionPerf` can lead to harder to interpret results than with a microbenchmark, especially with vm warmup ? Thanks for looking into this! Following the advice from @jaskarth I've worked on a JMH benchmark for this intrinsic. The benchmarks are pretty straightforward, but the way the data is distributed in the arrays has been designed such that the branch percentage can be controlled. 
The code uses a random increment/decrement algorithm to distribute the data for testing max. To test min the values are negated. Controlling the branching is an important factor, because the IR/assembly C2 emits can vary depending on the branching characteristics.

First, the non-AVX512 results (only posting max results for brevity, same seen with min):

Benchmark                         (probability)  (size)   Mode  Cnt    Score   Error   Units
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  107.609 ± 0.149  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  107.627 ± 0.150  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  238.799 ± 5.028  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  107.575 ± 0.088  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  107.594 ± 0.072  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  107.514 ± 0.067  ops/ms  (non-AVX512, patch)

The only situation where this PR is a regression compared to current code is when one side of the branch is always taken. Why is this? At 50% and 80%, the base code uses cmovlq, but for 100% it uses cmp+jump (the branch is for the uncommon trap). The intrinsic the patch adds means that a MinL/MaxL node is always used, and through the macro expansion that always transforms to cmovlq.

Next, the AVX-512 results (note that the results were taken on different machines, so the non-AVX-512 and AVX-512 numbers cannot be compared):

Benchmark                         (probability)  (size)   Mode  Cnt    Score   Error   Units
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  492.327 ± 0.106  ops/ms  (AVX512, base)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  492.515 ± 0.044  ops/ms  (AVX512, base)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  232.861 ± 5.859  ops/ms  (AVX512, base)
MinMaxLoopBench.longReductionMax             50   10000  thrpt    8  492.563 ± 0.452  ops/ms  (AVX512, patch)
MinMaxLoopBench.longReductionMax             80   10000  thrpt    8  492.478 ± 0.105  ops/ms  (AVX512, patch)
MinMaxLoopBench.longReductionMax            100   10000  thrpt    8  492.365 ± 0.220  ops/ms  (AVX512, patch)

Here we see the same thing as in non-AVX512 systems but the other way around. For the base JDK, at 50-80% the CmpL+Bool gets converted into a CMoveL, and via `CMoveNode::Ideal_minmax` it gets converted to MinL/MaxL nodes, so it behaves just like the patched version. At 100% base adds a cmp+jump (for the uncommon trap branch) and because of that control flow vectorization is not applied. The patched version behaves the same way regardless of the branch probability.

For completeness, here are the numbers from `longLoopMax`, which tests vectorization of min/max without reduction on AVX-512. The pattern is the same:

Benchmark                    (probability)  (size)   Mode  Cnt   Score   Error   Units
MinMaxLoopBench.longLoopMax             50   10000  thrpt    8  66.959 ± 0.426  ops/ms  (AVX512, base)
MinMaxLoopBench.longLoopMax             80   10000  thrpt    8  66.783 ± 0.342  ops/ms  (AVX512, base)
MinMaxLoopBench.longLoopMax            100   10000  thrpt    8  55.923 ± 0.390  ops/ms  (AVX512, base)
MinMaxLoopBench.longLoopMax             50   10000  thrpt    8  67.044 ± 0.535  ops/ms  (AVX512, patch)
MinMaxLoopBench.longLoopMax             80   10000  thrpt    8  66.600 ± 0.176  ops/ms  (AVX512, patch)
MinMaxLoopBench.longLoopMax            100   10000  thrpt    8  66.672 ± 0.205  ops/ms  (AVX512, patch)

Finally, note that the reduction benchmarks only use one array to compute the value. Coming up with a random increment algorithm such that the combination of multiple array values would be higher/lower than the previous one was quite complex.
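
To make the data distribution idea concrete, here is a minimal Java sketch of one way a random increment scheme could control how often a max reduction sees a new maximum. It is not the code from the PR's benchmark: the class `BranchControlledData` and its methods are hypothetical names, the starting value of 0 and the step range of 1 to 10 are arbitrary assumptions, and `probability` is interpreted as the percentage of elements that raise the running maximum.

```java
import java.util.Random;

// Hypothetical helper, not taken from the PR: it only illustrates the idea of
// a random increment/decrement data distribution with a controllable branch rate.
final class BranchControlledData {

    // Builds an array in which roughly `probability` percent of the elements
    // are strictly greater than every element before them.
    static long[] distributeForMax(int size, int probability, long seed) {
        Random rnd = new Random(seed);
        long[] data = new long[size];
        long max = 0; // running maximum, assumed to start at 0
        for (int i = 0; i < size; i++) {
            if (rnd.nextInt(100) < probability) {
                max += 1 + rnd.nextInt(10);          // new maximum
                data[i] = max;
            } else {
                data[i] = max - 1 - rnd.nextInt(10); // strictly below the maximum
            }
        }
        return data;
    }

    // For min tests the values are negated, as described in the message.
    static long[] negate(long[] data) {
        long[] negated = new long[data.length];
        for (int i = 0; i < data.length; i++) {
            negated[i] = -data[i];
        }
        return negated;
    }

    // Counts how many elements raise the running maximum (starting from 0).
    static int countNewMax(long[] data) {
        int updates = 0;
        long max = 0;
        for (long value : data) {
            if (value > max) {
                max = value;
                updates++;
            }
        }
        return updates;
    }

    // Counts how many elements lower the running minimum (starting from 0).
    static int countNewMin(long[] data) {
        int updates = 0;
        long min = 0;
        for (long value : data) {
            if (value < min) {
                min = value;
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        for (int probability : new int[] {50, 80, 100}) {
            long[] data = distributeForMax(10_000, probability, 42L);
            System.out.println(probability + "% requested, " + countNewMax(data)
                    + " of " + data.length + " elements raise the running max");
        }
    }
}
```

In this sketch a probability of 100 yields a strictly increasing array, which corresponds to the case where one side of the branch is always taken.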
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379386872

From galder at openjdk.org Fri Sep 27 14:18:41 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Fri, 27 Sep 2024 14:18:41 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v2]
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID: <-IW4I9MWB3up_N8BClv2TvHy2lUuvDk7bGohxIPv5IU=.b2f0e1b6-3ef8-4f97-9331-a7c5ba1046d1@github.com>

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>
>
> VLoop::check_preconditions: failed: control flow in loop not allowed
>
>
> The control flow is due to the Java implementation of these methods, e.g.
>
>
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
>
>
> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
>
>
> SuperWord::transform_loop:
>     Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>  518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
>
> long min   1155
> long max   1173
>
>
> After the patch, on darwin/aarch64 (M1):
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
>
> long min   1042
> long max   1042
>
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
>
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>>> jtreg:test/jdk:tier1 ...

Galder Zamarreño has updated the pull request incrementally with 17 additional commits since the last revision:

 - Remove previous benchmark effort
 - Multiply array value in reduction for vectorization to kick in
 - Renamed benchmark methods
 - Add min/max benchmark that includes loops and reductions
 - Skip single array benchmarks
 - Add an intermediate % that is more representative of real life
 - Fix compilation error
 - Fix min case to distribute numbers as per probability
 - Distribute values targetting a branch percentage

   * Use a random increment algorithm,
   to create an array of values such that min/max branch percentage matches.
 - Fix format of assembly for the movl to movq switch
 - ...
and 7 more: https://git.openjdk.org/jdk/compare/3dd72b89...28778c84 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/3dd72b89..28778c84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=00-01 Stats: 562 lines in 5 files changed: 418 ins; 132 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From galder at openjdk.org Fri Sep 27 14:21:57 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:21:57 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these calls because of the following error: > > > VLoop::check_preconditions: failed: control flow in loop not allowed > > > The control flow is due to the java implementation for these methods, e.g. > > > public static long max(long a, long b) { > return (a >= b) ? a : b; > } > > > This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. > By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. > E.g. 
> > > SuperWord::transform_loop: > Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined > 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) > > > Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1155 > long max 1173 > > > After the patch, on darwin/aarch64 (M1): > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java > 1 1 0 0 > ============================== > TEST SUCCESS > > long min 1042 > long max 1042 > > > This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. > Therefore, it still relies on the macro expansion to transform those into CMoveL. > > I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0 >>> jtreg:test/jdk:tier1 ... Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: - Revert "Implement cmovL as a jump+mov branch" This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. - Revert "Switch movl to movq" This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. - Revert "Fix format of assembly for the movl to movq switch" This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20098/files - new: https://git.openjdk.org/jdk/pull/20098/files/28778c84..16ae2a33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=01-02 Stats: 9 lines in 1 file changed: 0 ins; 6 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20098.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098 PR: https://git.openjdk.org/jdk/pull/20098 From galder at openjdk.org Fri Sep 27 14:24:39 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:24:39 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarre?o wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. 
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. 
Reverted the ad changes, those are not related to this PR. While exploring the differences in performance between base and the patched version, I wondered why the cmov version was slower than the branch one. As part of that investigation I played around with modifying the ad file to make cmov emit a branch instead. The details of this can be found in https://bugs.openjdk.org/browse/JDK-8340206 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379401009 From fbredberg at openjdk.org Fri Sep 27 14:31:43 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 27 Sep 2024 14:31:43 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v5] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 09:58:48 GMT, Martin Doerr wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Update four, after the review > > Test results on PPC64 look good. Is it already documented somewhere that `LockUnlock.testContendedLock` is getting slower with this change? I couldn't see it. @TheRealMDoerr > Test results on PPC64 look good. Is it already documented somewhere that `LockUnlock.testContendedLock` is getting slower with this change? I couldn't see it. Thank you for testing on PPC64. I've added the missing documentation [here](https://bugs.openjdk.org/browse/JDK-8320318?focusedId=14708416&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14708416). 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2379416738

From rcastanedalo at openjdk.org Fri Sep 27 14:35:54 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 27 Sep 2024 14:35:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v28]
In-Reply-To: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
References: <-CikzUsH1qKbMujGJQFhaPlKaCUDzqH-jEZNM5BZVQQ=.22d236a1-a69a-42e0-86d1-aa738c6e6e6d@github.com>
Message-ID:

On Thu, 12 Sep 2024 15:42:59 GMT, Thomas Stuefe wrote:

>> src/hotspot/share/opto/machnode.cpp line 390:
>>
>>> 388:     t = t->make_ptr();
>>> 389:   }
>>> 390:   if (t->isa_narrowklass() && CompressedKlassPointers::shift() == 0) {
>>
>> Does this change have any effect? `UseCompressedClassPointers` should be implied by `t->isa_narrowklass()`.
>
> I don't remember if this change was a reaction to an error or if I just guarded `CompressedKlassPointers::shift()` with +UseCCP because that is the prerequisite now. Probably the latter. I can remove this. Probably should assert then for +UseCCP.

@tstuefe @rkennke what do you think about this suggestion? If there is a known case where `t->isa_narrowklass() && !UseCompressedClassPointers` holds, it should be investigated because it might be a symptom of a larger problem. If there is no such case, I think the explicit `UseCompressedClassPointers` test should be removed to avoid confusion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778724120 From galder at openjdk.org Fri Sep 27 14:40:37 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 27 Sep 2024 14:40:37 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarre?o wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. 
Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. The numbers in https://github.com/openjdk/jdk/pull/20098#issuecomment-2379386872 might have been obtained with the ad changes included in them. I'll re-run the benchmarks again (in about ~2 weeks time) and post the results. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379434675 From sgibbons at openjdk.org Fri Sep 27 14:47:51 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Fri, 27 Sep 2024 14:47:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 08:24:50 GMT, Roman Kennke wrote: >> @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. >> >> Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). > > I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. > > I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. 
The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed. I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778739517 From aph at openjdk.org Fri Sep 27 14:52:42 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Sep 2024 14:52:42 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v20] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_JsfxYz0sFoNJovghDK3c1Na9OCnzqWiImy1jrFo61E=.6dd3e3e4-ab98-4449-bcba-05e45810ac67@github.com> Message-ID: On Fri, 27 Sep 2024 13:36:06 GMT, Mikhail Ablakatov wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5456: >> >>> 5454: } else { >>> 5455: __ uaddwv2(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T8H); >>> 5456: } >> >> Maybe define `addwv2` and `addwv` in MacroAssembler. >> Suggestion: >> >> __ addwv2(is_signed_subword_type(eltype), vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T8H); > > I believe that an interface should be explicit and map 1:1 to real instructions when it comes to assembly whereas possible. > > Anyway, regardless of my preferences, as far as I can see, currently `Assembler` provides all other signed/unsigned versions of arithmetic instructions separately. Adding a single method like this would make the whole API inconsistent. Therefore, I suggest leaving it as is. I have no problem at all with what class Assembler provides. 
However, when the result looks like this, even a "normal" assembler programmer would suggest macros rather than copy-and-paste:

    assert(is_subword_type(eltype), "subword type expected");
    if (is_signed_subword_type(eltype)) {
      __ saddwv(vmul3, vmul3, Assembler::T4S, vdata3, Assembler::T4H);
      __ saddwv(vmul2, vmul2, Assembler::T4S, vdata2, Assembler::T4H);
      __ saddwv(vmul1, vmul1, Assembler::T4S, vdata1, Assembler::T4H);
      __ saddwv(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T4H);
    } else {
      __ uaddwv(vmul3, vmul3, Assembler::T4S, vdata3, Assembler::T4H);
      __ uaddwv(vmul2, vmul2, Assembler::T4S, vdata2, Assembler::T4H);
      __ uaddwv(vmul1, vmul1, Assembler::T4S, vdata1, Assembler::T4H);
      __ uaddwv(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T4H);
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778746689

From aph at openjdk.org Fri Sep 27 14:58:36 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 27 Sep 2024 14:58:36 GMT
Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency
In-Reply-To: <79fI8ByboUSgIF7r_ka5gQ3QpHy5QacucjQ9Cy429ZQ=.0a785c68-38f2-4734-abf9-5922a69312b1@github.com>
References: <0tTwfxNQJz8-XYxBL1zujuv7Cbbe8N1hVqsqddmYB1o=.367aa163-cae4-4d31-a84c-ee7e11c49776@github.com> <79fI8ByboUSgIF7r_ka5gQ3QpHy5QacucjQ9Cy429ZQ=.0a785c68-38f2-4734-abf9-5922a69312b1@github.com>
Message-ID:

On Thu, 26 Sep 2024 16:23:08 GMT, Oli Gillespie wrote:

> > Given that there is so little advantage, almost down in the noise, you should do that.
>
> Just to check we're talking about the same results - the improvement shown in my aarch64 run is the same (actually a little more) as the x86 run; around 5.6%, and very high confidence (+-0.1%).

OK, fair enough.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2379470402

From lmesnik at openjdk.org Fri Sep 27 15:04:40 2024
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 27 Sep 2024 15:04:40 GMT
Subject: Integrated: 8340826: Should not send unload notification for scratch classes
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 16:29:36 GMT, Leonid Mesnik wrote:

> The JVMTI class redefinition creates temporary scratch classes for its own purposes. These classes are added to the corresponding classloaders and might be unloaded.
> In this case the JVMTI/JFR and log events are generated twice: for the original class and for its scratch class.
>
> The bug can be reproduced by the JFR test
> jdk/jfr/api/metadata/eventtype/TestUnloadingEventClass.java
> with '-Xcomp -XX:TieredStopAtLevel=1' or with '-Xcomp'
>
> The test log (modified slightly) shows
>
>
> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af1006d8 allocated
> [167.294s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x00000000af100248 fully_initialized
> [167.345s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
> [167.872s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
> [167.924s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 691.041ms
> Unloaded count: 2
>
>
> instead of the expected
>
>
> [159.737s][info ][class,unload] unloading class jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded 0x0000000041100248 state: fully_initialized
> [159.800s][trace][class,unload] unlinking class (subclass): jdk.jfr.api.metadata.eventtype.TestUnloadingEventClass$ToBeUnloaded
> [160.341s][trace][gc ] GC(0) Restored 3597 marks, occupying 57552 B
> [160.384s][info ][gc ] GC(0) Pause Full (System.gc()) 34M->2M(136M) 710.422ms
>
>
> The test hangs because it gets 2 events while waiting for one.
> The "allocated" version is the scratch class generated by the JVMTI JFR agent that redefines classes.
>
> The fix is to not send notifications for scratch classes. The scratch classes shouldn't have dependencies, so an assertion is added. Also, we don't expect any other not-loaded classes during unloading.
>
> Thanks Coleen for the details about scratch classes.
>
> Tested with tier1-5 and with :jdk_jfr with Xcomp and c1.

This pull request has now been integrated.

Changeset: 12de4fbc
Author: Leonid Mesnik
URL: https://git.openjdk.org/jdk/commit/12de4fbce7a314a1c5c84340526cd65b9a4a29d1
Stats: 17 lines in 4 files changed: 15 ins; 0 del; 2 mod

8340826: Should not send unload notification for scratch classes

Reviewed-by: sspitsyn, coleenp

-------------

PR: https://git.openjdk.org/jdk/pull/21166

From aph at openjdk.org Fri Sep 27 15:05:45 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 27 Sep 2024 15:05:45 GMT
Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2]
In-Reply-To:
References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com>
Message-ID: <-Zpc_gifltXhuvzG6yki_5crIsKrIkGCvj5_44OgmG4=.7eb29dc8-9b8d-49e8-a909-c3141173726c@github.com>

On Fri, 27 Sep 2024 13:45:46 GMT, Hamlin Li wrote:

>> Thanks! I can't measure any difference at all with that change, seems to perform identically.
>
> I see, Thanks!

Unless you really want zero extension it's better to use `mov` than `movw` (or `orrw`). On many AArch64 implementations `mov` doesn't even issue. Instead, it is handled by the renamer during the decode stage. However, because it has to clear the upper 32 bits, `movw` does issue.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1778763993 From rkennke at openjdk.org Fri Sep 27 15:21:07 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 15:21:07 GMT Subject: RFR: 8307532: Implement LM_LIGHTWEIGHT for Zero Message-ID: <13U9XvpqyKNdHhr3MdXXa8Gc3PTsfByumX-maUm0t7Y=.b33384d8-63b7-4d4f-9b29-f95d4bcc1f48@github.com> This implements the remaining parts of LW locking in Zero. Much of the work has already been done by Axel, this basically only implements the missing part that handles synchronized JNI entries. I basically preserved the LM_LEGACY case, except that I shuffled the code a little to match what we do in monitorexit case in bytecodeInterpreter.cpp (but should be functionally equivalent). The LM_LIGHTWEIGHT and LM_MONITOR case (the latter of which has been broken, before) simply call into the runtime. With this change, we can now remove the block in arguments.cpp that deals with missing LM_LIGHTWEIGHT support. 
Testing: - [x] bootcycle-images ------------- Commit messages: - 8307532: Implement LM_LIGHTWEIGHT for Zero Changes: https://git.openjdk.org/jdk/pull/21220/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21220&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307532 Stats: 35 lines in 3 files changed: 4 ins; 16 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/21220.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21220/head:pull/21220 PR: https://git.openjdk.org/jdk/pull/21220 From ogillespie at openjdk.org Fri Sep 27 15:46:35 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Fri, 27 Sep 2024 15:46:35 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: <-Zpc_gifltXhuvzG6yki_5crIsKrIkGCvj5_44OgmG4=.7eb29dc8-9b8d-49e8-a909-c3141173726c@github.com> References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> <-Zpc_gifltXhuvzG6yki_5crIsKrIkGCvj5_44OgmG4=.7eb29dc8-9b8d-49e8-a909-c3141173726c@github.com> Message-ID: On Fri, 27 Sep 2024 15:02:54 GMT, Andrew Haley wrote: >> I see, Thanks! > > Unless you really want zero extension it's better to use `mov` than `movw` (or `orrw`). > On many AArch64 implementations `mov` doesn't even issue. Instead, it is handled by the renamer during the decode stage. However, because it has to clear the upper 32 bits, `movw` does issue. Thanks! I don't measure a throughput improvement (didn't check any perf counters like instructions retired) when changing the `movw` to `mov` for `t` across F, G, H and I on Neoverse N1 or Mac M1. I'm also not sure how to tell if it's safe, my knowledge is shallow here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21203#discussion_r1778824238 From wkemper at openjdk.org Fri Sep 27 15:47:34 2024 From: wkemper at openjdk.org (William Kemper) Date: Fri, 27 Sep 2024 15:47:34 GMT Subject: RFR: 8340181: Shenandoah: Cleanup ShenandoahRuntime stubs In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 08:52:21 GMT, Aleksey Shipilev wrote: > Noticed this while working on Leyden, which has to enumerate Shenandoah stubs for code archival to work. > > `ShenandoahRuntime::shenandoah_clone_barrier` is excessive name. `ShenandoahRuntime::arraycopy_barrier_oop_entry` and friends is not covered by `JRT_LEAF`. This change hopefully homogenizes the namings for the stubs. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` Marked as reviewed by wkemper (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21152#pullrequestreview-2334085291 From mdoerr at openjdk.org Fri Sep 27 16:17:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 16:17:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. 
This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I've run some of these benchmarks on PPC64le and couldn't spot a regression, but the results are not very stable and I guess that they are not very sensitive to class initialization. I really wonder about the acquire barrier in `LIR_Assembler::emit_alloc_obj`. The interesting fields of the class are already read by `LIRGenerator::new_instance` during compile time. How can an acquire barrier after the execution help? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379632980 From rkennke at openjdk.org Fri Sep 27 16:25:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 27 Sep 2024 16:25:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 14:44:35 GMT, Scott Gibbons wrote: >> I like to have the functional connection: if - for whatever reason - the array base offset is smaller than 16, we need to deal with that. The reason for this happens to be `UseCompactObjectHeaders`, but that may not be clear to the reader of the code. 
I could add an `assert(UseCompactObjectHeaders` in that branch to make that connection clear. Also consider that `UseCompactObjectHeaders` is intended to go away at some point. >> >> I wonder if having 2 or 3 branches ahead of the main-loop (which probably doesn't do much, because haystack is <=32 bytes) is a useful approach, or if there may be a better way to get the bytes on the stack? I don't know enough about the implementation to make that judgement. > > I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed. > > I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-) Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like: if (haystack_len <= 8) { // Copy 8 bytes onto stack } else if (haystack_len <= 16) { // Copy 16 bytes onto stack } else { // Copy 32 bytes onto stack } So that is 2 branches in this prologue code instead of originally 1. However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault. I think I need to mull over it some more to come up with a correct fix.
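The arithmetic behind the `haystack_len == 17` concern can be written down as a small check. This is only a sketch under stated assumptions, not the intrinsic's code: the helper names are made up for illustration, and it assumes the fixed-size copy may reach back into the bytes in front of the array data, so the only bytes guaranteed to be readable are the haystack itself plus the array base offset (8 in the example above).

```cpp
#include <cstddef>

// Hypothetical helper: the block size the three-way prologue above would pick.
constexpr std::size_t proposed_copy_size(std::size_t haystack_len) {
  return haystack_len <= 8 ? 8 : (haystack_len <= 16 ? 16 : 32);
}

// Bytes guaranteed to be readable: the haystack bytes plus the header bytes
// in front of them (base_offset is an assumption of this model).
constexpr std::size_t guaranteed_bytes(std::size_t haystack_len,
                                       std::size_t base_offset) {
  return haystack_len + base_offset;
}

// The copy is safe only if it never reads more than the guaranteed bytes.
constexpr bool copy_is_safe(std::size_t haystack_len, std::size_t base_offset) {
  return proposed_copy_size(haystack_len) <=
         guaranteed_bytes(haystack_len, base_offset);
}
```

With a base offset of 8, lengths 17 through 23 fall into the 32-byte case while fewer than 32 bytes are guaranteed, which is the gap described above; with a base offset of 16 the same lengths pass the check.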
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778874906 From yzheng at openjdk.org Fri Sep 27 16:34:55 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 27 Sep 2024 16:34:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Thu, 19 Sep 2024 14:22:51 GMT, Stefan Karlsson wrote: >> We haven't decided whether or not we will get rid of ```Klass::_prototype_header``` before integrating this PR. @stefank could point you to a WIP branch, if that's helpful. > > This is my current work-in-progress code: > https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 > > I've made some large rewrites and I'm currently running it through functional testing. If @stefank's patch does not go in this PR, could you please export `Klass::_prototype_header` to JVMCI? Thanks!
diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 9d1b8a1cb9f..e462025074f 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -278,6 +278,7 @@ nonstatic_field(Klass, _bitmap, uintx) \ nonstatic_field(Klass, _hash_slot, uint8_t) \ nonstatic_field(Klass, _misc_flags._flags, u1) \ + nonstatic_field(Klass, _prototype_header, markWord) \ \ nonstatic_field(LocalVariableTableElement, start_bci, u2) \ nonstatic_field(LocalVariableTableElement, length, u2) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778884055 From galder at openjdk.org Fri Sep 27 16:53:47 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Fri, 27 Sep 2024 16:53:47 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <2F2rvBSpHjpMXu40xa2hUUqWQYJJihO7mvXD73OCqKQ=.4cf78e1b-6d7a-4188-888d-f901fcf338cc@github.com> On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g.
>> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047.
Failure seems related to the patch, I'll look at it when I re-execute the benchmarks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379688866 From kvn at openjdk.org Fri Sep 27 17:13:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 17:13:36 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> Message-ID: <7rUtiarSdQCBZurbsYojwyD-R4ItUxPgt1qEq58Wrm4=.9d141360-f225-4093-96ab-1ea618914f90@github.com> On Thu, 26 Sep 2024 14:58:49 GMT, Oli Gillespie wrote: >> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. >> >> Benchmark results on my two hosts: >> >> >> Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units >> >> x86 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s >> >> x86 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%) >> >> >> aarch64 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s >> >> aarch64 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ± 0.491 ops/s (+5.6%) > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 bug @ascarpino please check this change.
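For anyone checking the change, the identity quoted above can be sanity-checked outside the intrinsic. The following is a standalone sketch, not the stub generator code: the function names are invented for illustration and `xk` simply stands in for the message word plus round constant. `(d & b)` and `(~d & c)` never have a set bit in common, so adding them produces no carries and gives the same value as OR-ing them, which lets the `b` term be added last.

```cpp
#include <cstdint>

// MD5 G-function in its usual OR form.
constexpr std::uint32_t g_with_or(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
  return (d & b) | (~d & c);
}

// Same value computed with ADD: the two masked terms have disjoint bits,
// so the addition cannot carry.
constexpr std::uint32_t g_with_add(std::uint32_t b, std::uint32_t c, std::uint32_t d) {
  return (d & b) + (~d & c);
}

// Reference round sum: a + G(b, c, d) + xk (xk is an illustrative stand-in).
constexpr std::uint32_t g_sum_reference(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                                        std::uint32_t d, std::uint32_t xk) {
  return a + g_with_or(b, c, d) + xk;
}

// Reassociated sum: everything that does not need b is added first,
// and b only enters in the final addition.
constexpr std::uint32_t g_sum_delayed_b(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                                        std::uint32_t d, std::uint32_t xk) {
  std::uint32_t independent_of_b = a + xk + (~d & c);
  return independent_of_b + (d & b);
}
```

The sketch says nothing about rotation or the final `+ b` of the full round; it only covers the part of the sum that the quoted description reorders.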
------------- PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2379722321 From duke at openjdk.org Fri Sep 27 17:27:40 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Fri, 27 Sep 2024 17:27:40 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v20] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_JsfxYz0sFoNJovghDK3c1Na9OCnzqWiImy1jrFo61E=.6dd3e3e4-ab98-4449-bcba-05e45810ac67@github.com> Message-ID: On Fri, 27 Sep 2024 14:49:53 GMT, Andrew Haley wrote: >> I believe that an interface should be explicit and map 1:1 to real instructions when it comes to assembly whereas possible. >> >> Anyway, regardless of my preferences, as far as I can see, currently `Assembler` provides all other signed/unsigned versions of arithmetic instructions separately. Adding a single method like this would make the whole API inconsistent. Therefore, I suggest leaving it as is. > > I have no problem at all with what class Assembler provides. However, when the result looks like this, even a "normal" assembler programmer would suggest macros rather than copy-and-paste: > > > assert(is_subword_type(eltype), "subword type expected"); > if (is_signed_subword_type(eltype)) { > __ saddwv(vmul3, vmul3, Assembler::T4S, vdata3, Assembler::T4H); > __ saddwv(vmul2, vmul2, Assembler::T4S, vdata2, Assembler::T4H); > __ saddwv(vmul1, vmul1, Assembler::T4S, vdata1, Assembler::T4H); > __ saddwv(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T4H); > } else { > __ uaddwv(vmul3, vmul3, Assembler::T4S, vdata3, Assembler::T4H); > __ uaddwv(vmul2, vmul2, Assembler::T4S, vdata2, Assembler::T4H); > __ uaddwv(vmul1, vmul1, Assembler::T4S, vdata1, Assembler::T4H); > __ uaddwv(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T4H); > } Would it be possible to integrate this as is? The code was approved twice before, even though it had the same constructs. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778936507 From ccheung at openjdk.org Fri Sep 27 17:36:45 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 27 Sep 2024 17:36:45 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v5] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 11:11:20 GMT, Alan Bateman wrote: > Do you remember why resetArchivedStates resets the resource cache? I would expected it to be cleared for all class loaders. > I think it is because `resourceCache` is a `SoftReference` and it will fail the check in `JavaClasses::is_supported_for_archiving()`. > Rather than putting something specific to the app class loader here then maybe it should be renamed and have resetArchivedStates call it, e.g. > > ``` > void resetArchivedStates(boolean all) { > ucp = null; > resourceCache = null; > if (all) { > moduleToReader.clear(); > } > } > ``` Under `if(all)`, we also need to do `setClassPath(null)`. If I understand your suggestion correctly, in `BuiltinClassLoader`: private void resetArchivedStates() { resetArchivedStates(false); } In `ClassLoaders.AppClassLoader`: private void resetArchivedStates() { resetArchivedStates(true); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1778944555 From aph at openjdk.org Fri Sep 27 17:38:41 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Sep 2024 17:38:41 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v21] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 27 Sep 2024 13:19:54 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). 
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required. >> >> # Performance >> >> ## Neoverse N1 >> >> >> -------------------------------------------------------------------------------------------- >> Version Baseline This patch >> -------------------------------------------------------------------------------------------- >> Benchmark (size) Mode Cnt Score Error Score Error Units >> -------------------------------------------------------------------------------------------- >> ArraysHashCode.bytes 1 avgt 15 1.249 ± 0.060 1.247 ± 0.062 ns/op >> ArraysHashCode.bytes 10 avgt 15 8.754 ± 0.028 4.387 ± 0.015 ns/op >> ArraysHashCode.bytes 100 avgt 15 98.596 ± 0.051 26.655 ± 0.097 ns/op >> ArraysHashCode.bytes 10000 avgt 15 10150.578 ± 1.352 2649.962 ± 216.744 ns/op >> ArraysHashCode.chars 1 avgt 15 1.286 ± 0.062 1.246 ± 0.054 ns/op >> ArraysHashCode.chars 10 avgt 15 8.731 ± 0.002 5.344 ± 0.003 ns/op >> ArraysHashCode.chars 100 avgt 15 98.632 ± 0.048 23.023 ± 0.142 ns/op >> ArraysHashCode.chars 10000 avgt 15 10150.658 ± 3.374 2410.504 ± 8.872 ns/op >> ArraysHashCode.ints 1 avgt 15 1.189 ± 0.005 1.187 ± 0.001 ns/op >> ArraysHashCode.ints 10 avgt 15 8.730 ± 0.002 5.676 ± 0.001 ns/op >> ArraysHashCode.ints 100 avgt 15 98.559 ± 0.016 24.378 ± 0.006 ns/op >> ArraysHashCode.ints 10000 avgt 15 10148.752 ± 1.336 2419.015 ± 0.492 ns/op >> ArraysHashCode.multibytes 1 avgt 15 1.037 ± 0.001 1.037 ± 0.001 ...
> > Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: collapse duplicated code into a macro > > Co-authored-by: Andrew Haley Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/18487#pullrequestreview-2334285831 From aph at openjdk.org Fri Sep 27 17:38:42 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Sep 2024 17:38:42 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v20] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> <_JsfxYz0sFoNJovghDK3c1Na9OCnzqWiImy1jrFo61E=.6dd3e3e4-ab98-4449-bcba-05e45810ac67@github.com> Message-ID: <1jgThrVJZZzIKLChNJa6SRjjPI5bP2RtfpYNWlX0JWs=.509be4ff-898c-4ba5-b3b3-57aaa4b0b3b7@github.com> On Fri, 27 Sep 2024 17:25:21 GMT, Mikhail Ablakatov wrote: >> I have no problem at all with what class Assembler provides. However, when the result looks like this, even a "normal" assembler programmer would suggest macros rather than copy-and-paste: >> >> >> assert(is_subword_type(eltype), "subword type expected"); >> if (is_signed_subword_type(eltype)) { >> __ saddwv(vmul3, vmul3, Assembler::T4S, vdata3, Assembler::T4H); >> __ saddwv(vmul2, vmul2, Assembler::T4S, vdata2, Assembler::T4H); >> __ saddwv(vmul1, vmul1, Assembler::T4S, vdata1, Assembler::T4H); >> __ saddwv(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T4H); >> } else { >> __ uaddwv(vmul3, vmul3, Assembler::T4S, vdata3, Assembler::T4H); >> __ uaddwv(vmul2, vmul2, Assembler::T4S, vdata2, Assembler::T4H); >> __ uaddwv(vmul1, vmul1, Assembler::T4S, vdata1, Assembler::T4H); >> __ uaddwv(vmul0, vmul0, Assembler::T4S, vdata0, Assembler::T4H); >> } > > Would it be possible to integrate this as is? The code was approved twice before, even though it had the same constructs. OK. It's a fairly minor style thing, something to remember for the future. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18487#discussion_r1778946881 From aph at openjdk.org Fri Sep 27 17:44:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 27 Sep 2024 17:44:36 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:15:04 GMT, Galder Zamarreño wrote: > The only situation where this PR is a regression compared to current code is when one of the branch sides is always taken. Bear in mind that's quite common. It's not very unusual to clip a range with something equivalent to `x = min(max(x, lowest), highest)`. What does benchmarking that look like, when all the `x` are within that range? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2379768983 From kvn at openjdk.org Fri Sep 27 17:47:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 17:47:37 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics.
C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes Is ZGC affected by this? I see only G1 and Shenandoah changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379772205 From qamai at openjdk.org Fri Sep 27 18:12:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 27 Sep 2024 18:12:53 GMT Subject: RFR: 8341102: Add element type information to vector types Message-ID: Hi, This patch adds the type information of each element in a `TypeVect`. This helps constant folding vectors as well as strength reduction of several complex operations such as `Rearrange`. Some notable points: - I only implement `ConV` rule on x86, looking at other architectures it seems that I would not only need to implement the `ConV` implementations, but several other rules that match `ReplicateNode` of a constant. - I changed the implementation of an array constant in `constanttable`, I think working with `jbyte` is easier as it allows `memcpy` and at this point, we are close to the metal anyway. - Constant folding for a `VectorUnboxNode`, this is special because an element of a normal stable array is only constant if it is non-zero, so implementing constant folding on a load node seems less trivial. - Memory fences because `Vector::payload` is a final field and we should respect that. - Several places expect a `const Type*` when in reality it expects a `BasicType`, I refactor that so that the intent is clearer and there is less room for possible errors, this is needed because `byte`, `short` and `int` share the same kind of `const Type*`. Please take a look and leave your reviews, thanks a lot. 
------------- Commit messages: - build error - add element types to vector types Changes: https://git.openjdk.org/jdk/pull/21229/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341102 Stats: 1401 lines in 39 files changed: 863 ins; 332 del; 206 mod Patch: https://git.openjdk.org/jdk/pull/21229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21229/head:pull/21229 PR: https://git.openjdk.org/jdk/pull/21229 From shade at openjdk.org Fri Sep 27 18:14:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 18:14:36 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 16:14:29 GMT, Martin Doerr wrote: > I really wonder about the acquire barrier in `LIR_Assembler::emit_alloc_obj`. The interesting fields of the class are already read by `LIRGenerator::new_instance` during compile time. How can an acquire barrier after the execution help? At least, it doesn't help the allocation itself. Well, that's the thing: if compiler _does not know_ the class is initialized, it emits the runtime check for class initialization. Here, in `LIRGenerator::new_instance` we enter with `init_check = true` (`!klass->is_initialized()`): https://github.com/openjdk/jdk/blob/65200a9589e46956a2194b20c4c90d003351a539/src/hotspot/share/c1/c1_LIRGenerator.cpp#L670-L671 In generated code, we come to `init_check` block, check at runtime if class is fully initialized, and proceed to the rest of allocation path if so: https://github.com/openjdk/jdk/blob/f554c3ffce7599fdb535b03db4a6ea96870b3c2d/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp#L2275-L2277 If we were the only thread, it would not have been a problem: on first entry we would have called the stub, initialized the class and completed the allocation there. 
Next time around we would have passed `init_check == fully_initialized`, and proceeded without calling a stub. But the caveat we are handling in this PR is that if _some other thread_ might have completed the class initialization, we need to make sure _this thread_ sees the class state consistently. For example, if its Java constructor reads class statics written in ``. The initializing thread would do release-store for `init_check = fully_initialized`. On this reader side, we need a related acquire-load in the runtime check. Since the runtime check does not run often (most of the time compilers know the class is definitely initialized), the change does not affect performance all that much, if at all. Makes sense? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379809426 From bulasevich at openjdk.org Fri Sep 27 18:20:05 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 27 Sep 2024 18:20:05 GMT Subject: RFR: 8341101: [ARM32] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Message-ID: [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced Interpreter::java_lang_math_tanh which needs to be handled by the interpreter. Since the intrinsic is not implemented for ARM32, it should be implicitly skipped in the TemplateInterpreterGenerator::generate_math_entry to avoid ShouldNotReachHere.
------------- Commit messages: - 8341101: [ARM32] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Changes: https://git.openjdk.org/jdk/pull/21228/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21228&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341101 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21228.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21228/head:pull/21228 PR: https://git.openjdk.org/jdk/pull/21228 From shade at openjdk.org Fri Sep 27 18:24:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 18:24:38 GMT Subject: RFR: 8341101: [ARM32] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 In-Reply-To: References: Message-ID: <5YZ0Mx7VRCuuUBYtdCvwVEH5_gXWy_0BaQWWNKmoH6o=.37a71119-6719-4f36-88ec-bbef7ce2ef9e@github.com> On Fri, 27 Sep 2024 17:17:09 GMT, Boris Ulasevich wrote: > [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced Interpreter::java_lang_math_tanh which needs to be handled by the interpreter. Since the intrinsic is not implemented for ARM32, it should be implicitly skipped in the TemplateInterpreterGenerator::generate_math_entry to avoid ShouldNotReachHere. Looks fine. I think this is pretty trivial too. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21228#pullrequestreview-2334367762 From shade at openjdk.org Fri Sep 27 19:00:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 19:00:39 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 27 Sep 2024 17:44:38 GMT, Vladimir Kozlov wrote: > Is ZGC affected by this? I see only G1 and Shenandoah changes. Good question. ZGC expands the GC barriers late.
This is why the IR test configuration that tests ZGC shows the same result as with other collectors: no additional fluff in IR. I would not expect we need anything else in late expansion for ZGC for Reference.clear, but maybe I am tired and cannot see it. Can you confirm this is fine, @fisk? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379881102 From mdoerr at openjdk.org Fri Sep 27 19:02:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 19:02:38 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release Thanks for the explanation. This makes sense. Nevertheless, the aforementioned `membar_storestore()` follows the allocation immediately and it includes an acquire barrier for the current thread, too. So, the extra acquire is redundant. At least for the C1 code and probably at more places. This is not so obvious, so we may be able to live with what you have as long as performance is ok. Otherwise, we could still do a follow-up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379883568 From shade at openjdk.org Fri Sep 27 19:11:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 19:11:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 18:59:36 GMT, Martin Doerr wrote: > Nevertheless, the aforementioned `membar_storestore()` follows the allocation immediately and it includes an acquire barrier for the current thread, too. OK, I see what you are getting at. But isn't that barrier still too late? See: Thread 1 (in "new A()"): IK::init_state = fully_initialized [stalls] Thread 2: (also in "new A()"): membar_storestore(); // <---- nothing to cumulate with yet // not seeing result fully, no barriers! Thread 1: [resumes] membar_storestore(); // <---- too late! 
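For reference, the release/acquire pairing on the init state that this PR is about can be modelled in portable C++. This is a toy model under loud assumptions, not HotSpot code: `ToyKlass`, `finish_initialization` and `read_if_initialized` are made-up names, `static_field` stands in for a class static, and it does not model the `membar_storestore()` after allocation that the interleaving above is concerned with.

```cpp
#include <atomic>

// Toy stand-ins for the class init states (illustrative values only).
enum ToyInitState : int { allocated = 0, being_initialized = 1, fully_initialized = 2 };

struct ToyKlass {
  int static_field = 0;                    // plays the role of a class static
  std::atomic<int> init_state{allocated};  // plays the role of the init-state witness
};

// Initializing thread: write the statics, then publish with a release store.
inline void finish_initialization(ToyKlass& k, int value) {
  k.static_field = value;
  k.init_state.store(fully_initialized, std::memory_order_release);
}

// Reader thread: acquire-load the witness; only if it says "fully initialized"
// is it safe to read the statics without taking the slow path.
inline bool read_if_initialized(const ToyKlass& k, int& out) {
  if (k.init_state.load(std::memory_order_acquire) != fully_initialized) {
    return false;  // caller would fall back to the slow path (the stub)
  }
  out = k.static_field;
  return true;
}
```

A reader that observes `fully_initialized` through the acquire load is guaranteed to also observe the write to `static_field` that preceded the release store; with a relaxed load in the reader that guarantee would be missing.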
------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379895063 From kvn at openjdk.org Fri Sep 27 19:20:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 27 Sep 2024 19:20:39 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes There is coming JEP for later G1 barriers expansion similar to ZGC. Will you still need this intrinsic after it? I assume Shenandoah will follow G1 later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2379910959 From mdoerr at openjdk.org Fri Sep 27 19:25:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 27 Sep 2024 19:25:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release The point is that compiled `` is only a TLAB pointer bump. It doesn't read anything from the class. We only need an acquire barrier anywhere between `` and ``. 
The latter will always see ` result fully` because `membar_storestore();` acts as acquire barrier (PPC64 specifically). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2379917187 From shade at openjdk.org Fri Sep 27 22:41:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 27 Sep 2024 22:41:43 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release All right, granted. 
We can make an argument that a release store to `IK::init_state` can be matched with cumulative barrier like `storestore` at the end of C1-compiled allocation code. That said, it looks quite fragile, since: a) it depends on cumulative properties of low-level hardware primitives, and b) it likely only holds true for C1, as C2 normally coalesces header-protecting `storestore` with the final field `storestore`. Given that we expect no perf problems on this seemingly rare path, I prefer not to go into exploiting those specifics, unless you feel strongly otherwise :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2380238637 From ccheung at openjdk.org Fri Sep 27 22:44:05 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 27 Sep 2024 22:44:05 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v6] In-Reply-To: References: Message-ID: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
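The lenient matching rule described above (order does not matter, duplicate module names are ignored) amounts to a set comparison. The following is a minimal sketch of that idea only, not the CDS implementation: `to_module_set` and `module_paths_match` are hypothetical helpers, and the inputs are assumed to be comma-separated module names as in the example table.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Split a comma-separated list of module names into a set.
// A set discards both the original ordering and any duplicates.
inline std::set<std::string> to_module_set(const std::string& list) {
    std::set<std::string> out;
    std::stringstream ss(list);
    std::string name;
    while (std::getline(ss, name, ',')) {
        if (!name.empty()) {
            out.insert(name);
        }
    }
    return out;
}

// Dump-time and run-time lists match when they name the same modules.
inline bool module_paths_match(const std::string& dump_time,
                               const std::string& run_time) {
    return to_module_set(dump_time) == to_module_set(run_time);
}
```

With these helpers, `module_paths_match("m1,m2", "m2,m1")` and `module_paths_match("m1,m2", "m1,m2,m2")` both hold, mirroring the two rows of the example, while a run-time list naming a different module does not match.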
Calvin Cheung has updated the pull request incrementally with one additional commit since the last revision: @AlanBateman comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21048/files - new: https://git.openjdk.org/jdk/pull/21048/files/d96d78f8..3da4f9f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=04-05 Stats: 12 lines in 2 files changed: 4 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From ccheung at openjdk.org Fri Sep 27 23:05:31 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 27 Sep 2024 23:05:31 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v7] In-Reply-To: References: Message-ID: > Prior to this patch, if `--module-path` is specified in the command line: > during CDS dump time, full module graph will not be included in the CDS archive; > during run time, full module graph will not be used. > > With this patch, the full module graph will be included in the CDS archive with the `--module-path` option. During run time, if the same `--module-path` option is specified, the archived module graph will be used. > > The checking of module paths between dump time and run time is more lenient compared with the checking of class paths; the ordering of the modules is unimportant, duplicate module names are ignored. > E.g. the following is considered a match: > dump time runtime > m1,m2 m2,m1 > m1,m2 m1,m2,m2 > > I included some [notes](https://bugs.openjdk.org/browse/JDK-8328313?focusedId=14699275&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14699275) in the bug report regarding some changes in the corelib classes. 
Calvin Cheung has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - fix merge error - Merge branch 'master' into 8328313-FMG-module-path - @AlanBateman comments - @iklam comments - fix indentation - trailing whitespace - comments from David, Alan, and Ioi - 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time ------------- Changes: https://git.openjdk.org/jdk/pull/21048/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21048&range=06 Stats: 525 lines in 19 files changed: 480 ins; 2 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/21048.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21048/head:pull/21048 PR: https://git.openjdk.org/jdk/pull/21048 From bulasevich at openjdk.org Fri Sep 27 23:14:42 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 27 Sep 2024 23:14:42 GMT Subject: RFR: 8341101: [ARM32] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 17:17:09 GMT, Boris Ulasevich wrote: > [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced Interpreter::java_lang_math_tanh which needs to be handled by the interpreter. Since the intrinsic is not implemented for ARM32, it should be implicitly skipped it in the TemplateInterpreterGenerator::generate_math_entry to avoid ShouldNotReachHere. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21228#issuecomment-2380260605 From bulasevich at openjdk.org Fri Sep 27 23:14:43 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 27 Sep 2024 23:14:43 GMT Subject: Integrated: 8341101: [ARM32] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 In-Reply-To: References: Message-ID: <-sRyNQ30nUsPaxaTsnLzBF5CvtHaE_iiz6eGUi5UuT8=.4909adc2-2262-4b12-8e25-8d7784f6a693@github.com> On Fri, 27 Sep 2024 17:17:09 GMT, Boris Ulasevich wrote: > [JDK-8338694](https://bugs.openjdk.org/browse/JDK-8338694) introduced Interpreter::java_lang_math_tanh which needs to be handled by the interpreter. Since the intrinsic is not implemented for ARM32, it should be implicitly skipped it in the TemplateInterpreterGenerator::generate_math_entry to avoid ShouldNotReachHere. This pull request has now been integrated. Changeset: ed140f5d Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/ed140f5d5e2dec1217e2efbee815d84306de0563 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8341101: [ARM32] Error: ShouldNotReachHere() in TemplateInterpreterGenerator::generate_math_entry after 8338694 Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/21228 From kbarrett at openjdk.org Fri Sep 27 23:56:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 27 Sep 2024 23:56:37 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 19 Jul 2024 15:52:14 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. 
The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Amend the test case for guaranteing it works under different compilation regimes Changes requested by kbarrett (Reviewer). src/java.base/share/classes/java/lang/ref/Reference.java line 420: > 418: /* Implementation of clear(), also used by enqueue(). A simple > 419: * assignment of the referent field won't do for some garbage > 420: * collectors. Description of clear0 is rendered stale by this change. The first sentence is no longer true, since it's now clearImpl that has that role. The second sentence probably ought to also be moved into the description of clearImpl. ------------- PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2334816850 PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1779311136 From lmesnik at openjdk.org Sat Sep 28 01:19:17 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 28 Sep 2024 01:19:17 GMT Subject: RFR: 8340988: Update jdk/jfr/event/gc/collection tests to accept "CodeCache GC Threshold" as valid GC reason Message-ID: <1ZSbNKjlCqyiZJgH3lC79mZI38WGreVQZ4hILJGzCao=.29b6dd30-a9c7-475d-be65-2a11ef62e71e@github.com> Tests jdk/jdk/jfr/event/gc/collection/TestGCCauseWith* GC check the GC reasons. 
The GC might be caused by "CodeCache GC Threshold" if the test is executed with -Xcomp and the GC is triggered by code cache cleanup. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/21238/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21238&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340988 Stats: 10 lines in 4 files changed: 2 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21238.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21238/head:pull/21238 PR: https://git.openjdk.org/jdk/pull/21238 From qamai at openjdk.org Sat Sep 28 01:44:54 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 28 Sep 2024 01:44:54 GMT Subject: RFR: 8341102: Add element type information to vector types [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch adds the type information of each element in a `TypeVect`. This helps constant folding vectors as well as strength reduction of several complex operations such as `Rearrange`. Some notable points: > > - I only implement `ConV` rule on x86, looking at other architectures it seems that I would not only need to implement the `ConV` implementations, but several other rules that match `ReplicateNode` of a constant. > - I changed the implementation of an array constant in `constanttable`, I think working with `jbyte` is easier as it allows `memcpy` and at this point, we are close to the metal anyway. > - Constant folding for a `VectorUnboxNode`, this is special because an element of a normal stable array is only constant if it is non-zero, so implementing constant folding on a load node seems less trivial. > - Memory fences because `Vector::payload` is a final field and we should respect that. > - Several places expect a `const Type*` when in reality it expects a `BasicType`, I refactor that so that the intent is clearer and there is less room for possible errors, this is needed because `byte`, `short` and `int` share the same kind of `const Type*`.
> > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add mask test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21229/files - new: https://git.openjdk.org/jdk/pull/21229/files/c9bc1c4f..cd5123f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21229&range=00-01 Stats: 20 lines in 1 file changed: 19 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21229.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21229/head:pull/21229 PR: https://git.openjdk.org/jdk/pull/21229 From fjiang at openjdk.org Sat Sep 28 11:55:45 2024 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 28 Sep 2024 11:55:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: >> >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - 8330685: [arm32] share barrier spilling logic > > Thanks for the arm 32-bits port @snazarkin! Merged in commit 3957c03f. > Besides the arm 32-bits port, @snazarkin's changeset includes adding the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`.
This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected. Hi @robcasloz, riscv port cleanup is available at https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7, would you please help to apply it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2380614984 From ascarpino at openjdk.org Sat Sep 28 17:56:35 2024 From: ascarpino at openjdk.org (Anthony Scarpino) Date: Sat, 28 Sep 2024 17:56:35 GMT Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2] In-Reply-To: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com> Message-ID: On Thu, 26 Sep 2024 14:58:49 GMT, Oli Gillespie wrote: >> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%. >> >> Benchmark results on my two hosts: >> >> >> Benchmark (algorithm) (dataSize) (provider) Mode Cnt Score Error Units >> >> x86 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 636.389 ± 0.240 ops/s >> >> x86 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 671.611 ± 0.226 ops/s (+5.5%) >> >> >> aarch64 Before: >> MessageDigestBench.digest MD5 1048576 thrpt 10 498.613 ± 0.359 ops/s >> >> aarch64 After: >> MessageDigestBench.digest MD5 1048576 thrpt 10 526.008 ±
0.491 ops/s (+5.6%) > > Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64 bug The changes look good and they passed the tier 1-3 testing ------------- Marked as reviewed by ascarpino (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21203#pullrequestreview-2335385133 From iklam at openjdk.org Sun Sep 29 04:15:41 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 29 Sep 2024 04:15:41 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v4] In-Reply-To: References: Message-ID: <2TjFLEwheCeUXfKrmZW_wJ_oCB7BNoknhhecGaMC2g0=.6469d724-cd72-42df-a89d-5ffb7ab995b9@github.com> On Thu, 26 Sep 2024 00:45:20 GMT, Calvin Cheung wrote: >> src/hotspot/share/classfile/classLoaderExt.cpp line 162: >> >>> 160: int n = os::snprintf(full_name, full_name_len, "%s%s%s", path, os::file_separator(), file_name); >>> 161: assert((size_t)n == full_name_len - 1, "Unexpected number of characters in string"); >>> 162: module_paths->append(full_name); >> >> Can this case be handled: --module-path=dir >> >> - Dump time : dir contains only mod1.jar >> - Run time : dir contains only mod1.jar and mod2.jmod > > It should work because the jmod file won't be added to the `module_paths`. In my scenario, will the FMG be used? If so, the program won't be able to load the code in mod2.jmod, so the behavior will be wrong. Could you add a test case for this? >> src/hotspot/share/runtime/arguments.cpp line 347: >> >>> 345: } >>> 346: } >>> 347: return false; >> >> Can this be simplified to `return (strcmp(key, MODULE_PROPERTY_PREFIX PATH) == 0)`? > > I'm not sure. Is your suggest equivalent to: > `return (strcmp(key, "jdk.module.path"));` Yes, the C++ compiler will automatically concatenate `MODULE_PROPERTY_PREFIX PATH` into a single string. 
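The simplification suggested above relies on the compiler merging adjacent string literals. Here is a small sketch of that behaviour. The macro values are assumptions chosen so that the concatenation yields "jdk.module.path", as stated in the reply; the real definitions live in HotSpot's arguments code, and `is_module_path_property` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>

// Assumed values for illustration only.
#define MODULE_PROPERTY_PREFIX "jdk.module."
#define PATH "path"

// Adjacent string literals are concatenated at compile time, so
// MODULE_PROPERTY_PREFIX PATH is the single literal "jdk.module.path".
inline bool is_module_path_property(const char* key) {
    return std::strcmp(key, MODULE_PROPERTY_PREFIX PATH) == 0;
}
```

Note the explicit `== 0`: `strcmp` returns zero on equality, so returning its raw result as a `bool` would invert the check.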
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1779902553 PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1779902689 From iklam at openjdk.org Sun Sep 29 04:28:40 2024 From: iklam at openjdk.org (Ioi Lam) Date: Sun, 29 Sep 2024 04:28:40 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v4] In-Reply-To: References: Message-ID: <8zmMK9osKaE2aA051AjovRF3ilTWLZhsSQKTgans7QQ=.87813411-03bb-4b52-af61-8e7595c794e2@github.com> On Thu, 26 Sep 2024 00:45:48 GMT, Calvin Cheung wrote: >> src/java.base/share/classes/jdk/internal/loader/BuiltinClassLoader.java line 1092: >> >>> 1090: void resetArchivedStatesForAppClassLoader() { >>> 1091: setClassPath(null); >>> 1092: if (!moduleToReader.isEmpty()) moduleToReader.clear(); >> >> Suggestion: >> >> if (!moduleToReader.isEmpty()) { >> moduleToReader.clear(); >> } >> >> >> Also, do we need to do the same thing for the platform loader as well? > > Added braces. > The `setClassPath(null)` used to be in `ClassLoaders.AppClassLoader`. Based on investigations so far, the clearing of the `moduleToReader` map is required only for `AppClassLoader`. Why does it need to clear `moduleToReader` only for app loader and not for platform loader? Is it because the `moduleToReader` for the app loader may contain reference to jar files that indirectly references some file system objects? Since moduleToReader is just a cache, I think it's better to always clear it for both loaders. Also, the logic can be moved into BuiltinClassLoader: class BuiltinClassLoader { .... private void resetArchivedStates() { ucp = null; resourceCache = null; setClassPath(null); // AppClassLoader will initialize this again at runtime. 
moduleToReader.clear(); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1779903867 From alanb at openjdk.org Sun Sep 29 06:46:34 2024 From: alanb at openjdk.org (Alan Bateman) Date: Sun, 29 Sep 2024 06:46:34 GMT Subject: RFR: 8328313: Archived module graph should allow identical --module-path to be specified during dump time and run time [v4] In-Reply-To: <8zmMK9osKaE2aA051AjovRF3ilTWLZhsSQKTgans7QQ=.87813411-03bb-4b52-af61-8e7595c794e2@github.com> References: <8zmMK9osKaE2aA051AjovRF3ilTWLZhsSQKTgans7QQ=.87813411-03bb-4b52-af61-8e7595c794e2@github.com> Message-ID: On Sun, 29 Sep 2024 04:23:14 GMT, Ioi Lam wrote: >> Added braces. >> The `setClassPath(null)` used to be in `ClassLoaders.AppClassLoader`. Based on investigations so far, the clearing of the `moduleToReader` map is required only for `AppClassLoader`. > > Why does it need to clear `moduleToReader` only for app loader and not for platform loader? Is it because the `moduleToReader` for the app loader may contain reference to jar files that indirectly references some file system objects? > > Since moduleToReader is just a cache, I think it's better to always clear it for both loaders. Also, the logic can be moved into BuiltinClassLoader: > > > class BuiltinClassLoader { > .... > private void resetArchivedStates() { > ucp = null; > resourceCache = null; > setClassPath(null); // AppClassLoader will initialize this again at runtime. > moduleToReader.clear(); > } setClassPath(null) is the same as `ucp = null` but yes, keep it simple as otherwise there will be question each time there are changes. BuiltinClassPath should not include any code that is specific to the app class loader or the platform class loader as there are specific subclasses for that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21048#discussion_r1779923680 From dholmes at openjdk.org Sun Sep 29 07:20:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Sun, 29 Sep 2024 07:20:40 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Wed, 25 Sep 2024 19:50:53 GMT, Coleen Phillimore wrote: >> Whoever asked for it, let's see if this is what they wanted. I thought the fence() comment below it is what was requested. I agree the comment should be repeated on all platforms though for consistency. > > I see where it came from in the aarch64 code, and that code does a stlr() to satisfy the JMM. It's fine. Leave it. To get the correct semantics for the JMM the releasing of a monitor has to have release semantics (funny that!). So setting the owner to null should be a release_store. On some platforms we get "release" semantics for free. The fence is needed for correct operation of our own locking implementation and has nothing to do with JMM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1779936007 From dholmes at openjdk.org Sun Sep 29 07:23:39 2024 From: dholmes at openjdk.org (David Holmes) Date: Sun, 29 Sep 2024 07:23:39 GMT Subject: RFR: 8320318: ObjectMonitor Responsible thread [v4] In-Reply-To: References: <9q4etSUBTu0pkFlzmGS3YjeJ8VnDaR5jQOg1TC8KtlU=.a573b142-97a5-472a-814e-de301efde9da@github.com> Message-ID: On Sun, 29 Sep 2024 07:17:32 GMT, David Holmes wrote: >> I see where it came from in the aarch64 code, and that code does a stlr() to satisfy the JMM. It's fine. Leave it. > > To get the correct semantics for the JMM the releasing of a monitor has to have release semantics (funny that!). So setting the owner to null should be a release_store. On some platforms we get "release" semantics for free. 
> > The fence is needed for correct operation of our own locking implementation and has nothing to do with JMM. But I concede that the comment is less clear on platforms where the release does not appear in the instruction stream. But I think it important that the need for release semantics be documented. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/19454#discussion_r1779942129 From dholmes at openjdk.org Sun Sep 29 20:53:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Sun, 29 Sep 2024 20:53:37 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v5] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 09:00:09 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... 
>> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with two additional commits since the last revision: > > - Improve doc > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Improve comment > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> Looks fine. Just confirm general benchmarking results. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21111#pullrequestreview-2335996179 From david.holmes at oracle.com Mon Sep 30 00:39:16 2024 From: david.holmes at oracle.com (David Holmes) Date: Mon, 30 Sep 2024 10:39:16 +1000 Subject: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: Message-ID: Hi Anton, Thanks for bringing this up for general discussion outside the PR. Just to be clear for other readers, decorators are associated with a given log output device. On 27/09/2024 7:07 pm, Anton Seoane Ampudia wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators > (uptime, level, tags) whenever the user does not specify otherwise > through -Xlog. This is sometimes inconvenient when specific users > with some predefined needs do not want those tags.
For example, C2 > developers would rather not see those defaults in cases such as > jit+inlining, but also do not want to specify so every time they run -Xlog. > > One solution for this is found in this PR: https://github.com/openjdk/jdk/pull/20988 . > It can be > considered as a "flavoured" version of the existing default decorators > and in no way it will override anything user-specified. Also, decorators > will still be consistent throughout an output device (i.e., no different > decorators "mixed in"). > > However, upon recent talks with different teams this approach may be too > flexible/powerful. The ability of specifying LogSelection-bound default > decorators may result in a situation where defaults for A+B and C+D have > been specified, and a user selects -Xlog:A+B,C+D. In that case, the > union of the prespecified defaults is taken, which may not be what the > end user wants (and might result in too many decorators). > > Actually, the main use case for this that I know as of now is C2 > developers and the wish to not see decorators for some defined log > selections. With this in mind, I have reduced the original idea to a > feature where only the default decorators are not shown if we get a > positive match with a prespecified list throughout the entire user log > selection list (i.e.: > > * If there is a default for A+B and the user specifies -Xlog:A+B,C+D, > he will still get the default decorators > * If there is a default for A+B and the user specifies -Xlog:A+B, no > default decorators will be supplied). So to be clear, is the proposal now to just drop the default decorators, rather than allowing them to be replaced with alternate defaults? If that is the case then it is the same as writing: -Xlog:A+B::none and I don't really see much value in that. But I wouldn't oppose it. Allowing new defaults gives more flexibility - but obviously the developers using the specific tag combinations have to agree on what defaults to set.
Thanks, David ----- > Before scrapping the original idea and moving on with this one (which > will not change anything as it is right now, except for the really > specific uses like C2 jit+inlining that may be decided), *I wanted to > get a broader idea of people's opinions on this, as well as other use > cases for this behaviour.* > > Many thanks, > > Antón > From iklam at openjdk.org Mon Sep 30 03:16:05 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 30 Sep 2024 03:16:05 GMT Subject: RFR: 8338018: Rename ClassPrelinker to AOTConstantPoolResolver [v4] In-Reply-To: References: Message-ID: > This is the 2nd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > A simple renaming of the `ClassPrelinker` class to `AOTConstantPoolResolver`, so that the name is consistent with new classes that will be introduced in subsequent PRs for JEP 483 (`AOTClassLinker`, `AOTLinkedClassTable`, and `AOTLinkedClassBulkLoader`). > > ----- > See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483. Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains seven additional commits since the last revision: - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - 8338018: Rename ClassPrelinker to AOTConstantPoolResolver ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20517/files - new: https://git.openjdk.org/jdk/pull/20517/files/49dbfa6a..0859631f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20517&range=02-03 Stats: 12295 lines in 294 files changed: 9757 ins; 1521 del; 1017 mod Patch: https://git.openjdk.org/jdk/pull/20517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20517/head:pull/20517 PR: https://git.openjdk.org/jdk/pull/20517 From fyang at openjdk.org Mon Sep 30 03:21:38 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 30 Sep 2024 03:21:38 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 13:14:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! 
>> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. >> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 
115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test macro Thanks for the update. The RISC-V part LGTM. You still need another reviewer for the rest of the change. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21083#pullrequestreview-2336259918 From iklam at openjdk.org Mon Sep 30 03:25:26 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 30 Sep 2024 03:25:26 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v10] In-Reply-To: References: Message-ID: > This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Problem:** > > This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details. > > **Solution:** > > In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache. > > In production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized. > > **Review Notes:** > > - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled.
> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp. > - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336) > - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`. > - During the early state of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that: > - all aot-initialized classes are moved into the `initialized` state (without executing its `<clinit>` method). This is done in `InstanceKlass::initialize_from_cds()` > - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above `HeapShared::init_classes_reachable_from_archived_mirrors()` > > **Caveats:** > > Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the environment: > > > enum Foo { > [....] > static fin... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 109 commits: - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver - Merge branch 'master' of https://github.com/openjdk/jdk into jep-483-step-01-8338017-add-aot-command-line-aliases - 8340864: Remove unused lines related to vmClasses Reviewed-by: shade, kvn - 8340831: Simplify simple validation for class definition in MethodHandles.Lookup Reviewed-by: redestad - 8340838: Clean up MutableCallSite to use explicit release fence instead of AtomicInteger Reviewed-by: jrose, redestad, shade - 8340956: ProblemList 4 java/nio/channels/DatagramChannel tests on macosx-all Reviewed-by: liach, alanb, darcy, dfuchs - 8340228: Open source couple more miscellaneous AWT tests Reviewed-by: prr - 8340684: Reading from an input stream backed by a closed ZipFile has no test coverage Reviewed-by: lancea - ... 
and 99 more: https://git.openjdk.org/jdk/compare/6029b35f...563bccb3 ------------- Changes: https://git.openjdk.org/jdk/pull/20958/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20958&range=09 Stats: 13126 lines in 312 files changed: 10513 ins; 1539 del; 1074 mod Patch: https://git.openjdk.org/jdk/pull/20958.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20958/head:pull/20958 PR: https://git.openjdk.org/jdk/pull/20958 From iklam at openjdk.org Mon Sep 30 04:02:56 2024 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 30 Sep 2024 04:02:56 GMT Subject: RFR: 8329706: Implement -XX:+AOTClassLinking [v13] In-Reply-To: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> References: <5cstSdLtxGHWY5aAvTT0RlSVOkuqf5IZ1aN4_VeEHyM=.018c626f-495c-4d49-82ce-712737307701@github.com> Message-ID: > This is the 3rd PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). > > **Overview** > > - A new `-XX:+AOTClassLinking` flag is added. See [JEP 498](https://bugs.openjdk.org/browse/JDK-8315737) and the [CSR](https://bugs.openjdk.org/browse/JDK-8339506) for a discussion of this command-line option, its default value, and its impact on compatibility. > - When this flag is enabled during the creation of an AOT cache (aka CDS archive), an `AOTLinkedClassTable` is added to the cache to include all classes that are AOT-linked. For this PR, only classes for the boot/platform/application loaders are eligible. The main logic is in `aotClassLinker.cpp`. > - When an AOT archive is loaded in a production run, all classes in the `AOTLinkedClassTable` are loaded into their respective class loaders at the earliest opportunity. The main logic is in `aotLinkedClassBulkLoader.cpp`. > - The boot classes are loaded as part of `vmClasses::resolve_all()` > - The platform/application classes are loaded after the module graph is restored (see changes in `threads.cpp`). 
> - Since all classes in an `AOTLinkedClassTable` are loaded before any user-code is executed, we can resolve constant pool entries that refer to these classes during AOT cache creation. See changes in `AOTConstantPoolResolver::is_class_resolution_deterministic()`. > > **All-or-nothing Loading** > > - Because AOT-linked classes can refer to each other, using direct C++ pointers, all AOT-linked classes must be loaded together. Otherwise we will have dangling C++ pointers in the class metadata structures. > - During a production run, we check during VM start-up for incompatible VM options that would prevent some of the AOT-linked classes from being loaded. For example: > - If the VM is started with a JVMTI agent that has ClassFileLoadHook capabilities, it could replace some of the AOT-linked classes with alternative versions. > - If the VM is started with certain module options, such as `--patch-module` or `--module`, some AOT-linked classes may be replaced with patched versions, or may become invisible and cannot be loaded into the JVM. > - When incompatible VM options are detected, the JVM will refuse to load an AOT cache that has AOT-linked classes. See `FileMapInfo::validate_aot_class_linking()`. > - For simplification, `FileMapInfo::validate_aot_class_linking()` requires `CDSConfig::is_using_full_module_graph()` to be true. This means that the exact same set of modules are visible whe... Ioi Lam has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 16 commits: - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking - @dholmes-ora comments - @dholmes-ora comments - Fixed ZERO build - minor comment fix - @ashu-mehra comment: move code outside of call_initPhase2(); also renamed BOOT/BOOT2 to BOOT1/BOOT2 and refactored code related to AOTLinkedClassCategory - @ashu-mehra reviews - @ashu-mehra comments - @adinn comments - ... and 6 more: https://git.openjdk.org/jdk/compare/0859631f...3cdc7634 ------------- Changes: https://git.openjdk.org/jdk/pull/20843/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20843&range=12 Stats: 1788 lines in 47 files changed: 1631 ins; 57 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/20843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20843/head:pull/20843 PR: https://git.openjdk.org/jdk/pull/20843 From rcastanedalo at openjdk.org Mon Sep 30 05:02:12 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 05:02:12 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: > This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. > > We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: > > - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and > - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
> > ## Summary of the Changes > > ### Platform-Independent Changes (`src/hotspot/share`) > > These consist mainly of: > > - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; > - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and > - temporary support for porting the JEP to the remaining platforms. > > The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. > > ### Platform-Dependent Changes (`src/hotspot/cpu`) > > These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. > > #### ADL Changes > > The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. > > #### `G1BarrierSetAssembler` Changes > > Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live registers, provided by the `SaveLiveRegisters` class. This c... 
Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion - riscv port refactor - Remove temporary support code - Merge jdk-24+17 - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes - Merge jdk-24+16 - Ensure that detected encode-and-store patterns are matched - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion - ... and 43 more: https://git.openjdk.org/jdk/compare/8ee5f762...14483b83 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/19746/files - new: https://git.openjdk.org/jdk/pull/19746/files/6fb36e50..14483b83 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=19746&range=25-26 Stats: 19042 lines in 408 files changed: 13042 ins; 3680 del; 2320 mod Patch: https://git.openjdk.org/jdk/pull/19746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/19746/head:pull/19746 PR: https://git.openjdk.org/jdk/pull/19746 From pminborg at openjdk.org Mon Sep 30 06:15:38 2024 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 30 Sep 2024 06:15:38 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 17:42:25 GMT, Andrew Haley wrote: > > The only situation where this PR is a regression compared to current
code is when one of the branch sides is always taken. > > Bear in mind that's quite common. It's not very unusual to clip a range with something equivalent to `x = min(max(x, lowest), highest)`. What does benchmarking that look like, when all the `x` are within that range? In fact, the new `Math::clamp` methods do just this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2382197333 From aboldtch at openjdk.org Mon Sep 30 06:40:06 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 06:40:06 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails Message-ID: ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs. I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM. Initially I used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate.
------------- Commit messages: - Avoid malloc - Add AllStatic and ZInitializer - 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails Changes: https://git.openjdk.org/jdk/pull/21254/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8340420 Stats: 118 lines in 9 files changed: 91 ins; 2 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/21254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21254/head:pull/21254 PR: https://git.openjdk.org/jdk/pull/21254 From shade at openjdk.org Mon Sep 30 07:05:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 07:05:42 GMT Subject: Integrated: 8340181: Shenandoah: Cleanup ShenandoahRuntime stubs In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 08:52:21 GMT, Aleksey Shipilev wrote: > Noticed this while working on Leyden, which has to enumerate Shenandoah stubs for code archival to work. > > `ShenandoahRuntime::shenandoah_clone_barrier` is excessive name. `ShenandoahRuntime::arraycopy_barrier_oop_entry` and friends is not covered by `JRT_LEAF`. This change hopefully homogenizes the namings for the stubs. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `hotspot_gc_shenandoah` > - [x] Linux AArch64 server fastdebug, `hotspot_gc_shenandoah` This pull request has now been integrated. 
Changeset: 988a531b Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/988a531b097ccbd699d233059d73f41cae24dc5b Stats: 71 lines in 9 files changed: 7 ins; 11 del; 53 mod 8340181: Shenandoah: Cleanup ShenandoahRuntime stubs Reviewed-by: adinn, phh, wkemper ------------- PR: https://git.openjdk.org/jdk/pull/21152 From stefank at openjdk.org Mon Sep 30 07:38:36 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 30 Sep 2024 07:38:36 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails In-Reply-To: References: Message-ID: <4358LZ5nLIR_WC2szYg72CwaYVn3n0U81eN2EMo8UHA=.eb75595f-aaf4-4e74-bd99-e7a081d92a0a@github.com> On Mon, 30 Sep 2024 06:36:21 GMT, Axel Boldt-Christmas wrote: > ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs. > > I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM. > > Initially I used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate. Changes requested by stefank (Reviewer). src/hotspot/share/gc/z/zInitialize.cpp line 95: > 93: va_list argp; > 94: va_start(argp, msg_format); > 95: const FormatBuffer error(FormatBufferDummy(), msg_format, argp); Maybe consider a name that doesn't clash with the function name. src/hotspot/share/gc/z/zInitialize.cpp line 103: > 101: va_list argp; > 102: va_start(argp, msg_format); > 103: const FormatBuffer error(FormatBufferDummy(), msg_format, argp); Maybe consider a name that doesn't clash with the function name.
src/hotspot/share/gc/z/zInitialize.cpp line 117: > 115: return "Unknown error, check error GC logs"; > 116: } > 117: return _error_message; Maybe invert the conditional? Suggestion: if (had_error()) { return _error_message; } return "Unknown error, check error GC logs"; src/hotspot/share/gc/z/zInitialize.hpp line 35: > 33: > 34: class ZInitializer { > 35: public: Indentation doesn't follow the ZGC style. src/hotspot/share/gc/z/zInitialize.hpp line 42: > 40: private: > 41: static constexpr size_t ErrorMessageLength = 256; > 42: static char _error_message[ErrorMessageLength]; I would suggest that we separate the constants Suggestion: static constexpr size_t ErrorMessageLength = 256; static char _error_message[ErrorMessageLength]; ------------- PR Review: https://git.openjdk.org/jdk/pull/21254#pullrequestreview-2336623596 PR Review Comment: https://git.openjdk.org/jdk/pull/21254#discussion_r1780576948 PR Review Comment: https://git.openjdk.org/jdk/pull/21254#discussion_r1780576635 PR Review Comment: https://git.openjdk.org/jdk/pull/21254#discussion_r1780577740 PR Review Comment: https://git.openjdk.org/jdk/pull/21254#discussion_r1780578369 PR Review Comment: https://git.openjdk.org/jdk/pull/21254#discussion_r1780579084 From aboldtch at openjdk.org Mon Sep 30 07:42:48 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 07:42:48 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails [v2] In-Reply-To: References: Message-ID: > ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs. > > I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM.
> > Initially I used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate. Axel Boldt-Christmas has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/gc/z/zInitialize.hpp Co-authored-by: Stefan Karlsson - Update src/hotspot/share/gc/z/zInitialize.cpp Co-authored-by: Stefan Karlsson ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21254/files - new: https://git.openjdk.org/jdk/pull/21254/files/ed9ba5dd..d54b3eba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21254/head:pull/21254 PR: https://git.openjdk.org/jdk/pull/21254 From aboldtch at openjdk.org Mon Sep 30 07:44:04 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 07:44:04 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 Message-ID: [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the concept of ICStubs, InlineCache buffers and related safepoints. There are a handful of references and auxiliary constructs still in the code; I propose we clean these out.
This removes the unused: * Experimental `InlineCacheBufferSize` option * `InlineCacheBuffer_lock` mutex * `Thread::_missed_ic_stub_refill_verifier` field * `VM_ICBufferFull` VM operation ------------- Commit messages: - 8341168: Cleanup dead code after JDK-8322630 Changes: https://git.openjdk.org/jdk/pull/21255/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21255&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341168 Stats: 39 lines in 10 files changed: 0 ins; 35 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21255/head:pull/21255 PR: https://git.openjdk.org/jdk/pull/21255 From aboldtch at openjdk.org Mon Sep 30 07:49:48 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 07:49:48 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails [v3] In-Reply-To: References: Message-ID: > ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs. > > I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM. > > Initially I used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate.
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Change error variable name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21254/files - new: https://git.openjdk.org/jdk/pull/21254/files/d54b3eba..12475f1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=01-02 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21254/head:pull/21254 PR: https://git.openjdk.org/jdk/pull/21254 From aboldtch at openjdk.org Mon Sep 30 07:52:49 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 07:52:49 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails [v4] In-Reply-To: References: Message-ID: > ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs. > > I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM. > > Initially I used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate.
Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21254/files - new: https://git.openjdk.org/jdk/pull/21254/files/12475f1e..e4ac12ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21254&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21254/head:pull/21254 PR: https://git.openjdk.org/jdk/pull/21254 From rcastanedalo at openjdk.org Mon Sep 30 07:59:45 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 07:59:45 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v23] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 07:57:15 GMT, Roberto Castañeda Lozano wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: >> >> - Assert that unneeded stub tmp registers are not initialized in x64 and aarch64 platforms >> - Set tmp registers to noreg by default in G1PreBarrierStubC2::initialize_registers, for consistency >> - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion >> - Restore some asserts >> - Default values for tmp regs of G1PostBarrierStubC2 >> - 8334060: [arm32] Implementation of Late Barrier Expansion for G1 >> - 8330685: [arm32] share barrier spilling logic > > Thanks for the arm 32-bits port @snazarkin! Merged in commit 3957c03f. > Besides the arm 32-bits port, @snazarkin's changeset includes adding the possibility to use a third temporary register in the platform-independent class `G1PostBarrierStubC2`.
This temporary register (`G1PostBarrierStubC2::_tmp3`) is initialized to `noreg` by default in `G1PostBarrierStubC2::initialize_registers`, so no other platform should be affected. > Hi @robcasloz, riscv port cleanup is available at [feilongjiang at 1297f60](https://github.com/feilongjiang/jdk/commit/1297f6086e1de62196e2acddf2f7c86a29619dd7), would you please help to apply it? Done (commit 14483b83), thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382377364 From stefank at openjdk.org Mon Sep 30 08:05:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 30 Sep 2024 08:05:37 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 In-Reply-To: References: Message-ID: <2PJZY5ZA7F1bgZpAs9Jkm7LPtgw4NHljOVoEj4B9bNI=.c5bcfeaa-5b98-44aa-b20d-44c176105cc0@github.com> On Mon, 30 Sep 2024 07:39:11 GMT, Axel Boldt-Christmas wrote: > [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the concept of ICStubs, InlineCache buffers and related safepoints. > > There are a handful of references and auxiliary constructs still in the code; I propose we clean these out. > > This removes the unused: > * Experimental `InlineCacheBufferSize` option > * `InlineCacheBuffer_lock` mutex > * `Thread::_missed_ic_stub_refill_verifier` field > * `VM_ICBufferFull` VM operation Looks good, but one small nit. src/hotspot/share/runtime/thread.hpp line 246: > 244: HandleMark* last_handle_mark() const { return _last_handle_mark; } > 245: private: > 246: I think one of the blank lines should be left to retain the separation between `_last_handle_mark` and `_skip_gcalot` src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/StubQueue.java line 56: > 54: > 55: // The type of the contained stubs (i.e., InterpreterCodelet). > 56: // Must be a subclass of type Stub. This is interesting. The StubQueue seems to work against an interface and that interface used to have two implementing classes.
Now that one of them is gone, we might want to reconsider if we need this interface. There seems to be a lot of code that could be removed in this area. See `DEF_STUB_INTERFACE` and how it is being used. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21255#pullrequestreview-2336663437 PR Review Comment: https://git.openjdk.org/jdk/pull/21255#discussion_r1780601023 PR Review Comment: https://git.openjdk.org/jdk/pull/21255#discussion_r1780616803 From stefank at openjdk.org Mon Sep 30 08:06:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 30 Sep 2024 08:06:37 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails [v4] In-Reply-To: References: Message-ID: <7dve4ZnO1db7mPh6m_cuvj3e7nKyIKWfnOpXkijUK54=.b67a7424-43a9-4737-b957-c85b05e95558@github.com> On Mon, 30 Sep 2024 07:52:49 GMT, Axel Boldt-Christmas wrote: >> ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs. >> >> I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM. >> >> Initially used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Marked as reviewed by stefank (Reviewer).
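The 8340420 description quoted above keeps the first initialization error in static storage instead of a malloc'ed string. A minimal sketch of that idea follows, assuming hypothetical names (`record_init_error`, `first_init_error`, `has_init_error`) and an arbitrary 256-byte buffer; it is not the actual `ZInitialize` code and it makes no attempt at thread safety.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Hypothetical stand-ins for the mechanism described in the PR text;
// none of these names come from HotSpot.
namespace {
char g_first_error[256] = "";  // static storage, so no allocation can fail
bool g_has_error = false;
}

// Records an initialization error. Only the first message is kept, so a
// later shutdown path can report the root cause rather than a follow-up.
void record_init_error(const char* fmt, ...) {
  if (g_has_error) {
    return;  // a first error is already stored; later ones are dropped here
  }
  va_list args;
  va_start(args, fmt);
  // vsnprintf truncates instead of overflowing the fixed-size buffer.
  std::vsnprintf(g_first_error, sizeof(g_first_error), fmt, args);
  va_end(args);
  g_has_error = true;
}

bool has_init_error() { return g_has_error; }

const char* first_init_error() { return g_first_error; }
```

A caller would check `has_init_error()` after heap setup and, if it returns true, pass `first_init_error()` to whatever shutdown routine the runtime provides.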
------------- PR Review: https://git.openjdk.org/jdk/pull/21254#pullrequestreview-2336692694 From aboldtch at openjdk.org Mon Sep 30 08:15:51 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 30 Sep 2024 08:15:51 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 [v2] In-Reply-To: References: Message-ID: > [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the concept of ICStubs, InlineCache buffers and related safepoints. > > There are a handful of references and auxiliary constructs still in the code, I propose we clean these out. > > This removes the unused: > * Experimental `InlineCacheBufferSize` option > * `InlineCacheBuffer_lock` mutex > * `Thread::_missed_ic_stub_refill_verifier` field > * `VM_ICBufferFull` VM operation Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Add newline ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21255/files - new: https://git.openjdk.org/jdk/pull/21255/files/3c6758dd..6fb323b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21255&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21255&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21255/head:pull/21255 PR: https://git.openjdk.org/jdk/pull/21255 From shade at openjdk.org Mon Sep 30 08:15:51 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 08:15:51 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:13:10 GMT, Axel Boldt-Christmas wrote: >> [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the concept of ICStubs, InlineCache buffers and related safepoints. >> >> There are a handful of references and auxiliary constructs still in the code, I propose we clean these out.
>> >> This removes the unused: >> * Experimental `InlineCacheBufferSize` option >> * `InlineCacheBuffer_lock` mutex >> * `Thread::_missed_ic_stub_refill_verifier` field >> * `VM_ICBufferFull` VM operation > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Add newline OK, looks fine. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21255#pullrequestreview-2336702247 From stefank at openjdk.org Mon Sep 30 08:15:51 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 30 Sep 2024 08:15:51 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:13:10 GMT, Axel Boldt-Christmas wrote: >> [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the the concept of ICStubs, InlineCache buffers and related safepoints. >> >> There are a handfull of references and auxiliary constructs still in the code, I propose we clean these out. >> >> This removes the unused: >> * Experimental `InlineCacheBufferSize` option >> * `InlineCacheBuffer_lock` mutex >> * `Thread::_missed_ic_stub_refill_verifier` field >> * `VM_ICBufferFull` VM operation > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Add newline Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21255#pullrequestreview-2336709842 From shade at openjdk.org Mon Sep 30 08:18:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 08:18:37 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v5] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 09:00:09 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? 
Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request incrementally with two additional commits since the last revision: > > - Improve doc > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Improve comment > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> Looks fine, only cosmetics: src/hotspot/share/prims/jvm.cpp line 2918: > 2916: } > 2917: > 2918: I think this double-new-line is deliberate style in this file. src/hotspot/share/runtime/threads.cpp line 1032: > 1030: { > 1031: ConditionalMutexLocker ml1(ThreadsLockThrottle_lock, UseThreadsLockThrottleLock); > 1032: MonitorLocker ml2(Threads_lock); I am thinking about the names here again. I think this is cleaner, as it does not require checking and fixing the uses of `ml`. ConditionalMutexLocker throttle_ml(ThreadsLockThrottle_lock, UseThreadsLockThrottleLock); MonitorLocker ml(Threads_lock); ------------- Marked as reviewed by shade (Reviewer).
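The locking pattern under review (an optional outer throttle lock taken before the contended lock, so that at most one thread starter competes for the inner lock at a time) can be sketched with standard C++ mutexes. Everything below is a hypothetical stand-in: `ConditionalLocker` only loosely mirrors the role of HotSpot's `ConditionalMutexLocker`, and `threads_lock`, `throttle_lock`, `use_throttle_lock` and `thread_list_size` are invented for the illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-ins; not HotSpot's Threads_lock or ThreadsLockThrottle_lock.
std::mutex threads_lock;        // the heavily contended lock
std::mutex throttle_lock;       // optional outer lock that serializes starters
bool use_throttle_lock = true;  // plays the role of the diagnostic flag
int thread_list_size = 0;       // protected by threads_lock

// Acquires the mutex only when `enabled` is true; the unique_lock member
// releases it (if held) when the locker goes out of scope.
class ConditionalLocker {
  std::unique_lock<std::mutex> _lock;
public:
  ConditionalLocker(std::mutex& m, bool enabled) : _lock(m, std::defer_lock) {
    if (enabled) {
      _lock.lock();
    }
  }
  bool owns_lock() const { return _lock.owns_lock(); }
};

// Models the start-thread path: throttle first, then the contended lock.
// Every caller takes the two locks in the same order, and a caller that only
// needs threads_lock (the safepoint side in the PR) skips the throttle.
void add_thread() {
  ConditionalLocker throttle_ml(throttle_lock, use_throttle_lock);
  std::lock_guard<std::mutex> ml(threads_lock);
  ++thread_list_size;
}
```

In a real scenario `add_thread()` would be called from many threads at once; with the throttle enabled, only the current holder of `throttle_lock` ever waits on `threads_lock`, which is the effect the PR description attributes to the extra lock.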
PR Review: https://git.openjdk.org/jdk/pull/21111#pullrequestreview-2336714026 PR Review Comment: https://git.openjdk.org/jdk/pull/21111#discussion_r1780632542 PR Review Comment: https://git.openjdk.org/jdk/pull/21111#discussion_r1780635083 From rcastanedalo at openjdk.org Mon Sep 30 08:24:52 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 30 Sep 2024 08:24:52 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. >> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... > > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/60c13deb...14483b83 I just updated to jdk-24+17 (commit bda4ab21) and removed the temporary support code guarded by `G1_LATE_BARRIER_MIGRATION_SUPPORT` (commit 55a1f621). The current changeset passes all tests specified in the pull request [description](https://github.com/openjdk/jdk/pull/19746#issue-2356905813) and yields benchmark results similar to those of the original submission. @albertnetymk @vnkozlov @tschatzl @kimbarrett could you please re-review? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382431347 From tschatzl at openjdk.org Mon Sep 30 08:31:41 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 08:31:41 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:15:51 GMT, Axel Boldt-Christmas wrote: >> [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the concept of ICStubs, InlineCache buffers and related safepoints. >> >> There are a handful of references and auxiliary constructs still in the code, I propose we clean these out.
>> >> This removes the unused: >> * Experimental `InlineCacheBufferSize` option >> * `InlineCacheBuffer_lock` mutex >> * `Thread::_missed_ic_stub_refill_verifier` field >> * `VM_ICBufferFull` VM operation > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Add newline Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21255#pullrequestreview-2336747623 From mli at openjdk.org Mon Sep 30 08:31:42 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 30 Sep 2024 08:31:42 GMT Subject: RFR: 8341168: Cleanup dead code after JDK-8322630 [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 08:15:51 GMT, Axel Boldt-Christmas wrote: >> [JDK-8322630](https://bugs.openjdk.org/browse/JDK-8322630) / #17495 removed the concept of ICStubs, InlineCache buffers and related safepoints. >> >> There are a handful of references and auxiliary constructs still in the code, I propose we clean these out. >> >> This removes the unused: >> * Experimental `InlineCacheBufferSize` option >> * `InlineCacheBuffer_lock` mutex >> * `Thread::_missed_ic_stub_refill_verifier` field >> * `VM_ICBufferFull` VM operation > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Add newline Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21255#pullrequestreview-2336747914 From mli at openjdk.org Mon Sep 30 08:40:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 30 Sep 2024 08:40:36 GMT Subject: RFR: 8340420: ZGC: Should call `vm_shutdown_during_initialization` if initialization fails [v4] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 07:52:49 GMT, Axel Boldt-Christmas wrote: >> ZGC does not call `vm_shutdown_during_initialization` if initialization fails during the setup of the CollectedHeap, in contrast to the other GCs.
>> >> I propose we add a `ZInitialize::error` which we can use during initialisation to record errors. The first error recorded is also stored and used as the error message when shutting down the VM. >> >> Initially used malloc to allocate the error (ed9ba5dd6805291a6b1b56566c933424230d3b4a) but it feels like it is just better to have static storage for the string and not have to care about malloc potentially failing to allocate. > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21254#pullrequestreview-2336768026 From duke at openjdk.org Mon Sep 30 09:01:43 2024 From: duke at openjdk.org (duke) Date: Mon, 30 Sep 2024 09:01:43 GMT Subject: RFR: 8322770: Implement C2 VectorizedHashCode on AArch64 [v21] In-Reply-To: References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: On Fri, 27 Sep 2024 13:19:54 GMT, Mikhail Ablakatov wrote: >> Hello, >> >> Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. >> >> The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. >> >> At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>>
>> # Performance
>>
>> ## Neoverse N1
>>
>>
>> --------------------------------------------------------------------------------------------
>> Version                                            Baseline              This patch
>> --------------------------------------------------------------------------------------------
>> Benchmark                  (size)  Mode  Cnt      Score      Error      Score      Error  Units
>> --------------------------------------------------------------------------------------------
>> ArraysHashCode.bytes            1  avgt   15      1.249 ±    0.060      1.247 ±    0.062  ns/op
>> ArraysHashCode.bytes           10  avgt   15      8.754 ±    0.028      4.387 ±    0.015  ns/op
>> ArraysHashCode.bytes          100  avgt   15     98.596 ±    0.051     26.655 ±    0.097  ns/op
>> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±    1.352   2649.962 ±  216.744  ns/op
>> ArraysHashCode.chars            1  avgt   15      1.286 ±    0.062      1.246 ±    0.054  ns/op
>> ArraysHashCode.chars           10  avgt   15      8.731 ±    0.002      5.344 ±    0.003  ns/op
>> ArraysHashCode.chars          100  avgt   15     98.632 ±    0.048     23.023 ±    0.142  ns/op
>> ArraysHashCode.chars        10000  avgt   15  10150.658 ±    3.374   2410.504 ±    8.872  ns/op
>> ArraysHashCode.ints             1  avgt   15      1.189 ±    0.005      1.187 ±    0.001  ns/op
>> ArraysHashCode.ints            10  avgt   15      8.730 ±    0.002      5.676 ±    0.001  ns/op
>> ArraysHashCode.ints           100  avgt   15     98.559 ±    0.016     24.378 ±    0.006  ns/op
>> ArraysHashCode.ints         10000  avgt   15  10148.752 ±    1.336   2419.015 ±    0.492  ns/op
>> ArraysHashCode.multibytes       1  avgt   15      1.037 ±    0.001      1.037 ±    0.001  ...
> Mikhail Ablakatov has updated the pull request incrementally with one additional commit since the last revision: > > cleanup: collapse duplicated code into a macro > > Co-authored-by: Andrew Haley @mikabl-arm Your change (at version 1dbb1ddf076878556d25e1cb7677da973a5b76f0) is now ready to be sponsored by a Committer.
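The two-part structure described in the 8322770 PR text (a main loop consuming a fixed number of elements per iteration, then a scalar tail) rests on unrolling the polynomial recurrence `h = 31 * h + a[i]`. The sketch below is a plain scalar model of that arithmetic, assuming a block width of 4; it is not the Neon code from the PR, and `hash_reference` and `hash_blocked` are names made up for the illustration. Unsigned 32-bit arithmetic stands in for Java's wrapping `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Element-by-element recurrence, as in Java's Arrays.hashCode(int[]):
// start at 1, then h = 31 * h + element, wrapping modulo 2^32.
int32_t hash_reference(const std::vector<int32_t>& a) {
  uint32_t h = 1;
  for (int32_t v : a) {
    h = 31u * h + static_cast<uint32_t>(v);
  }
  return static_cast<int32_t>(h);
}

// Same result, but four elements per main-loop iteration. Unrolling the
// recurrence four times gives
//   h' = 31^4*h + 31^3*a0 + 31^2*a1 + 31*a2 + a3,
// which is the shape a 4-lane vector loop can evaluate with one multiply
// per lane. Leftover elements go through the scalar recurrence.
int32_t hash_blocked(const std::vector<int32_t>& a) {
  const uint32_t P1 = 31u, P2 = 961u, P3 = 29791u, P4 = 923521u;  // 31^1..31^4
  const std::size_t n = a.size();
  uint32_t h = 1;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    h = P4 * h
      + P3 * static_cast<uint32_t>(a[i])
      + P2 * static_cast<uint32_t>(a[i + 1])
      + P1 * static_cast<uint32_t>(a[i + 2])
      +      static_cast<uint32_t>(a[i + 3]);
  }
  for (; i < n; ++i) {  // scalar tail: at most 3 elements for a width of 4
    h = 31u * h + static_cast<uint32_t>(a[i]);
  }
  return static_cast<int32_t>(h);
}
```

Both functions agree for every input length because the blocked step is just four applications of the scalar step written as one expression, and arithmetic modulo 2^32 lets the terms be reordered freely.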
------------- PR Comment: https://git.openjdk.org/jdk/pull/18487#issuecomment-2382537341 From luhenry at openjdk.org Mon Sep 30 09:04:38 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 30 Sep 2024 09:04:38 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: On Thu, 26 Sep 2024 13:14:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. >> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 >> 
Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > fix test macro src/hotspot/share/opto/vectorIntrinsics.cpp line 2044: > 2042: } > 2043: > 2044: if (addr == nullptr && Matcher::supports_scalable_vector()) { Shouldn't this be in the `default` branch of the switch above? Otherwise, we would be hitting the `Unimplemented();` at https://github.com/openjdk/jdk/pull/21083/files#diff-33d0866101d899687e04303fb2232574f2cb796ce060528a243ebdc9903b01b1R2040? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1780704614 From duke at openjdk.org Mon Sep 30 09:05:51 2024 From: duke at openjdk.org (Mikhail Ablakatov) Date: Mon, 30 Sep 2024 09:05:51 GMT Subject: Integrated: 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> References: <2VKOC-rT0vOyMcXUX2gs3sOrbZ5H79KBIo50sOOVmyI=.1936f78e-794c-4f54-af3c-b1b97e5fafa8@github.com> Message-ID: <9xA_BB9hiDcp36QRR6AxRzsBdNKC3gSg0BPvgv_jjaQ=.22e243fc-b764-4495-9789-0fb17b073c64@github.com> On Tue, 26 Mar 2024 13:59:12 GMT, Mikhail Ablakatov wrote: > Hello, > > Please review the following PR for [JDK-8322770 Implement C2 VectorizedHashCode on AArch64](https://bugs.openjdk.org/browse/JDK-8322770). 
It follows previous work done in https://github.com/openjdk/jdk/pull/16629 and https://github.com/openjdk/jdk/pull/10847 for RISC-V and x86 respectively. > > The code to calculate a hash code consists of two parts: a vectorized loop of Neon instructions that processes 4 or 8 elements per iteration depending on the data type and a fully unrolled scalar "loop" that processes up to 7 tail elements. > > At the time of writing this I don't see potential benefits from providing SVE/SVE2 implementation, but it could be added as a follow-up or independently later if required.
>
> # Performance
>
> ## Neoverse N1
>
>
> --------------------------------------------------------------------------------------------
> Version                                            Baseline              This patch
> --------------------------------------------------------------------------------------------
> Benchmark                  (size)  Mode  Cnt      Score      Error      Score      Error  Units
> --------------------------------------------------------------------------------------------
> ArraysHashCode.bytes            1  avgt   15      1.249 ±    0.060      1.247 ±    0.062  ns/op
> ArraysHashCode.bytes           10  avgt   15      8.754 ±    0.028      4.387 ±    0.015  ns/op
> ArraysHashCode.bytes          100  avgt   15     98.596 ±    0.051     26.655 ±    0.097  ns/op
> ArraysHashCode.bytes        10000  avgt   15  10150.578 ±    1.352   2649.962 ±  216.744  ns/op
> ArraysHashCode.chars            1  avgt   15      1.286 ±    0.062      1.246 ±    0.054  ns/op
> ArraysHashCode.chars           10  avgt   15      8.731 ±    0.002      5.344 ±    0.003  ns/op
> ArraysHashCode.chars          100  avgt   15     98.632 ±    0.048     23.023 ±    0.142  ns/op
> ArraysHashCode.chars        10000  avgt   15  10150.658 ±    3.374   2410.504 ±    8.872  ns/op
> ArraysHashCode.ints             1  avgt   15      1.189 ±    0.005      1.187 ±    0.001  ns/op
> ArraysHashCode.ints            10  avgt   15      8.730 ±    0.002      5.676 ±    0.001  ns/op
> ArraysHashCode.ints           100  avgt   15     98.559 ±    0.016     24.378 ±    0.006  ns/op
> ArraysHashCode.ints         10000  avgt   15  10148.752 ±    1.336   2419.015 ±    0.492  ns/op
> ArraysHashCode.multibytes       1  avgt   15      1.037 ±    0.001      1.037 ±    0.001  ns/op
> ArraysHashCode.multibytes      10  avgt   15      5.4...
This pull request has now been integrated.
Changeset: 475b8943 Author: Mikhail Ablakatov <164922675+mikabl-arm at users.noreply.github.com> URL: https://git.openjdk.org/jdk/commit/475b8943c672349609a4839ce0a02ef995764698 Stats: 1355 lines in 11 files changed: 775 ins; 0 del; 580 mod 8322770: Implement C2 VectorizedHashCode on AArch64 Reviewed-by: aph, adinn ------------- PR: https://git.openjdk.org/jdk/pull/18487 From anton.seoane.ampudia at oracle.com Mon Sep 30 09:32:22 2024 From: anton.seoane.ampudia at oracle.com (Anton Seoane Ampudia) Date: Mon, 30 Sep 2024 09:32:22 +0000 Subject: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: Message-ID: Yes. I agree on the original idea being more flexible, but it has the risk of defaults being not completely agreed on and "infecting" some people's expected output. The (only) use case that has driven this has been the "no decorators" default, which is what I am tentatively restricting the original idea to. As Roberto Castañeda mentioned before, yes, this would be equivalent to -Xlog:tags::none, but a bit more ergonomic and convenient considering that with the upcoming compiler migration to UL many of these undecorated output cases will appear. Antón From: hotspot-dev on behalf of David Holmes Date: Monday, 30 September 2024 at 02:39 To: hotspot-dev at openjdk.org Subject: Re: 8340363: Tag-specific default decorators for UnifiedLogging Hi Anton, Thanks for bringing this up for general discussion outside the PR. Just to be clear for other readers, decorators are associated with a given log output device. On 27/09/2024 7:07 pm, Anton Seoane Ampudia wrote: > Hi all, > > Currently, the Unified Logging framework defaults to three decorators > (uptime, level, tags) whenever the user does not specify otherwise > through -Xlog. This is sometimes inconvenient when specific users > with some predefined needs do not want those tags.
For example, C2 > developers would rather not see those defaults in cases such as > jit+inlining, but also do not want to specify so every time they run -Xlog. > > One solution for this is found in this PR: https://github.com/openjdk/jdk/pull/20988. It can be > considered as a "flavoured" version of the existing default decorators > and in no way it will override anything user-specified. Also, decorators > will still be consistent throughout an output device (i.e., no different > decorators "mixed in"). > > However, upon recent talks with different teams this approach may be too > flexible/powerful. The ability of specifying LogSelection-bound default > decorators may result in a situation where defaults for A+B and C+D have > been specified, and a user selects -Xlog:A+B,C+D. In that case, the > union of the prespecified defaults is taken, which may not be what the > end user wants (and might result in too many decorators). > > Actually, the main use case for this that I know as of now is C2 > developers and the wish to not see decorators for some defined log > selections. With this in mind, I have reduced the original idea to a > feature where only the default decorators are not shown if we get a > positive match with a prespecified list throughout the entire user log > selection list (i.e.: > > * If there is a default for A+B and the user specifies -Xlog:A+B,C+D, > he will still get the default decorators > * If there is a default for A+B and the user specifies -Xlog:A+B, no > default decorators will be supplied). So to be clear, is the proposal now to just drop the default decorators, rather than allowing them to be replaced with alternate defaults? If that is the case then it is the same as writing: -Xlog:A+B::none and I don't really see much value in that. But I wouldn't oppose it. Allowing new defaults gives more flexibility - but obviously the developers using the specific tag combinations have to agree on what defaults to set.
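The reduced rule in the two quoted bullets (default decorators are dropped only when every selection the user passed appears on a prespecified list) can be written as a small predicate. This is a hypothetical model, assuming selections are compared as plain strings such as "A+B"; `use_default_decorators` and the `undecorated` set are invented names, not part of the Unified Logging code, and decorators given explicitly on the command line are outside the model.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns true when the usual default decorators (uptime, level, tags)
// should still be applied to the output.
// `user_selections` models the comma-separated selections given to -Xlog.
// `undecorated` models the prespecified list of selections whose default
// is "no decorators".
bool use_default_decorators(const std::vector<std::string>& user_selections,
                            const std::set<std::string>& undecorated) {
  if (user_selections.empty()) {
    return true;  // nothing matched, so keep the normal defaults
  }
  for (const std::string& selection : user_selections) {
    if (undecorated.count(selection) == 0) {
      return true;  // one selection is not on the list: keep the defaults
    }
  }
  return false;  // every selection matched the list: drop the defaults
}
```

With `undecorated = {"A+B"}`, the call for `{"A+B"}` returns false (no default decorators) and the call for `{"A+B", "C+D"}` returns true, matching the two bullets.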
Thanks, David ----- > Before scrapping the original idea and moving on with this one (which > will not change anything as it is right now, except for the really > specific uses like C2 jit+inlining that may be decided), *I wanted to > get a broader idea of people's opinions on this, as well as other use > cases for this behaviour.* > > Many thanks, > > Antón > From mdoerr at openjdk.org Mon Sep 30 09:32:35 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 30 Sep 2024 09:32:35 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 07:17:50 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Relax to just a release I've done a bit of research and it seems like the C2 clinit barrier is only used very rarely in a corner case while the C1 parts are not so infrequently used. Peak performance doesn't seem to be affected. So, I don't see any reason for optimizing C2, either. The shared code LGTM. The more frequently used parts are in platform specific code, so it might make sense to optimize the PPC64 parts. Also note that the "isync trick" is a faster acquire barrier than "lwsync". What do you think about this?

diff --git a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
index 61f654c9cfa..684c06614a9 100644
--- a/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/c1_LIRAssembler_ppc.cpp
@@ -2274,7 +2274,7 @@ void LIR_Assembler::emit_alloc_obj(LIR_OpAllocObj* op) {
     }
     __ lbz(op->tmp1()->as_register(),
            in_bytes(InstanceKlass::init_state_offset()), op->klass()->as_register());
-    __ lwsync(); // acquire
+    // acquire barrier included in membar_storestore() which follows the allocation immediately.
     __ cmpwi(CCR0, op->tmp1()->as_register(), InstanceKlass::fully_initialized);
     __ bc_far_optimized(Assembler::bcondCRbiIs0, __ bi0(CCR0, Assembler::equal), *op->stub()->entry());
   }
diff --git a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
index e73e617b8ca..bf2b2540e35 100644
--- a/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
+++ b/src/hotspot/cpu/ppc/macroAssembler_ppc.cpp
@@ -2410,7 +2410,7 @@ void MacroAssembler::verify_secondary_supers_table(Register r_sub_klass,
 void MacroAssembler::clinit_barrier(Register klass, Register thread, Label* L_fast_path, Label* L_slow_path) {
   assert(L_fast_path != nullptr || L_slow_path != nullptr, "at least one is required");
 
-  Label L_fallthrough;
+  Label L_check_thread, L_fallthrough;
   if (L_fast_path == nullptr) {
     L_fast_path = &L_fallthrough;
   } else if (L_slow_path == nullptr) {
@@ -2419,11 +2419,14 @@ void MacroAssembler::clinit_barrier(Register klass, Register thread, Label* L_fa
 
   // Fast path check: class is fully initialized
   lbz(R0, in_bytes(InstanceKlass::init_state_offset()), klass);
-  lwsync(); // acquire
+  // acquire by cmp-branch-isync if fully_initialized
   cmpwi(CCR0, R0, InstanceKlass::fully_initialized);
-  beq(CCR0, *L_fast_path);
+  bne(CCR0, L_check_thread);
+  isync();
+  b(*L_fast_path);
 
   // Fast path check: current thread is initializer thread
+  bind(L_check_thread);
   ld(R0, in_bytes(InstanceKlass::init_thread_offset()), klass);
   cmpd(CCR0, thread, R0);
   if (L_slow_path == &L_fallthrough) {

------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2382609010 From mli at openjdk.org Mon Sep 30 09:36:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 30 Sep 2024 09:36:36 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8] In-Reply-To: References: Message-ID: <19hPemMkpl_1xHlsjWm0Dc_lpvB4jXG6SoeQUOBkb6A=.ed9843c5-c545-4fda-9b61-df235db6b481@github.com> On Mon, 30 Sep 2024 09:01:24 GMT, Ludovic Henry wrote: >> Hamlin
Li has updated the pull request incrementally with one additional commit since the last revision: >> >> fix test macro > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2044: > >> 2042: } >> 2043: >> 2044: if (addr == nullptr && Matcher::supports_scalable_vector()) { > > Shouldn't this be in the `default` branch of the switch above? Otherwise, we would be hitting the `Unimplemented();` at https://github.com/openjdk/jdk/pull/21083/files#diff-33d0866101d899687e04303fb2232574f2cb796ce060528a243ebdc9903b01b1R2040? Do you mean when the bits > 512, e.g. 1024? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1780753333 From tschatzl at openjdk.org Mon Sep 30 10:04:51 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 10:04:51 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com> On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Casta?eda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is double-fold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work. 
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Casta?eda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/55c0ecf8...14483b83 Still seems good. Mostly only looked at the changes in the GC directory and the barrier code themselves as I do not feel enabled to comment too much on other (compiler) changes. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2336972915 From tschatzl at openjdk.org Mon Sep 30 10:15:40 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 30 Sep 2024 10:15:40 GMT Subject: RFR: 8340988: Update jdk/jfr/event/gc/collection tests to accept "CodeCache GC Threshold" as valid GC reason In-Reply-To: <1ZSbNKjlCqyiZJgH3lC79mZI38WGreVQZ4hILJGzCao=.29b6dd30-a9c7-475d-be65-2a11ef62e71e@github.com> References: <1ZSbNKjlCqyiZJgH3lC79mZI38WGreVQZ4hILJGzCao=.29b6dd30-a9c7-475d-be65-2a11ef62e71e@github.com> Message-ID: On Sat, 28 Sep 2024 01:14:15 GMT, Leonid Mesnik wrote: > Tests > jdk/jdk/jfr/event/gc/collection/TestGCCauseWith* GC > check the GC reasons. 
The GC might be caused by
> "CodeCache GC Threshold"
> if the test is executed with -Xcomp and the GC is caused by code cache cleanup.

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/21238#pullrequestreview-2336997510

From thartmann at openjdk.org Mon Sep 30 10:28:09 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 30 Sep 2024 10:28:09 GMT
Subject: RFR: 8341197: [BACKOUT] 8322770: Implement C2 VectorizedHashCode on AArch64
Message-ID: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com>

Let's back out [JDK-8322770](https://bugs.openjdk.org/browse/JDK-8322770) while we investigate [JDK-8341194](https://bugs.openjdk.org/browse/JDK-8341194). Applies cleanly.

Thanks,
Tobias

-------------

Commit messages:
 - Revert "8322770: Implement C2 VectorizedHashCode on AArch64"

Changes: https://git.openjdk.org/jdk/pull/21260/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21260&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8341197
Stats: 1355 lines in 11 files changed: 0 ins; 775 del; 580 mod
Patch: https://git.openjdk.org/jdk/pull/21260.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21260/head:pull/21260

PR: https://git.openjdk.org/jdk/pull/21260

From shade at openjdk.org Mon Sep 30 10:28:09 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 30 Sep 2024 10:28:09 GMT
Subject: RFR: 8341197: [BACKOUT] 8322770: Implement C2 VectorizedHashCode on AArch64
In-Reply-To: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com>
References: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com>
Message-ID:

On Mon, 30 Sep 2024 10:17:30 GMT, Tobias Hartmann wrote:

> Let's back out [JDK-8322770](https://bugs.openjdk.org/browse/JDK-8322770) while we investigate [JDK-8341194](https://bugs.openjdk.org/browse/JDK-8341194). Applies cleanly.
>
> Thanks,
> Tobias

Backout looks okay.
-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21260#pullrequestreview-2337013024

From jpai at openjdk.org Mon Sep 30 10:28:09 2024
From: jpai at openjdk.org (Jaikiran Pai)
Date: Mon, 30 Sep 2024 10:28:09 GMT
Subject: RFR: 8341197: [BACKOUT] 8322770: Implement C2 VectorizedHashCode on AArch64
In-Reply-To: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com>
References: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com>
Message-ID:

On Mon, 30 Sep 2024 10:17:30 GMT, Tobias Hartmann wrote:

> Let's back out [JDK-8322770](https://bugs.openjdk.org/browse/JDK-8322770) while we investigate [JDK-8341194](https://bugs.openjdk.org/browse/JDK-8341194). Applies cleanly.
>
> Thanks,
> Tobias

Looks OK to me.

-------------

Marked as reviewed by jpai (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21260#pullrequestreview-2337016466

From thartmann at openjdk.org Mon Sep 30 10:28:09 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 30 Sep 2024 10:28:09 GMT
Subject: RFR: 8341197: [BACKOUT] 8322770: Implement C2 VectorizedHashCode on AArch64
In-Reply-To:
References: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com>
Message-ID:

On Mon, 30 Sep 2024 10:19:41 GMT, Aleksey Shipilev wrote:

>> Let's back out [JDK-8322770](https://bugs.openjdk.org/browse/JDK-8322770) while we investigate [JDK-8341194](https://bugs.openjdk.org/browse/JDK-8341194). Applies cleanly.
>>
>> Thanks,
>> Tobias
>
> Backout looks okay.

Thanks for the quick reviews, @shipilev and @jaikiran. Will run some quick sanity testing before integration.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21260#issuecomment-2382725077 From mli at openjdk.org Mon Sep 30 10:33:20 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 30 Sep 2024 10:33:20 GMT Subject: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v9] In-Reply-To: References: Message-ID: <4oXN9TokV6OkJe4BMIF4GBI7g8LA8KZTiZwSksTU_Qw=.6f3da3ea-63f0-472b-8505-d736ab959aa3@github.com> > Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added the sleef source (in particular the generated sleef inline headers). We use sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will generate libsleef.so with the bridge functions to sleef api, otherwise without the bridge functions. > > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 891.272 | ns/op | 2.219 > 
Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 2669.33 | ns/op | 1.743 > Double128Vector.... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: bits > 512 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/0bd263d1..0f917561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=07-08 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083 From thartmann at openjdk.org Mon Sep 30 10:38:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Sep 2024 10:38:37 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarre?o wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. 
>> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. 
>> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarre?o has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. You've probably seen this but the new test is failing IR verification: Failed IR Rules (4) of Methods (4) ---------------------------------- 1) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMax(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
2) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMin(double,double)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 3) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMax(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 4) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMin(float,float)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! 
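
For context on the `long` variants this PR targets, the kind of loop the description has in mind looks roughly like the sketch below. The class and method names (`LongMinMaxLoop`, `elementwiseMax`, `reduceMin`) are made up for illustration and are not taken from `ReductionPerf` or from the failing `TestMinMaxInlining` test; whether C2 actually vectorizes such loops depends on the platform and JDK build, so this only shows the source-level shape.

```java
import java.util.Arrays;

final class LongMinMaxLoop {
    // Element-wise maximum of two long arrays. In plain Java, Math.max(long, long)
    // is written as a compare plus a conditional, which is the control flow the
    // PR description says blocks vectorization of loops like this one.
    static void elementwiseMax(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
    }

    // Min reduction over a long array, the other loop shape mentioned in the thread.
    static long reduceMin(long[] a) {
        long acc = Long.MAX_VALUE;
        for (int i = 0; i < a.length; i++) {
            acc = Math.min(acc, a[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = {5, -3, 10, 7};
        long[] b = {2, 8, 10, -1};
        long[] out = new long[a.length];
        elementwiseMax(a, b, out);
        System.out.println(Arrays.toString(out)); // [5, 8, 10, 7]
        System.out.println(reduceMin(a));         // -3
    }
}
```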
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2382746375

From ogillespie at openjdk.org Mon Sep 30 10:41:38 2024
From: ogillespie at openjdk.org (Oli Gillespie)
Date: Mon, 30 Sep 2024 10:41:38 GMT
Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency
In-Reply-To:
References: <0tTwfxNQJz8-XYxBL1zujuv7Cbbe8N1hVqsqddmYB1o=.367aa163-cae4-4d31-a84c-ee7e11c49776@github.com> <79fI8ByboUSgIF7r_ka5gQ3QpHy5QacucjQ9Cy429ZQ=.0a785c68-38f2-4734-abf9-5922a69312b1@github.com>
Message-ID: <0QUhy2lbDRCRBsZUWHySpM8UPoWmmPNCXfjO8-ffveY=.54420f24-1ce8-4fe5-abce-c61ac40ff4aa@github.com>

On Fri, 27 Sep 2024 14:55:52 GMT, Andrew Haley wrote:

>>> Given that there is so little advantage, almost down in the noise, you should do that.
>>
>> Just to check we're talking about the same results - the improvement shown in my aarch64 run is the same (actually a little more) as the x86 run; around 5.6%, and very high confidence (+-0.1%).
>
>> > Given that there is so little advantage, almost down in the noise, you should do that.
>>
>> Just to check we're talking about the same results - the improvement shown in my aarch64 run is the same (actually a little more) as the x86 run; around 5.6%, and very high confidence (+-0.1%).
>
> OK, fair enough.

Thanks all for reviewing. @theRealAph - any concerns with integrating?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2382774499

From mdoerr at openjdk.org Mon Sep 30 10:45:06 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 30 Sep 2024 10:45:06 GMT
Subject: RFR: 8339386: Assertion on AIX - original PC must be in the main code section of the compiled method
Message-ID:

We should make sure to read the `sender_pc` only once to make it signal safe (e.g. using `volatile`). Now, we can check if it is a deopt PC and if so, if the original PC is within the nmethod.
In case of interpreter frame on top of compiled deoptimized frame we need to use the unextendedSP (2nd commit). ------------- Commit messages: - Handle interpreter frame on top of compiled deoptimized frame. - 8339386: Assertion on AIX - original PC must be in the main code section of the compiled method Changes: https://git.openjdk.org/jdk/pull/21189/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21189&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339386 Stats: 12 lines in 1 file changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21189.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21189/head:pull/21189 PR: https://git.openjdk.org/jdk/pull/21189 From ogillespie at openjdk.org Mon Sep 30 10:49:21 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 30 Sep 2024 10:49:21 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v6] In-Reply-To: References: Message-ID: > Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. > This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. > > Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. > > Before (ThreadStartTtsp.java is shared in JDK-8340547): > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 1291591 ns > Reaching safepoint: 59962 ns > Reaching safepoint: 1958065 ns > Reaching safepoint: 14456666258 ns <-- 14 seconds! > ... 
> > > After: > > java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' > Reaching safepoint: 214269 ns > Reaching safepoint: 60253 ns > Reaching safepoint: 2040680 ns > Reaching safepoint: 3089284 ns > Reaching safepoint: 2998303 ns > Reaching safepoint: 4433713 ns <-- 4.4ms > Reaching safepoint: 3368436 ns > Reaching safepoint: 2986519 ns > Reaching safepoint: 3269102 ns > ... > > > > **Alternatives** > > I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. > I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. Oli Gillespie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into threadlock-ttsp - PR feedback - Improve doc Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Improve comment Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> - Also address Thread::exit - Fix lock ranking - Fix build and address comments - Remove unused code - Improve ttsp while creating threads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21111/files - new: https://git.openjdk.org/jdk/pull/21111/files/7b5633ae..26f7cdac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21111&range=04-05 Stats: 160599 lines in 871 files changed: 152375 ins; 3841 del; 4383 mod Patch: https://git.openjdk.org/jdk/pull/21111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21111/head:pull/21111 PR: https://git.openjdk.org/jdk/pull/21111 From ogillespie at openjdk.org Mon Sep 30 10:49:21 2024 From: ogillespie at openjdk.org (Oli Gillespie) Date: Mon, 30 Sep 2024 10:49:21 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v5] In-Reply-To: References: Message-ID: <9olXPuo5UgkyTIhgEKPyEt6HnnH8UlSGkWuQsF9kgPU=.0e640d56-8a42-4aa3-a858-eec571f4bcc7@github.com> On Sun, 29 Sep 2024 20:50:37 GMT, David Holmes wrote: > Looks fine. Just confirm general benchmarking results. Thanks Thank you. No observed regressions in my testing, including SpecJBB, SpecJVM and DaCapo. 
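
The locking idea described for this PR can be sketched outside the VM in plain Java. This is only an illustration under stated assumptions: `mainLock` stands in for `Threads_lock`, `gateLock` for the new `ThreadStart_lock`, and every name in the class is hypothetical rather than taken from the HotSpot change. It also assumes the gate is held while the main lock is acquired, which is how the description reads. The counter shows that at most one "starter" competes for the main lock at any time.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

final class GatedLockSketch {
    // Stand-in for Threads_lock: the contended lock that a VM thread also needs.
    private final ReentrantLock mainLock = new ReentrantLock();
    // Stand-in for ThreadStart_lock: starters queue up here first.
    private final ReentrantLock gateLock = new ReentrantLock();

    // Number of starters currently past the gate, and the highest value observed.
    private final AtomicInteger startersPastGate = new AtomicInteger();
    private final AtomicInteger maxStartersPastGate = new AtomicInteger();
    private int started; // guarded by mainLock

    void startOne() {
        gateLock.lock();
        try {
            // Only the gate holder gets this far, so only one starter at a
            // time can be waiting on (or holding) mainLock.
            int now = startersPastGate.incrementAndGet();
            maxStartersPastGate.accumulateAndGet(now, Math::max);
            mainLock.lock();
            try {
                started++; // the "add thread to the threads list" work
            } finally {
                mainLock.unlock();
                startersPastGate.decrementAndGet();
            }
        } finally {
            gateLock.unlock();
        }
    }

    int started() {
        mainLock.lock();
        try {
            return started;
        } finally {
            mainLock.unlock();
        }
    }

    int maxStartersPastGate() {
        return maxStartersPastGate.get();
    }

    // Runs several worker threads that each call startOne() repeatedly.
    static GatedLockSketch runDemo(int threads, int startsPerThread) {
        GatedLockSketch sketch = new GatedLockSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < startsPerThread; j++) {
                    sketch.startOne();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch;
    }

    public static void main(String[] args) {
        GatedLockSketch s = runDemo(8, 1000);
        System.out.println("started=" + s.started()
                + " maxStartersPastGate=" + s.maxStartersPastGate());
    }
}
```

The cost is the one mentioned in the description: every start pays for one extra lock/unlock pair, in exchange for any other thread that wants the main lock facing a single competitor instead of many.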
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2382830857

From aph at openjdk.org Mon Sep 30 10:52:35 2024
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 30 Sep 2024 10:52:35 GMT
Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency
In-Reply-To:
References: <0tTwfxNQJz8-XYxBL1zujuv7Cbbe8N1hVqsqddmYB1o=.367aa163-cae4-4d31-a84c-ee7e11c49776@github.com> <79fI8ByboUSgIF7r_ka5gQ3QpHy5QacucjQ9Cy429ZQ=.0a785c68-38f2-4734-abf9-5922a69312b1@github.com>
Message-ID:

On Fri, 27 Sep 2024 14:55:52 GMT, Andrew Haley wrote:

>>> Given that there is so little advantage, almost down in the noise, you should do that.
>>
>> Just to check we're talking about the same results - the improvement shown in my aarch64 run is the same (actually a little more) as the x86 run; around 5.6%, and very high confidence (+-0.1%).
>
>> > Given that there is so little advantage, almost down in the noise, you should do that.
>>
>> Just to check we're talking about the same results - the improvement shown in my aarch64 run is the same (actually a little more) as the x86 run; around 5.6%, and very high confidence (+-0.1%).
>
> OK, fair enough.

> Thanks all for reviewing. @theRealAph - any concerns with integrating?

No, should be fine.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2382839456

From duke at openjdk.org Mon Sep 30 10:52:36 2024
From: duke at openjdk.org (duke)
Date: Mon, 30 Sep 2024 10:52:36 GMT
Subject: RFR: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency [v2]
In-Reply-To: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com>
References: <0YcJsAbwtMOHJqUMFkQhpDKsEm3-AFXucByQATljmYc=.7c70a817-3055-4071-88e4-39222bdf3bfb@github.com>
Message-ID:

On Thu, 26 Sep 2024 14:58:49 GMT, Oli Gillespie wrote:

>> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%.
>>
>> Benchmark results on my two hosts:
>>
>>
>> Benchmark                  (algorithm)  (dataSize)  (provider)   Mode  Cnt    Score   Error  Units
>>
>> x86 Before:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  636.389 ± 0.240  ops/s
>>
>> x86 After:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  671.611 ± 0.226  ops/s  (+5.5%)
>>
>>
>> aarch64 Before:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  498.613 ± 0.359  ops/s
>>
>> aarch64 After:
>> MessageDigestBench.digest          MD5     1048576              thrpt   10  526.008 ± 0.491  ops/s  (+5.6%)
>
> Oli Gillespie has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix aarch64 bug

@olivergillespie
Your change (at version e6d95c2f5e8a9bb505543afdb60c491ab6141ecd) is now ready to be sponsored by a Committer.
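
As a side illustration of the identity quoted in this thread, here is a small standalone Java check. The class and method names (`Md5GIdentity`, `gWithOr`, `gWithAdd`) are invented for this sketch and do not come from the HotSpot intrinsic. It relies only on the fact that `(d & b)` and `(~d & c)` can never share a set bit, since one is masked by `d` and the other by `~d`, so the addition never carries and OR and ADD give the same result.

```java
import java.util.Random;

final class Md5GIdentity {
    // Textbook form of the MD5 "G" function from the quoted description.
    static int gWithOr(int b, int c, int d) {
        return (d & b) | (~d & c);
    }

    // Same value, written as an addition. Because the two terms have disjoint
    // bits, a caller is free to fold (~d & c) into a running sum early and add
    // (d & b) last, which is what shortens the dependency on b.
    static int gWithAdd(int b, int c, int d) {
        return (d & b) + (~d & c);
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int b = random.nextInt();
            int c = random.nextInt();
            int d = random.nextInt();
            if (gWithOr(b, c, d) != gWithAdd(b, c, d)) {
                throw new AssertionError("mismatch for b=" + b + " c=" + c + " d=" + d);
            }
        }
        System.out.println("OR and ADD forms agree on all sampled inputs");
    }
}
```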
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21203#issuecomment-2382845160

From ogillespie at openjdk.org Mon Sep 30 10:56:39 2024
From: ogillespie at openjdk.org (Oli Gillespie)
Date: Mon, 30 Sep 2024 10:56:39 GMT
Subject: Integrated: 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency
In-Reply-To:
References:
Message-ID: <4acoXlsAywN9gnG9ryB_MCYHmLHSmmPdWvvF-7KDLos=.8f20719b-c366-4338-825c-fc934940aca6@github.com>

On Thu, 26 Sep 2024 11:33:00 GMT, Oli Gillespie wrote:

> As suggested in https://github.com/animetosho/md5-optimisation?tab=readme-ov-file#dependency-shortcut-in-g-function, we can delay the dependency on 'b' by recognizing that the ((d & b) | (~d & c)) is equivalent to ((d & b) + (~d & c)) in this scenario, and we can perform those additions independently, leaving our dependency on b to the final addition. This speeds it up around 5%.
>
> Benchmark results on my two hosts:
>
>
> Benchmark                  (algorithm)  (dataSize)  (provider)   Mode  Cnt    Score   Error  Units
>
> x86 Before:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  636.389 ± 0.240  ops/s
>
> x86 After:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  671.611 ± 0.226  ops/s  (+5.5%)
>
>
> aarch64 Before:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  498.613 ± 0.359  ops/s
>
> aarch64 After:
> MessageDigestBench.digest          MD5     1048576              thrpt   10  526.008 ± 0.491  ops/s  (+5.6%)

This pull request has now been integrated.
Changeset: 1cf26a51 Author: Oli Gillespie Committer: Hamlin Li URL: https://git.openjdk.org/jdk/commit/1cf26a5179e619f17909426fdb26a3fb3b748483 Stats: 10 lines in 2 files changed: 4 ins; 4 del; 2 mod 8341013: Optimize x86/aarch64 MD5 intrinsics by reducing data dependency Reviewed-by: mli, ascarpino ------------- PR: https://git.openjdk.org/jdk/pull/21203 From thartmann at openjdk.org Mon Sep 30 10:59:38 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 30 Sep 2024 10:59:38 GMT Subject: Integrated: 8341197: [BACKOUT] 8322770: Implement C2 VectorizedHashCode on AArch64 In-Reply-To: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com> References: <3muNlS1Y7HT5geXwEejOPNn37GNhH0I7lM-IMYFMMHU=.1cf7654c-e7e8-44ee-b34f-f06c4bd6b801@github.com> Message-ID: On Mon, 30 Sep 2024 10:17:30 GMT, Tobias Hartmann wrote: > Let's backout [JDK-8322770](https://bugs.openjdk.org/browse/JDK-8322770) while we investigate [JDK-8341194](https://bugs.openjdk.org/browse/JDK-8341194). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 58b6fc5b Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/58b6fc5baa0931fa6f2aa37bf0bb125497cf6cc9 Stats: 1355 lines in 11 files changed: 0 ins; 775 del; 580 mod 8341197: [BACKOUT] 8322770: Implement C2 VectorizedHashCode on AArch64 Reviewed-by: shade, jpai ------------- PR: https://git.openjdk.org/jdk/pull/21260 From shade at openjdk.org Mon Sep 30 11:00:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 11:00:39 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 10:49:21 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. 
>> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision:
>
>  - Merge remote-tracking branch 'origin/master' into threadlock-ttsp
>  - PR feedback
>  - Improve doc
>
>    Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>
>  - Improve comment
>
>    Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com>
>  - Also address Thread::exit
>  - Fix lock ranking
>  - Fix build and address comments
>  - Remove unused code
>  - Improve ttsp while creating threads

Looks fine to me, thanks.

-------------

Marked as reviewed by shade (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21111#pullrequestreview-2337154569

From rcastanedalo at openjdk.org Mon Sep 30 11:33:47 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 30 Sep 2024 11:33:47 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com>
References: <9o1uzat6Ap4MIn7o6xZhwXKaHsgMRNi_2qyvpcjAlIQ=.311f7fe0-45be-4f39-a28d-3fc1d5ea5471@github.com>
Message-ID:

On Mon, 30 Sep 2024 10:02:17 GMT, Thomas Schatzl wrote:

> Still seems good.
>
> Mostly only looked at the changes in the GC directory and the barrier code themselves as I do not feel enabled to comment too much on other (compiler) changes.

Thanks for re-reviewing, Thomas!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382930857

From fyang at openjdk.org Mon Sep 30 11:53:52 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 30 Sep 2024 11:53:52 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote:

>> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms.
See the [JEP description](https://openjdk.org/jeps/475) for further detail.
>>
>> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold:
>>
>> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and
>> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>>
>> ## Summary of the Changes
>>
>> ### Platform-Independent Changes (`src/hotspot/share`)
>>
>> These consist mainly of:
>>
>> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early;
>> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and
>> - temporary support for porting the JEP to the remaining platforms.
>>
>> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed.
>>
>> ### Platform-Dependent Changes (`src/hotspot/cpu`)
>>
>> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms.
>>
>> #### ADL Changes
>>
>> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code according to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. On the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy.
>>
>> #### `G1BarrierSetAssembler` Changes
>>
>> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ...
>
> Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision:
>
>  - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion
>  - riscv port refactor
>  - Remove temporary support code
>  - Merge jdk-24+17
>  - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization
>  - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions
>  - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes
>  - Merge jdk-24+16
>  - Ensure that detected encode-and-store patterns are matched
>  - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion
>  - ... and 43 more: https://git.openjdk.org/jdk/compare/dede1992...14483b83

Updated RISC-V part of the change looks good to me.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2337279856

From rcastanedalo at openjdk.org Mon Sep 30 12:06:48 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 30 Sep 2024 12:06:48 GMT
Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 11:51:02 GMT, Fei Yang wrote:

> Updated RISC-V part of the change looks good to me.

Thanks, Fei!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19746#issuecomment-2382997964

From fbredberg at openjdk.org Mon Sep 30 12:26:42 2024
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Mon, 30 Sep 2024 12:26:42 GMT
Subject: RFR: 8320318: ObjectMonitor Responsible thread [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 27 Sep 2024 07:42:15 GMT, Fredrik Bredberg wrote:

>> Removed the concept of an ObjectMonitor Responsible thread.
>>
>> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago.
>>
>> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor Responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318).
>>
>> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this.
>>
>> Passes tier1-tier7 on supported platforms.
>> x64, AArch64, Riscv64, ppc64le and s390x pass OK on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test.
>> Arm32 and Zero don't need any changes as far as I can tell.
>
> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update four, after the review

Thanks all for good review comments and testing.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19454#issuecomment-2383041196

From fbredberg at openjdk.org Mon Sep 30 12:31:45 2024
From: fbredberg at openjdk.org (Fredrik Bredberg)
Date: Mon, 30 Sep 2024 12:31:45 GMT
Subject: Integrated: 8320318: ObjectMonitor Responsible thread
In-Reply-To:
References:
Message-ID:

On Wed, 29 May 2024 12:58:02 GMT, Fredrik Bredberg wrote:

> Removed the concept of an ObjectMonitor Responsible thread.
>
> The reason to have an ObjectMonitor Responsible thread was to avoid threads getting stranded due to a hole in the successor protocol. This hole was there because adding the necessary memory barrier was considered too expensive some 20 years ago.
>
> The ObjectMonitor Responsible thread code adds complexity, and doing timed parks just to avoid getting stranded is not the way forward. More info about the problems with the ObjectMonitor Responsible thread can be found in [JDK-8320318](https://bugs.openjdk.org/browse/JDK-8320318).
>
> After removing the ObjectMonitor Responsible thread we see increased performance on all supported platforms except Windows. [JDK-8339730](https://bugs.openjdk.org/browse/JDK-8339730) has been created to handle this.
>
> Passes tier1-tier7 on supported platforms.
> x64, AArch64, Riscv64, ppc64le and s390x pass OK on the test/micro/org/openjdk/bench/vm/lang/LockUnlock.java test.
> Arm32 and Zero don't need any changes as far as I can tell.

This pull request has now been integrated.

Changeset: 180affc5
Author: Fredrik Bredberg
URL: https://git.openjdk.org/jdk/commit/180affc5718c9bf2f009d6a7aa129cc36335384a
Stats: 722 lines in 14 files changed: 301 ins; 284 del; 137 mod

8320318: ObjectMonitor Responsible thread

Reviewed-by: aboldtch, coleenp, pchilanomate, eosterlund

-------------

PR: https://git.openjdk.org/jdk/pull/19454

From rcastanedalo at openjdk.org Mon Sep 30 12:40:54 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 30 Sep 2024 12:40:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Thu, 12 Sep 2024 13:20:14 GMT, Emanuel Peter wrote:

> Indeed, I could re-enable all tests in:
>
> ```
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
> ```
>
> but unfortunately not those others:
>
> ```
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
> ```
>
> I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at an 8-byte-aligned offset.
>
> I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it.
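
The alignment condition in the quoted explanation can be made concrete with a little offset arithmetic. The Java sketch below is illustrative only: the two `byte[]` base offsets (16 bytes for the existing header layout, 12 bytes with compact object headers) are assumptions chosen for the example rather than values stated in this thread, and the class and method names are hypothetical.

```java
public class ArrayAlignmentSketch {
    // Assumed byte[] base offsets, for illustration only.
    static final int BASE_OFFSET_LEGACY = 16;   // hypothetical: existing header layout
    static final int BASE_OFFSET_COMPACT = 12;  // hypothetical: compact object headers

    // Byte offset of element 'index' from the start of the array object.
    static long elementOffset(int baseOffset, int index, int elementSize) {
        return baseOffset + (long) index * elementSize;
    }

    static boolean isAligned(long offset, int alignment) {
        return offset % alignment == 0;
    }

    // Smallest index whose element sits at an offset that is a multiple of 'alignment'.
    // Returns -1 if no index below 'alignment' qualifies.
    static int firstAlignedIndex(int baseOffset, int elementSize, int alignment) {
        for (int i = 0; i < alignment; i++) {
            if (isAligned(elementOffset(baseOffset, i, elementSize), alignment)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int base : new int[] {BASE_OFFSET_LEGACY, BASE_OFFSET_COMPACT}) {
            System.out.println("base offset " + base
                + ": element 0 is 8-byte aligned = " + isAligned(elementOffset(base, 0, 1), 8)
                + ", first 8-byte-aligned byte index = " + firstAlignedIndex(base, 1, 8));
        }
    }
}
```

Under these assumed offsets, a loop over a `byte[]` that starts at index 0 begins on an 8-byte boundary in the first layout but not in the second, which is the kind of difference the quoted comment describes.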
@rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N` with `N <= 3` on an Intel Xeon Platinum 8358 machine: - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java - test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java Here are the failure details: test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: 1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java: 1) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte1(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 2) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteByte2(byte[],byte[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
3) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong1(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 4) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong2(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
5) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong3(byte[],long[])" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 6) Method "public static void compiler.c2.irTests.TestVectorizationMismatchedAccess.testByteLong5(byte[],long[],int,int)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 >= 1 [given] - No nodes matched! 
test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java: 1) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndComplexExpression()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 2) Method "public int[] compiler.vectorization.runner.LoopCombinedOpTest.multipleOpsWith2DifferentTypesAndInvariant()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "sse2", "true"}, counts={"_#STORE_VECTOR#_", ">0"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2383072505 From adinn at openjdk.org Mon Sep 30 13:30:42 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 30 Sep 2024 13:30:42 GMT Subject: RFR: 8293187: Store initialized Enum classes in AOTCache [v10] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 03:25:26 GMT, Ioi Lam wrote: >> This is the 4th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737). 
>>
>> **Problem:**
>>
>> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store [`sun.invoke.util.Wrapper`](https://github.com/openjdk/jdk/blob/c3711dc90980fb3e63ff199612c201c4464626bf/src/java.base/share/classes/sun/invoke/util/Wrapper.java) enums in the AOT cache. Although CDS has some limited support for storing enums, the `Wrapper` type is too complex for the existing solution to handle. Please see the JBS issue for details.
>>
>> **Solution:**
>>
>> In the assembly phase, we store the initialized states of the `Wrapper` class (captured in a `java.lang.Class` object, a.k.a. the *mirror* of this class) into the AOT cache.
>>
>> In the production run, we no longer execute `Wrapper::<clinit>`, because all the static fields (contained in its mirror) are already initialized.
>>
>> **Review Notes:**
>>
>> - The new capability is controlled by `CDSConfig::is_initing_classes_at_dump_time()`. We can aot-initialize classes only if `-XX:+AOTClassLinking` is enabled.
>> - The old (more limited) support for enums is still there (it's required when `AOTClassLinking` is disabled). See the call to `CDSEnumKlass::handle_enum_obj()` in heapShared.cpp.
>> - `AOTClassInitializer::can_archive_initialized_mirror()` decides what classes can be aot-initialized. This is currently a very small set of classes, but will expand in [JDK-8293336](https://bugs.openjdk.org/browse/JDK-8293336).
>> - Before, `HeapShared::archive_java_mirrors()` would clear out all the states in the archived mirrors. With this PR, the states of aot-initialized classes are preserved via `HeapShared::copy_aot_initialized_mirror()`.
>> - During the early stage of the production run, `AOTLinkedClassBulkLoader::init_required_classes_for_loader()` is called to make sure that:
>>   - all aot-initialized classes are moved into the `initialized` state (without executing their `<clinit>` methods).
This is done in `InstanceKlass::initialize_from_cds()` >> - the classes of all the objects that are reachable from the aot-initialized mirrors are initialized. See comments above ` HeapShared::init_classes_reachable_from_archived_mirrors()` >> >> **Caveats:** >> >> Not all Enum classes can be stored in the initialized state. E.g., some Enums might have static fields that depend on the e... > > Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 109 commits: > > - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap > - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking > - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver > - Merge branch 'master' of https://github.com/openjdk/jdk into jep-483-step-01-8338017-add-aot-command-line-aliases > - 8340864: Remove unused lines related to vmClasses > > Reviewed-by: shade, kvn > - 8340831: Simplify simple validation for class definition in MethodHandles.Lookup > > Reviewed-by: redestad > - 8340838: Clean up MutableCallSite to use explicit release fence instead of AtomicInteger > > Reviewed-by: jrose, redestad, shade > - 8340956: ProblemList 4 java/nio/channels/DatagramChannel tests on macosx-all > > Reviewed-by: liach, alanb, darcy, dfuchs > - 8340228: Open source couple more miscellaneous AWT tests > > Reviewed-by: prr > - 8340684: Reading from an input stream backed by a closed ZipFile has no test coverage > > Reviewed-by: lancea > - ... and 99 more: https://git.openjdk.org/jdk/compare/6029b35f...563bccb3 I could not spot any issues. Just a few recommendations for clarifying comments and naming. 

src/hotspot/share/cds/aotLinkedClassBulkLoader.cpp line 253:

> 251: }
> 252: if (ik->has_aot_initialized_mirror()) {
> 253: ik->initialize_from_cds(CHECK);

Can we put a comment in here to explain that this call may link class ik, and may link+initialize its supers and/or implemented interfaces, but will not initialize ik itself, because we are relying on the mirror to provide static field data computed via an init run at assembly time when the archive was created? It really helps to underline that this is the point where we rely on init in a previous VM, bypassing (repeated) init in this VM.

src/hotspot/share/cds/heapShared.cpp line 939:

> 937:
> 938: _run_time_subgraph_info_table.serialize_header(soc);
> 939: soc->do_ptr(&_runtime_default_subgraph_info);

It would help to have a comment here explaining that 1) before the do_ptr call the specific subgraph_info passed into this call holds all the classes that need to be initialized on behalf of java.lang.Object, and 2) after the call it also includes all the extra classes that need initializing on behalf of some archived java.lang.Class mirror. This would help to clarify why it was picked as the holder for these extra classes (i.e. it is the obvious root class from which to start running initializations). If field `_runtime_default_subgraph_info` was renamed to identify it as being the root subgraph associated with class java.lang.Object this would also be clearer (see related comment).

src/hotspot/share/cds/heapShared.cpp line 1009:

> 1007: //
> 1008: // The set of classes that are required to be initialized for the archived
> 1009: // java mirrors are recorded in _runtime_default_subgraph_info (which probably

_runtime_default_subgraph_info is derived from _default_subgraph_info, the latter being the subgraph associated with class java.lang.Object. Perhaps using the prefix `root` or `jlobject` would make this clearer, i.e.
`_root_subgraph_info` and `_runtime_root_subgraph_info` or `_jlobject_subgraph_info` and `_runtime_jlobject_subgraph_info`. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20958#pullrequestreview-2337368712 PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1781024490 PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1781115343 PR Review Comment: https://git.openjdk.org/jdk/pull/20958#discussion_r1781090464 From duke at openjdk.org Mon Sep 30 14:10:43 2024 From: duke at openjdk.org (duke) Date: Mon, 30 Sep 2024 14:10:43 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 10:49:21 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. >> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... 
>> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into threadlock-ttsp > - PR feedback > - Improve doc > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Improve comment > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Also address Thread::exit > - Fix lock ranking > - Fix build and address comments > - Remove unused code > - Improve ttsp while creating threads @olivergillespie Your change (at version 26f7cdac86237d8b080258d1c29fe944b179113e) is now ready to be sponsored by a Committer. 
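
The locking scheme described in the quoted pull request (an outer lock taken before the contended one, so that at most one starter competes for it at a time) can be sketched in plain Java. This is a toy model under stated assumptions, not HotSpot code: `threadsLock` and `threadStartLock` merely stand in for `Threads_lock` and `ThreadStart_lock`, and every other name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class ExtraStartLockSketch {
    static final ReentrantLock threadsLock = new ReentrantLock();      // stands in for Threads_lock
    static final ReentrantLock threadStartLock = new ReentrantLock();  // stands in for ThreadStart_lock

    static final AtomicInteger competing = new AtomicInteger();     // starters currently at or inside threadsLock
    static final AtomicInteger maxCompeting = new AtomicInteger();  // highest value of 'competing' observed
    static final AtomicInteger registered = new AtomicInteger();    // work done under threadsLock

    // Models one thread-start request: queue on the outer lock first, so only
    // the holder of threadStartLock ever tries to take threadsLock.
    static void startThread() {
        threadStartLock.lock();
        try {
            int now = competing.incrementAndGet();
            maxCompeting.accumulateAndGet(now, Math::max);
            threadsLock.lock();
            try {
                registered.incrementAndGet();  // placeholder for updating the thread list
            } finally {
                threadsLock.unlock();
                competing.decrementAndGet();
            }
        } finally {
            threadStartLock.unlock();
        }
    }

    // Runs 'starters' threads, each issuing 'perStarter' start requests, and
    // returns the maximum number of starters that competed for threadsLock at once.
    static int simulate(int starters, int perStarter) {
        competing.set(0);
        maxCompeting.set(0);
        registered.set(0);
        Thread[] workers = new Thread[starters];
        for (int i = 0; i < starters; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perStarter; j++) {
                    startThread();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return maxCompeting.get();
    }

    public static void main(String[] args) {
        int max = simulate(8, 1000);
        System.out.println("max starters competing for threadsLock: " + max
            + ", registrations: " + registered.get());
    }
}
```

In this model any other party that needs `threadsLock` (the VM thread in the original description) faces at most one competing starter, at the cost of one extra lock and unlock per start request.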
------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2383316008 From shade at openjdk.org Mon Sep 30 14:45:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 14:45:37 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v3] In-Reply-To: References: <0Dr860QgmZaGHq1QGgz5bqKLpiwVSZL-lDOV1JNjkdk=.1c09c464-e9cd-4f66-88c1-2b97e3a9f7ce@github.com> Message-ID: On Wed, 25 Sep 2024 20:30:28 GMT, Dean Long wrote: >>> If JVM_StartThread is only called by Thread.start0, then how about putting the new lock in Java instead? >> >> What benefit do you see of that? One downside is that the lock will be coarser than necessary. I'd rather keep the lock as tightly scoped as possible. > >> > If JVM_StartThread is only called by Thread.start0, then how about putting the new lock in Java instead? >> >> What benefit do you see of that? One downside is that the lock will be coarser than necessary. I'd rather keep the lock as tightly scoped as possible. > > I just thought it would be simpler, but I see your point. A coarser lock will serialize more of the native path. I think @dean-long and @merykitty might need to approve or indicate they do not have other comments without the formal review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21111#issuecomment-2383408983 From qamai at openjdk.org Mon Sep 30 14:58:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 30 Sep 2024 14:58:41 GMT Subject: RFR: 8340547: Starting many threads can delay safepoints [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 10:49:21 GMT, Oli Gillespie wrote: >> Mitigate the impact of JVM_StartThread on safepoint synchronization, by adding a new ThreadStart_lock which limits the number of JVM_StartThread invocations competing for Threads_lock at any given time to 1. 
>> This gives a VM thread trying to call a safepoint a much better chance of acquiring Threads_lock when there are many JVM_StartThread invocations in flight, at the cost of one extra lock/unlock for every new thread. >> >> Can be disabled with new diagnostic flag `-XX:-UseExtraThreadStartLock`. >> >> Before (ThreadStartTtsp.java is shared in JDK-8340547): >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 1291591 ns >> Reaching safepoint: 59962 ns >> Reaching safepoint: 1958065 ns >> Reaching safepoint: 14456666258 ns <-- 14 seconds! >> ... >> >> >> After: >> >> java -Xlog:safepoint ThreadStartTtsp.java | grep -o 'Reaching safepoint: [0-9]* ns' >> Reaching safepoint: 214269 ns >> Reaching safepoint: 60253 ns >> Reaching safepoint: 2040680 ns >> Reaching safepoint: 3089284 ns >> Reaching safepoint: 2998303 ns >> Reaching safepoint: 4433713 ns <-- 4.4ms >> Reaching safepoint: 3368436 ns >> Reaching safepoint: 2986519 ns >> Reaching safepoint: 3269102 ns >> ... >> >> >> >> **Alternatives** >> >> I considered some other options for mitigating this. For example, could we reduce the time spent holding the lock in StartThread? Most of the time is spent managing the threads list for ThreadSMR support, and each time we add a thread to that list we need to copy the whole list and free every entry in the original, which is slow. But I didn't see an easy way to avoid this. >> I also looked at some kind of signal from the VM thread that it is ready to start synchronizing that StartThread could check before trying to grab Threads_lock, but I didn't find anything better than this extra lock. > > Oli Gillespie has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains nine additional commits since the last revision: > > - Merge remote-tracking branch 'origin/master' into threadlock-ttsp > - PR feedback > - Improve doc > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Improve comment > > Co-authored-by: David Holmes <62092539+dholmes-ora at users.noreply.github.com> > - Also address Thread::exit > - Fix lock ranking > - Fix build and address comments > - Remove unused code > - Improve ttsp while creating threads I think the patch looks good. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/21111#pullrequestreview-2337831101 From rsunderbabu at openjdk.org Mon Sep 30 15:01:45 2024 From: rsunderbabu at openjdk.org (Ramkumar Sunderbabu) Date: Mon, 30 Sep 2024 15:01:45 GMT Subject: RFR: 8334305: Remove all code for nsk.share.Log verbose mode Message-ID: Cleaning up nsk.share.Log code after the verbose mode was set always true. Tested all the vmTestbase/ tests. ------------- Commit messages: - 8334305: Remove all code for nsk.share.Log verbose mode Changes: https://git.openjdk.org/jdk/pull/21267/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21267&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334305 Stats: 127 lines in 13 files changed: 0 ins; 98 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/21267.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21267/head:pull/21267 PR: https://git.openjdk.org/jdk/pull/21267 From lmesnik at openjdk.org Mon Sep 30 15:10:36 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 30 Sep 2024 15:10:36 GMT Subject: RFR: 8334305: Remove all code for nsk.share.Log verbose mode In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 14:55:33 GMT, Ramkumar Sunderbabu wrote: > Cleaning up nsk.share.Log code after the verbose mode was set always true. > > Tested all the vmTestbase/ tests. Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21267#pullrequestreview-2337864195 From eosterlund at openjdk.org Mon Sep 30 15:11:40 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Mon, 30 Sep 2024 15:11:40 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 27 Sep 2024 18:57:51 GMT, Aleksey Shipilev wrote: > > Is ZGC affected by this? I see only G1 and Shenandoah changes. > > Good question. > > ZGC expands the GC barriers late. This is why the IR test configuration that tests ZGC shows the same result as with other collectors: no additional fluff in IR. I would not expect we need anything else in late expansion for ZGC for Reference.clear, but maybe I am tired and cannot see it. Can you confirm this is fine, @fisk? ZGC needs some changes. Without doing anything, we propagate the AS_NO_KEEPALIVE decorator to the corresponding ZBarrierNoKeepalive bit being set in the barrier data of the StorePNode. However, we don't really do anything special with that information and we will in practice end up keeping the referent alive when clearing it with generational ZGC. The point of introducing the native implementation in the first place, was to make sure our GCs don't keep the referent alive when clearing it, as the user intention is clearly to not keep it alive. I think we need a new ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* p) and to make that the selected slow path function when ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the trick. Please let me know if you need further assistance.
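For readers following the Reference.clear thread from the Java side, here is a minimal sketch of the API-level contract that the barrier discussion above is trying to preserve: clearing a reference drops the referent without enqueueing the reference object. It says nothing about how ZGC, G1 or Shenandoah implement the store barrier; the class and method names (`ReferenceClearSketch`, `clearAndObserve`) are made up for illustration.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class ReferenceClearSketch {
    /**
     * Clears a weak reference and returns three observations:
     * [0] the referent was visible before clear(),
     * [1] get() returns null after clear(),
     * [2] clear() did not enqueue the reference.
     */
    static boolean[] clearAndObserve() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        boolean visibleBefore = ref.get() == referent;
        ref.clear();                                // explicit clear by the program
        boolean clearedAfter = ref.get() == null;
        boolean notEnqueued = queue.poll() == null; // clear() never enqueues

        // Keep the referent strongly reachable up to this point, so the
        // collector cannot have cleared the reference on its own.
        Reference.reachabilityFence(referent);
        return new boolean[] { visibleBefore, clearedAfter, notEnqueued };
    }

    public static void main(String[] args) {
        boolean[] r = clearAndObserve();
        System.out.println("visible before clear: " + r[0]);
        System.out.println("cleared after clear:  " + r[1]);
        System.out.println("not enqueued:         " + r[2]);
    }
}
```

The referent is held strongly on purpose, so everything observed here comes from the explicit clear() call and not from a GC cycle.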
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2383479242 From coleen.phillimore at oracle.com Mon Sep 30 15:36:58 2024 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 30 Sep 2024 11:36:58 -0400 Subject: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: Message-ID: <2aeb5df5-5998-4b4e-8d31-fa05cefbb0f0@oracle.com> Thank you for this limited but creative solution to the problem that we don't want decorators on some logging tag combinations. This would be useful for some runtime tracing that we didn't convert to UL because we don't want decorators, e.g. -XX:+PrintInterpreter. I like it a lot. Coleen On 9/30/24 5:32 AM, Anton Seoane Ampudia wrote: > > Yes. I agree on the original idea being more flexible, but it has the > risk of defaults being not completely agreed on and "infecting" some > people's expected output. The (only) use case that has driven this has > been the "no decorators" default, which is what I am tentatively > restricting the original idea to. > > As Roberto Castañeda mentioned before, yes, this would be equivalent > to -Xlog:tags::none, but a bit more ergonomic and convenient > considering that with the upcoming compiler migration to UL many of > these undecorated output cases will appear. > > Antón > > *From: *hotspot-dev on behalf of David > Holmes > *Date: *Monday, 30 September 2024 at 02:39 > *To: *hotspot-dev at openjdk.org > *Subject: *Re: 8340363: Tag-specific default decorators for UnifiedLogging > > Hi Anton, > > Thanks for bringing this up for general discussion outside the PR. > > Just to be clear for other readers, decorators are associated with a > given log output device.
> > On 27/09/2024 7:07 pm, Anton Seoane Ampudia wrote: > > Hi all, > > > > Currently, the Unified Logging framework defaults to three decorators > > (uptime, level, tags) whenever the user does not specify otherwise > > through -Xlog. This is sometimes inconvenient when specific users > > with some predefined needs do not want those decorators. For example, C2 > > developers would rather not see those defaults in cases such as > > jit+inlining, but also do not want to specify so every time they run > -Xlog. > > > > One solution for this is found in this PR: > https://github.com/openjdk/jdk/pull/20988. It can be > > considered as a "flavoured" version of the existing default decorators > > and in no way will it override anything user-specified. Also, > decorators > > will still be consistent throughout an output device (i.e., no > different > > decorators "mixed in"). > > > > However, upon recent talks with different teams this approach may be > too > > flexible/powerful. The ability of specifying LogSelection-bound default > > decorators may result in a situation where defaults for A+B and C+D > have > > been specified, and a user selects -Xlog:A+B,C+D. In that case, the > > union of the prespecified defaults is taken, which may not be what the > > end user wants (and might result in too many decorators). > > > > Actually, the main use case for this that I know as of now is C2 > > developers and the wish to not see decorators for some defined log > > selections. With this in mind, I have reduced the original idea to a > > feature where only the default decorators are not shown if we get a > > positive match with a prespecified list throughout the entire user log > > selection list (i.e.: > > > > * If there is a default for A+B and the user specifies -Xlog:A+B,C+D, > > they will still get the default decorators > > * If there is a default for A+B and the user specifies -Xlog:A+B, no > > default decorators will be supplied).
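The two cases in the list above can be modelled in a few lines. This is only a sketch of the rule as described in the mail, not the HotSpot implementation (Unified Logging lives in the VM's C++ code): `DefaultDecoratorRule`, `defaultDecorators` and the exact-string matching of tag sets are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Set;

class DefaultDecoratorRule {
    // The three standard default decorators named in the mail.
    static final List<String> STANDARD_DEFAULTS = List.of("uptime", "level", "tags");

    /**
     * Hypothetical model of the reduced proposal: default decorators are
     * dropped only when every selection the user passed to -Xlog is on the
     * prespecified "undecorated" list. Tag sets are compared as plain
     * strings such as "A+B", which is a simplification.
     */
    static List<String> defaultDecorators(Set<String> undecorated,
                                          List<String> userSelections) {
        boolean allMatch = !userSelections.isEmpty()
                && undecorated.containsAll(userSelections);
        return allMatch ? List.of() : STANDARD_DEFAULTS;
    }

    public static void main(String[] args) {
        Set<String> undecorated = Set.of("A+B");
        // -Xlog:A+B,C+D : C+D is not on the list, so the defaults stay.
        System.out.println(defaultDecorators(undecorated, List.of("A+B", "C+D")));
        // -Xlog:A+B : the whole selection list matches, so no defaults.
        System.out.println(defaultDecorators(undecorated, List.of("A+B")));
    }
}
```

Decorators given explicitly by the user are outside this model; per the mail, they are never overridden.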
> > So to be clear, is the proposal now to just drop the default decorators, > rather than allowing them to be replaced with alternate defaults? If > that is the case then it is the same as writing: > > -Xlog:A+B::none > > and I don't really see much value in that. But I wouldn't oppose it. > > Allowing new defaults gives more flexibility - but obviously the > developers using the specific tag combinations have to agree on what > defaults to set. > > Thanks, > David > ----- > > > Before scrapping the original idea and moving on with this one (which > > will not change anything as it is right now, except for the really > > specific uses like C2 jit+inlining that may be decided), *I wanted to > > get a broader idea of people's opinions on this, as well as other use > > cases for this behaviour.* > > > > Many thanks, > > > > Antón > > > From svkamath at openjdk.org Mon Sep 30 15:37:06 2024 From: svkamath at openjdk.org (Smita Kamath) Date: Mon, 30 Sep 2024 15:37:06 GMT Subject: RFR: 8341052: SHA-512 implementation using SHA-NI Message-ID: 8341052: SHA-512 implementation using SHA-NI ------------- Commit messages: - Updated AMD64.java - Merge master - SHA-512 implementation using SHA-NI instructions Changes: https://git.openjdk.org/jdk/pull/20633/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20633&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341052 Stats: 267 lines in 10 files changed: 250 ins; 11 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/20633.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20633/head:pull/20633 PR: https://git.openjdk.org/jdk/pull/20633 From kirk at kodewerk.com Mon Sep 30 16:07:29 2024 From: kirk at kodewerk.com (Kirk Pepperdine) Date: Mon, 30 Sep 2024 09:07:29 -0700 Subject: 8340363: Tag-specific default decorators for UnifiedLogging In-Reply-To: References: Message-ID: <70804A3C-13CE-4F83-861B-BA280C7EEB47@kodewerk.com> >
> So to be clear, is the proposal now to just drop the default decorators, > rather than allowing them to be replaced with alternate defaults? If > that is the case then it is the same as writing: > > -Xlog:A+B::none > > and I don't really see much value in that. But I wouldn't oppose it. > I would. At a minimum you need a timestamp. > > Allowing new defaults gives more flexibility - but obviously the > developers using the specific tag combinations have to agree on what > defaults to set. > At the moment to get a reasonable GC log for G1, one needs to use the following setting. -Xlog:gc*,gc+ref=debug,gc+phases=debug,gc+age=debug,safepoint:file=gc.log Also, the "noise to signal" ratio in these logs is exceptionally high. Kind regards, Kirk From shade at openjdk.org Mon Sep 30 16:36:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:36:20 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v4] In-Reply-To: References: Message-ID: <836Da9CyLC9qQtoFe9YVHau-ftjmXFr87Xw2wy2DIxc=.19870ac5-3c52-4632-8093-ea47938642de@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Attempt at implementing ZGC AArch64 parts - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Amend the test case for guaranteing it works under different compilation regimes - More precise barriers - Tests work - More touchups - Fixing the conditions, fixing the tests - Crude prototype, still failing the tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/437f2329..cba0a8e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=02-03 Stats: 258786 lines in 3084 files changed: 211178 ins; 30411 del; 17197 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Mon Sep 30 16:36:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:36:20 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Mon, 30 Sep 2024 15:08:53 GMT, Erik Österlund wrote: > I think we need a new ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* p) and to make that the selected slow path function when ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the trick. Thanks! See new commits: is that the shape you were thinking of? Once we get AArch64 parts right, I'll copy-paste that to other arches.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2383669579 From shade at openjdk.org Mon Sep 30 16:50:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:50:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Fri, 27 Sep 2024 23:51:13 GMT, Kim Barrett wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Amend the test case for guaranteing it works under different compilation regimes > > src/java.base/share/classes/java/lang/ref/Reference.java line 420: > >> 418: /* Implementation of clear(), also used by enqueue(). A simple >> 419: * assignment of the referent field won't do for some garbage >> 420: * collectors. > > Description of clear0 is rendered stale by this change. The first sentence is no longer true, since it's now > clearImpl that has that role. The second sentence probably ought to also be moved into the description of > clearImpl. Thanks! I tightened up comments a bit, take another look? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1781452602 From shade at openjdk.org Mon Sep 30 16:50:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:50:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v5] In-Reply-To: References: Message-ID: <4wz9wweAYNZCLw5c5fpldlEBgVyCh9io9iUih_hFVKM=.2d4a7679-0652-4f2c-8304-99bd84367519@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. 
See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - Fix other arches - Tighten up comments in Reference javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/cba0a8e9..8ba681a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=03-04 Stats: 16 lines in 4 files changed: 7 ins; 4 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Mon Sep 30 16:59:16 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 16:59:16 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
> > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Also dispatch to slow-path on other arches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/8ba681a4..4fe4a911 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=04-05 Stats: 6 lines in 3 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From kvn at openjdk.org Mon Sep 30 16:59:59 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 30 Sep 2024 16:59:59 GMT Subject: RFR: 8334060: Implementation of Late Barrier Expansion for G1 [v27] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 05:02:12 GMT, Roberto Castañeda Lozano wrote: >> This changeset implements JEP 475 (Late Barrier Expansion for G1), including support for the x64 and aarch64 platforms. See the [JEP description](https://openjdk.org/jeps/475) for further detail. >> >> We aim to integrate this work in JDK 24. The purpose of this pull request is twofold: >> >> - to allow maintainers of the arm (32-bit), ppc, riscv, s390, and x86 (32-bit) ports to contribute a port of these platforms in time for JDK 24; and >> - to allow reviewers to review the platform-independent, x64 and aarch64, and test changes in parallel with the porting work.
>> >> ## Summary of the Changes >> >> ### Platform-Independent Changes (`src/hotspot/share`) >> >> These consist mainly of: >> >> - a complete rewrite of `G1BarrierSetC2`, to instruct C2 to expand G1 barriers late instead of early; >> - a few minor changes to C2 itself, to support removal of redundant decompression operations and to address an OopMap construction issue triggered by this JEP's increased usage of ADL `TEMP` operands; and >> - temporary support for porting the JEP to the remaining platforms. >> >> The temporary support code (guarded by the pre-processor flag `G1_LATE_BARRIER_MIGRATION_SUPPORT`) will **not** be part of the final pull request, and hence does not need to be reviewed. >> >> ### Platform-Dependent Changes (`src/hotspot/cpu`) >> >> These include changes to the ADL instruction definitions and the `G1BarrierSetAssembler` class of the x64 and aarch64 platforms. >> >> #### ADL Changes >> >> The changeset uses ADL predicates to force C2 to implement memory accesses tagged with barrier information using G1-specific, barrier-aware instruction versions (e.g. `g1StoreP` instead of the GC-agnostic `storeP`). These new instruction versions generate machine code accordingly to the corresponding tagged barrier information, relying on the G1 barrier implementations provided by the `G1BarrierSetAssembler` class. In the aarch64 platform, the bulk of the ADL code is generated from a higher-level version using m4, to reduce redundancy. >> >> #### `G1BarrierSetAssembler` Changes >> >> Both platforms basically reuse the barrier implementation for the bytecode interpreter, with the different barrier tests and operations refactored into dedicated functions. Besides this, `G1BarrierSetAssembler` is extended with assembly-stub routines that implement the out-of-line, slow path of the barriers. These routines include calls from the barrier into the JVM, which require support for saving and restoring live ... 
> > Roberto Castañeda Lozano has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 53 additional commits since the last revision: > > - Merge remote-tracking branch 'feilongjiang/JEP-475-RISC-V' into JDK-8334060-g1-late-barrier-expansion > - riscv port refactor > - Remove temporary support code > - Merge jdk-24+17 > - Relax 'must match' assertion in ppc's g1StoreN after limiting pinning bypass optimization > - Remove unnecessary reg-to-reg moves in aarch64's g1CompareAndX instructions > - Reintroduce matcher assert and limit pinning bypass optimization to non-shared EncodeP nodes > - Merge jdk-24+16 > - Ensure that detected encode-and-store patterns are matched > - Merge remote-tracking branch 'snazarkin/arm32-JDK-8334060-g1-late-barrier-expansion' into JDK-8334060-g1-late-barrier-expansion > - ... and 43 more: https://git.openjdk.org/jdk/compare/ae84aa47...14483b83 Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19746#pullrequestreview-2338111198 From shade at openjdk.org Mon Sep 30 17:11:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 17:11:57 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v3] In-Reply-To: References: Message-ID: > See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics.
> > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). > > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8338379-class-init-checks - Pick up PPC64 patch from Martin - Relax to just a release - Initial version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21110/files - new: https://git.openjdk.org/jdk/pull/21110/files/179d8aa1..a7895d94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21110&range=01-02 Stats: 20318 lines in 465 files changed: 14509 ins; 3375 del; 2434 mod Patch: https://git.openjdk.org/jdk/pull/21110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21110/head:pull/21110 PR: https://git.openjdk.org/jdk/pull/21110 From shade at openjdk.org Mon Sep 30 17:11:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 30 Sep 2024 17:11:57 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v2] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 09:30:23 GMT, 
Martin Doerr wrote: > The more frequently used parts are in platform specific code, so it might make sense to optimize the PPC64 parts. Also note that the "isync trick" is a faster acquire barrier than "lwsync". What do you think about this? I don't mind, and what you say as maintainer of PPC64 code goes :) I merged the patch in this PR, thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2383736192 From rkennke at openjdk.org Mon Sep 30 17:50:54 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 30 Sep 2024 17:50:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 27 Sep 2024 16:23:15 GMT, Roman Kennke wrote: >> I believe the code in the patch is good enough as-is, especially if `UseCompactObjectHeaders` is slated to go away. The existing `if` will prevent the < 16 byte header code from being emitted, which is the desired behavior - i.e., if the header size is >= 16, there will be no code emitted to the intrinsic for that block. So there will not be an additional branch for the code when it is executed. >> >> I'm good with a comment tying `UseCompactObjectHeaders` to the condition. The comment can be removed when the flag is removed. "Ship it" :-) > > Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offsets <= 16 bytes will become the default. The generated code will be something like: > > > if (haystack_len <= 8) { > // Copy 8 bytes onto stack > } else if (haystack_len <= 16) { > // Copy 16 bytes onto stack > } else { > // Copy 32 bytes onto stack > } > > > So that is 2 branches in this prologue code instead of originally 1. > > However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17.
This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault. > > I think I need to mull over it some more to come up with a correct fix. I changed the header<16 version to be a small loop: https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5 The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the end of the array (which would potentially crash). I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1781535745 From mdoerr at openjdk.org Mon Sep 30 20:52:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 30 Sep 2024 20:52:37 GMT Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v3] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 17:11:57 GMT, Aleksey Shipilev wrote: >> See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. >> >> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. >> >> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g.
implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). >> >> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` >> - [x] GHA to test platform buildability + adhoc platform cross-compilation > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8338379-class-init-checks > - Pick up PPC64 patch from Martin > - Relax to just a release > - Initial version Marked as reviewed by mdoerr (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21110#pullrequestreview-2338538916 From sviswanathan at openjdk.org Mon Sep 30 21:02:42 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 30 Sep 2024 21:02:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. 
API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 2698:

> 2696: int cast_vopc = VectorCastNode::opcode(-1, elem_bt, true);
> 2697: if (is_floating_point_type(elem_bt)) {
> 2698: index_elem_bt = elem_bt == T_FLOAT ? T_INT : T_LONG;

index_elem_bt is already assigned at lines 2676 and 2678 so this line could be removed.

src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551:

> 549: return ((ByteVector)src1).vectorFactory(res);
> 550: }
> 551:

This could instead be:
src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
Or even simplified to:
src1.rearrange(this.toShuffle(), src2);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1777839817
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1779722306

From cjplummer at openjdk.org Mon Sep 30 21:15:44 2024
From: cjplummer at openjdk.org (Chris Plummer)
Date: Mon, 30 Sep 2024 21:15:44 GMT
Subject: RFR: 8334305: Remove all code for nsk.share.Log verbose mode
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 14:55:33 GMT, Ramkumar Sunderbabu wrote:

> Cleaning up nsk.share.Log code after the verbose mode was set always true.
>
> Tested all the vmTestbase/ tests.

test/hotspot/jtreg/vmTestbase/nsk/share/Log.java line 264:

> 262:
> 263: /**
> 264: * Print message to the assigned output stream,

Need to replace the comma with a period.

test/hotspot/jtreg/vmTestbase/nsk/share/Log.java line 287:

> 285: if (!verbose()) {
> 286: doPrint(message);
> 287: }

Is this method ever called? Is there a CR to remove it (and any references to it)?

test/hotspot/jtreg/vmTestbase/nsk/share/Log.java line 340:

> 338:
> 339: /**
> 340: * Redirect log to the given stream

Need period at end of sentence.

test/hotspot/jtreg/vmTestbase/nsk/share/Log.java line 342:

> 340: * Redirect log to the given stream
> 341: * Prints errors summary to current stream, cancel current stream
> 342: * and switches to new stream.

Does it really do all this? It looks to me like it just switches to the new stream. I'm not sure what is meant by "error summary" and cancelling.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21267#discussion_r1781735358
PR Review Comment: https://git.openjdk.org/jdk/pull/21267#discussion_r1781735837
PR Review Comment: https://git.openjdk.org/jdk/pull/21267#discussion_r1781737610
PR Review Comment: https://git.openjdk.org/jdk/pull/21267#discussion_r1781738427

From psandoz at openjdk.org Mon Sep 30 21:30:42 2024
From: psandoz at openjdk.org (Paul Sandoz)
Date: Mon, 30 Sep 2024 21:30:42 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Sat, 28 Sep 2024 17:37:10 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
>
> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551:
>
>> 549: return ((ByteVector)src1).vectorFactory(res);
>> 550: }
>> 551:
>
> This could instead be:
> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
> Or even simplified to:
> src1.rearrange(this.toShuffle(), src2);

I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs.

jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]

jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
$19 ==> [0, 1, 8, 9, 0, 1, 8, 9]

jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
$20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]

jshell> indexes.toShuffle()
$21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781753872

From wkemper at openjdk.org Mon Sep 30 22:11:27 2024
From: wkemper at openjdk.org (William Kemper)
Date: Mon, 30 Sep 2024 22:11:27 GMT
Subject: RFR: 8337511: Implement JEP-404: Generational Shenandoah (Experimental)
Message-ID: <8N7AiGx8AZc-d6MgBEKVw5R-qk8J_1FBZH-SbzmydGg=.d7ac9a04-5fa3-47e3-8d24-c8efd28a51f7@github.com>

This PR merges JEP 404, a generational mode for the Shenandoah garbage collector. The JEP can be viewed here: https://openjdk.org/jeps/404. We would like to target JDK24 with this PR.

-------------

Commit messages:
 - 8337511: Implement JEP-404: Generational Shenandoah (Experimental)

Changes: https://git.openjdk.org/jdk/pull/21273/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21273&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337511
  Stats: 22576 lines in 229 files changed: 20937 ins; 807 del; 832 mod
  Patch: https://git.openjdk.org/jdk/pull/21273.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21273/head:pull/21273

PR: https://git.openjdk.org/jdk/pull/21273

From sviswanathan at openjdk.org Mon Sep 30 22:42:41 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 22:42:41 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>>     Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown.
>>
>> Summary of changes:
>> - Java side implementation of new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
>> - Optimized x86 backend implementation for AVX512 and legacy target.
>> - Function tests covering new API.
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.

src/hotspot/cpu/x86/x86.ad line 10490:

> 10488: %{
> 10489: match(Set index (SelectFromTwoVector (Binary index src1) src2));
> 10490: effect(TEMP index);

Just curious, why do we need TEMP index effect?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781742786

From sviswanathan at openjdk.org Mon Sep 30 22:42:42 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 22:42:42 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Mon, 30 Sep 2024 21:28:22 GMT, Paul Sandoz wrote:

>> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ByteVector.java line 551:
>>
>>> 549: return ((ByteVector)src1).vectorFactory(res);
>>> 550: }
>>> 551:
>>
>> This could instead be:
>> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
>> Or even simplified to:
>> src1.rearrange(this.toShuffle(), src2);
>
> I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs.
>
>
> jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
> indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]
>
> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
> $19 ==> [0, 1, 8, 9, 0, 1, 8, 9]
>
> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
> $20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]
>
> jshell> indexes.toShuffle()
> $21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]

Thanks for the example. Yes, you have a point there. So we would need to do:
src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781859166

From asmehra at openjdk.org Mon Sep 30 22:49:36 2024
From: asmehra at openjdk.org (Ashutosh Mehra)
Date: Mon, 30 Sep 2024 22:49:36 GMT
Subject: RFR: 8293337: Store method handle intrinsics in AOT cache [v6]
In-Reply-To:
References: <4fMDUSZRg0HcIiZmr-yqr7vleFXrD_zNXpdd_pfgHQ8=.4a679e86-b0f9-4101-bcc2-f49d8bcb417b@github.com>
Message-ID:

On Tue, 24 Sep 2024 00:52:52 GMT, Ioi Lam wrote:

>> This is the 5th PR for [JEP 483: Ahead-of-Time Class Loading & Linking](https://bugs.openjdk.org/browse/JDK-8315737).
>>
>> This PR is necessary to support [JDK-8293336: AOT-linking of invokedynamic for lambda expression and string concat](https://bugs.openjdk.org/browse/JDK-8293336), which needs to store Java heap objects that have native pointers to the C++ `Method` objects returned by `SystemDictionary::find_method_handle_intrinsic()`
>>
>> These `Method` objects are created within the JVM. They do not belong to any actual Java classes. We store all these `Method` objects into the AOT cache, so that they can be referenced by other artifacts in the AOT cache.
>>
>> ---
>> See [here](https://bugs.openjdk.org/browse/JDK-8315737) for the sequence of dependent RFEs for implementing JEP 483.
>
> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 294 additional commits since the last revision:
>
>  - Merge branch 'jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap' into jep-483-step-05-8293337-archive-method-handle-intrinsics
>  - Merge branch 'jep-483-step-03-8329706-implement-xx-aot-class-linking' of /jdk3/yak/open into jep-483-step-04-8293187-support-sun-invoke-util-wrapper-in-cds-archive-heap
>  - Merge branch 'jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver' into jep-483-step-03-8329706-implement-xx-aot-class-linking
>  - Merge branch 'jep-483-step-01-8338017-add-aot-command-line-aliases' into jep-483-step-02-8338018-rename-class-prelinker-to-aot-cp-resolver
>  - Merge branch 'master' into jep-483-step-01-8338017-add-aot-command-line-aliases
>  - 8339895: Open source several AWT focus tests - series 3
>
>    Reviewed-by: prr
>  - 8339192: Native annotation parsing code of deprecated annotations causes crash
>
>    Reviewed-by: jrose, mgronlun
>  - 8340480: Bad copyright notices in changes from JDK-8339902
>
>    Reviewed-by: kcr, bpb, kizune
>  - 8340353: Remove CompressedOops::ptrs_base
>
>    Reviewed-by: stefank, coleenp, shade, mli
>  - 8339902: Open source couple TextField related tests
>
>    Reviewed-by: honkar
>  - ... and 284 more: https://git.openjdk.org/jdk/compare/dd5c7592...59dd8879

lgtm!

-------------

Marked as reviewed by asmehra (Committer).

PR Review: https://git.openjdk.org/jdk/pull/20959#pullrequestreview-2338744323

From sviswanathan at openjdk.org Mon Sep 30 23:17:43 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 23:17:43 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>>     Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown.
>>
>> Summary of changes:
>> - Java side implementation of new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
>> - Optimized x86 backend implementation for AVX512 and legacy target.
>> - Function tests covering new API.
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score  Error  Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2689:

> 2687: !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) ||
> 2688: !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) ||
> 2689: !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) {

Where SelectFromTwoVector is not supported, the alternate implementation is as part of SelectFromTwoVectorNode::Ideal() instead of right here. A comment both here as well as in the Ideal() implementation is needed to keep these checks in sync.

src/hotspot/share/opto/vectornode.cpp line 2120:

> 2118: // are held in a byte vector which are later transformed to target specific permutation
> 2119: // index format by subsequent VectorLoadShuffle.
> 2120: int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true);

Good to use -1 when we are not sending an actual opcode:
int cast_vopc = VectorCastNode::opcode(-1, index_elem_bt, true);

src/hotspot/share/opto/vectornode.cpp line 2126:

> 2124: Node* bcast_lane_cnt_m1_vec = phase->transform(VectorNode::scalar2vector(lane_cnt_m1, num_elem, Type::get_const_basic_type(T_BYTE), false));
> 2125:
> 2126: // Compute the blend mask for merging two indipendently permututed vectors

Typo indipendently -> independently

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781867326
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781873682
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781888912

From sviswanathan at openjdk.org Mon Sep 30 23:17:43 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Mon, 30 Sep 2024 23:17:43 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID: <6NYy2NcP98xm3QRYdBWaAkkrvTdquMhhWnm-svxQjwE=.955f6dc8-c74c-472b-8c32-10228bb68d99@github.com>

On Mon, 30 Sep 2024 22:51:57 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments.
>
> src/hotspot/share/opto/vectorIntrinsics.cpp line 2689:
>
>> 2687: !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) ||
>> 2688: !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) ||
>> 2689: !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) {
>
> Where SelectFromTwoVector is not supported, the alternate implementation is as part of SelectFromTwoVectorNode::Ideal() instead of right here. A comment both here as well as in the Ideal() implementation is needed to keep these checks in sync.

We need to add VectorMaskCast here in the checks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781886783
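
For readers following the masking discussion earlier in this thread (applying `lanewise(VectorOperators.AND, 2 * VLENGTH - 1)` before `toShuffle()`), the difference can be reproduced with plain arrays. The sketch below is only a scalar model: `toShuffleModel` is a hypothetical helper whose wrapping rule is an assumption chosen to match the jshell output posted in the thread, not the library's actual code, and it assumes a power-of-two vector length.

```java
import java.util.Arrays;

public class ShuffleWrapSketch {

    // Masks every index into [0, 2*VLEN). Assumes VLEN is a power of two.
    static int[] maskToTwoVectorRange(int[] indexes) {
        int mask = 2 * indexes.length - 1;
        int[] out = new int[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = indexes[i] & mask;
        }
        return out;
    }

    // Assumed model of the index-to-shuffle conversion, fitted to the jshell
    // transcript: indexes in [0, VLEN) are kept, anything else is mapped into
    // [-VLEN, -1] as (idx & (VLEN - 1)) - VLEN.
    static int[] toShuffleModel(int[] indexes) {
        int vlen = indexes.length;
        int[] out = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            out[i] = (idx >= 0 && idx < vlen) ? idx : (idx & (vlen - 1)) - vlen;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] indexes = {0, 1, 8, 9, 16, 17, 24, 25};
        int[] masked = maskToTwoVectorRange(indexes);
        // Prints [0, 1, 8, 9, 0, 1, 8, 9]
        System.out.println(Arrays.toString(masked));
        // Prints [0, 1, -8, -7, 0, 1, -8, -7]
        System.out.println(Arrays.toString(toShuffleModel(masked)));
        // Prints [0, 1, -8, -7, -8, -7, -8, -7]
        System.out.println(Arrays.toString(toShuffleModel(indexes)));
    }
}
```

Under this model the last four lanes differ: masking first wraps 16, 17, 24, 25 back to 0, 1, 8, 9, while converting the raw indexes leaves all four as out-of-range (negative) shuffle entries.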